import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from functools import reduce
import re
import probscale
import seaborn as sns
set(style="ticks", font_scale=1.5)
sns.from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()import urllib.request
8 Exercises
Import relevant packages
Go to NOAA’s National Centers for Environmental Information (NCEI)
Climate Data Online: Dataset Discovery
Find station codes in this map. On the left, click on the little wrench next to “Global Summary of the Month”, then click on “identify” on the panel that just opened, and click on a station (purple circle). You will see the station’s name, it’s ID, and the period of record. For example, for Ben-Gurion’s Airport in Israel:
BEN GURION, IS
STATION ID: ISM00040180
Period of Record: 1951-01-01 to 2020-03-01
You can download daily or monthly data for each station. Use the function below to download this data to your computer. station_name
can be whatever you want, station_code
is the station ID.
def download_data(station_name, station_code):
= 'https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/access/'
url_daily = 'https://www.ncei.noaa.gov/data/gsom/access/'
url_monthly # download daily data - uncomment the following 2 lines to make this work
# urllib.request.urlretrieve(url_daily + station_code + '.csv',
# station_name + '_daily.csv')
# download monthly data
+ station_code + '.csv',
urllib.request.urlretrieve(url_monthly + '_monthly.csv') station_name
Download daily rainfall data for Eilat, Israel. ID: IS000009972
'Eilat', 'IS000009972') download_data(
Then load the data into a dataframe.
IMPORTANT!! daily precipitation data is in tenths of mm, divide by 10 to get it in mm.
= pd.read_csv('Eilat_daily.csv', sep=",")
df # make 'DATE' the dataframe index
'DATE'] = pd.to_datetime(df['DATE'])
df[= df.set_index('DATE')
df # IMPORTANT!! daily precipitation data is in tenths of mm, divide by 10 to get it in mm.
'PRCP'] = df['PRCP'] / 10
df[ df
STATION | LATITUDE | LONGITUDE | ELEVATION | NAME | PRCP | PRCP_ATTRIBUTES | TMAX | TMAX_ATTRIBUTES | TMIN | TMIN_ATTRIBUTES | TAVG | TAVG_ATTRIBUTES | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATE | |||||||||||||
1949-11-30 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | NaN | NaN | NaN | NaN | NaN | NaN |
1949-12-01 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | NaN | NaN | NaN | NaN | NaN | NaN |
1949-12-02 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | NaN | NaN | NaN | NaN | NaN | NaN |
1949-12-03 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | NaN | NaN | NaN | NaN | NaN | NaN |
1949-12-04 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-03-24 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,S | 287.0 | ,,S | NaN | NaN | 227.0 | H,,S |
2021-03-25 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | NaN | NaN | 253.0 | ,,S | 154.0 | ,,S | 202.0 | H,,S |
2021-03-26 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | NaN | NaN | 251.0 | ,,S | 134.0 | ,,S | 186.0 | H,,S |
2021-03-27 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | NaN | NaN | 222.0 | ,,S | 119.0 | ,,S | 173.0 | H,,S |
2021-03-28 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | NaN | NaN | 238.0 | ,,S | 119.0 | ,,S | 188.0 | H,,S |
26045 rows × 13 columns
Plot precipitation data (‘PRCP’ column) and see if everything is all right.
= plt.subplots(figsize=(10,7))
fig, ax 'PRCP'])
ax.plot(df["date")
ax.set_xlabel("daily rainfall (mm)")
ax.set_ylabel("Eilat, 1949–2021") ax.set_title(
Text(0.5, 1.0, 'Eilat, 1949–2021')
Based on what you see, you might want to exclude certain periods, e.g.:
= '2018-08-01'
last_date = '1950-08-01'
first_date = df[((df.index < last_date) & (df.index > first_date))]
df df
STATION | LATITUDE | LONGITUDE | ELEVATION | NAME | PRCP | PRCP_ATTRIBUTES | TMAX | TMAX_ATTRIBUTES | TMIN | TMIN_ATTRIBUTES | TAVG | TAVG_ATTRIBUTES | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATE | |||||||||||||
1950-08-02 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | 400.0 | ,,G | 240.0 | ,,G | NaN | NaN |
1950-08-03 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | 410.0 | ,,G | 260.0 | ,,G | NaN | NaN |
1950-08-04 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | 400.0 | ,,G | 260.0 | ,,G | NaN | NaN |
1950-08-05 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | NaN | NaN | 240.0 | ,,G | NaN | NaN |
1950-08-06 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,E | 370.0 | ,,G | 240.0 | ,,G | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-07-27 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,S | 414.0 | ,,S | NaN | NaN | 359.0 | H,,S |
2018-07-28 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,S | 386.0 | ,,S | NaN | NaN | 329.0 | H,,S |
2018-07-29 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,S | NaN | NaN | 268.0 | ,,S | 334.0 | H,,S |
2018-07-30 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,S | 375.0 | ,,S | 277.0 | ,,S | 327.0 | H,,S |
2018-07-31 | IS000009972 | 29.55 | 34.95 | 12.0 | ELAT, IS | 0.0 | ,,S | 390.0 | ,,S | NaN | NaN | 336.0 | H,,S |
24836 rows × 13 columns
The rainfall data for Eilat is VERY seasonal, it’s easy to see that there is no rainfall at all during the summer. We can assume a hydrological year starting on 1 August. If you’re not sure, you can plot the monthly means (see last week’s lecture) and find what date makes sense best.
= df['PRCP'].resample('M').sum().to_frame()
df_month = np.arange(1,13)
month_numbers = np.array([]) # empty array
monthly_mean for m in month_numbers: # cycle over months (1, 2, 3, etc)
= df_month[df_month.index.month == m].mean() # this is the monthly mean
this_month_mean = np.append(monthly_mean, this_month_mean) # append
monthly_mean # make new df and return it
= pd.DataFrame({'monthly rainfall (mm)':monthly_mean,
df_month 'month number':month_numbers
})= plt.subplots(figsize=(10,7))
fig, ax 'month number'], df_month['monthly rainfall (mm)'])
ax.bar(df_month[set(xlabel="month",
ax.="monthly rainfall (mm)",
ylabel="Monthly average, Eilat, 1949--2018",
title=np.arange(1,13)); xticks
Let’s resample the data according to the hydrological year (1 August), and we’ll keep the maximum value:
= (df['PRCP'].resample('A-JUL')
max_annual max()
.
.to_frame()
) max_annual
PRCP | |
---|---|
DATE | |
1951-07-31 | 10.8 |
1952-07-31 | 15.0 |
1953-07-31 | 34.4 |
1954-07-31 | 24.3 |
1955-07-31 | 19.0 |
... | ... |
2014-07-31 | 11.5 |
2015-07-31 | 2.4 |
2016-07-31 | 8.5 |
2017-07-31 | 34.5 |
2018-07-31 | 11.7 |
68 rows × 1 columns
Make two graphs: a) the histogram for the annual maximum (pdf) b) the cumulative probability (cdf)
= plt.subplots(2, 1, figsize=(10,8))
fig, (ax1, ax2)
=max_annual['PRCP'].values
h=np.arange(0,100,10), density=True)
ax1.hist(h, bins=np.arange(0,100,10), cumulative=1, density=True)
ax2.hist(h, bins
set(ylabel="pdf")
ax1.set(xlabel="annual max (mm)",
ax2.="cdf",
ylabel; )
Compute the plotting position and return time. You’ll need to order the data in ascending order:
= max_annual.sort_values(by=['PRCP'], ascending=True) max_annual
\(P_m=\) plotting position, or probability of occurence for each event
\(n=\) total number of events
\(m=\) rank of each event, where \(m=1\) is the lowest value, and \(m=n\) is the highest
8.0.0.1 Weibull plotting position:
\[ P_m = \frac{m}{n+1} \]
8.0.0.2 Return period:
\[ \text{Return period} = T_r = \frac{1}{1-P_m} \]
Plot the annual maximum against \(P_m\) or against \(T_r\).
= plt.subplots(figsize=(10, 7))
fig, ax # resample daily data into yearly data (maximum yearly value)
= df['PRCP'].resample('A-JUL').max().to_frame()
max_annual # sort yearly max from lowest to highest
= max_annual.sort_values(by=['PRCP'], ascending=True)
max_annual 'rank'] = np.arange(1, len(max_annual) + 1)
max_annual[print(max_annual)
= len(max_annual['rank'])
n = max_annual['rank']
m = m / (n+1)
Pm = 1 / (1 - Pm)
Tr
# ax.plot(Tr, max_annual['PRCP'])
# ax.set(xlabel="return period (y)",
# ylabel="annual maximum (mm/24h)")
'PRCP'])
ax.plot(Pm, max_annual[set(xlabel="non-exeedance probability",
ax.="annual maximum (mm/24h)"); ylabel
PRCP rank
DATE
1996-07-31 0.5 1
2008-07-31 0.9 2
2000-07-31 1.2 3
2012-07-31 1.3 4
1959-07-31 1.5 5
... ... ...
1966-07-31 33.8 64
1953-07-31 34.4 65
2017-07-31 34.5 66
1981-07-31 40.6 67
1975-07-31 64.3 68
[68 rows x 2 columns]
Plot the annual maximum against the exceedance probability (\(1-P_m\)), in a log-log scale. Use
set(xscale="log",
ax."log")
yscale( )
See what data you’ll want to use for a linear fit.
= plt.subplots(figsize=(10, 6))
fig, ax
= max_annual['PRCP'].values
depth = (1-Pm).values
exc_prob
=3)
ax.plot(exc_prob, depth, lw
= 40
exclude = depth[exclude:]
depth_tofit = exc_prob[exclude:]
exc_prob_tofit 'o')
ax.plot(exc_prob_tofit, depth_tofit,
set(ylabel="annual maximum (mm/24h)",
ax.="exceedance probability",
xlabel="log",
xscale="log",
yscale; )
Let’s make a linear fit. Attention! Our data is not annual_max and exceedance_prob, but their log.
We make a linear fit using:
= np.polyfit(xdata, ydata, 1) # the number 1 in the order of the polynomial = linear slope, intercept
Write a function that receives an exceedance probability and returns the corresponding rainfall depth.
= plt.subplots(figsize=(10, 6))
fig, ax
= max_annual['PRCP'].values
depth = (1-Pm).values
exc_prob
=3, label="Weibull plotting position")
ax.plot(exc_prob, depth, lwset(ylabel="annual maximum (mm/24h)",
ax.="exceedance probability")
xlabel"log")
ax.set_xscale("log")
ax.set_yscale(
= 40
exclude = depth[exclude:]
depth_tofit = exc_prob[exclude:]
exc_prob_tofit
'o')
ax.plot(exc_prob_tofit, depth_tofit,
= np.log(exc_prob_tofit)
exc_prob_tofit_log = np.log(depth_tofit)
depth_tofit_log = np.polyfit(exc_prob_tofit_log, depth_tofit_log, 1)
slope, intercept
def equation(p):
return np.exp(slope*np.log(p) + intercept)
= [1e-3,1-1e-3]
prob =3, color="tab:red", alpha=0.4) ax.plot(prob, equation(prob), lw
8.0.1 Homework
Everything we did today was for 24h rainfall events. We might be interested in extreme events in longer or shorter time scales. Using the following code, calculate the return time for 3-day rainfall events:
= 3
number_of_days = (df['PRCP'].rolling(number_of_days)
df2 sum()
.
.dropna() )
All the rest after that is the same…