import matplotlib.pyplot as plt import numpy as np import pandas as pd from functools import reduce import re import probscale import seaborn as sns sns.set(style="ticks", font_scale=1.5) from pandas.plotting import register_matplotlib_converters register_matplotlib_converters() import urllib.request
def download_data(station_name, station_code): url_daily = 'https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/access/' url_monthly = 'https://www.ncei.noaa.gov/data/gsom/access/' # download daily data - uncomment to make this work urllib.request.urlretrieve(url_daily + station_code + '.csv', station_name + '_daily.csv') # download monthly data urllib.request.urlretrieve(url_monthly + station_code + '.csv', station_name + '_monthly.csv')
Download daily rainfall data for Eilat, Israel:
Then load the data into a dataframe
df = pd.read_csv('Eilat_daily.csv', sep=",") # make 'DATE' the dataframe index df['DATE'] = pd.to_datetime(df['DATE']) df = df.set_index('DATE') # IMPORTANT!! daily precipitation data is in tenths of mm, divide by 10 to get it in mm. df['PRCP'] = df['PRCP'] / 10 df
Plot precipitation data ('PRCP' column) and see if everything is all right.
Based on what you see, you might want to exclude certain periods, e.g.:
last_date = '2018-08-01' first_date = '1950-08-01' df = df[((df.index < last_date) & (df.index > first_date))]
The rainfall data for Eilat is VERY seasonal, it's easy to see that there is no rainfall at all during the summer. We can assume a hydrological year starting on 1 August. If you're not sure, you can plot the monthly means (see last week's lecture) and find what date makes sense best.
Let's resample the data according to the hydrological year (1 August), and we'll keep the maximum value:
max_annual = (df['PRCP'].resample('A-JUL') .max() .to_frame() )
Make two graphs: a) the histogram for the annual maximum (pdf) b) the cumulative probability (cdf)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8)) h=max_annual['PRCP'].values ax1.hist(h, bins=np.arange(0,700,50), density=True) ax2.hist(h, bins=np.arange(0,700,50), cumulative=1, density=True) ax1.set(ylabel="pdf") ax2.set(xlabel="annual max (mm)", ylabel="cdf", )
Compute the plotting position and return time. You'll need to order the data in ascending order:
max_annual = max_annual.sort_values(by=['PRCP'], ascending=True)
$P_m=$ plotting position, or probability of occurence for each event
$n=$ total number of events
$m=$ rank of each event, where $m=1$ is the lowest value, and $m=n$ is the highest
Plot the annual maximum against $P_m$ or against $T_r$.
Plot the annual maximum against the exceedance probability ($1-P_m$), in a log-log scale. Use
See what data you'll want to use for a linear fit.
Let's make a linear fit. Attention! Our data is not annual_max and exceedance_prob, but their log.
We make a linear fit using:
slope, intercept = np.polyfit(xdata, ydata, 1) # the number 1 in the order of the polynomial = linear
Write a function that receives an exceedance probability and returns the corresponding rainfall depth.
Everything we did today was for 24h rainfall events. We might be interested in extreme events in longer or shorter time scales. Using the following code, calculate the return time for 3-day rainfall events:
number_of_days = 3 df2 = (df['PRCP'].rolling(number_of_days) .sum() .dropna() )
All the rest after that is the same...