import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from functools import reduce
import re
import probscale
import seaborn as sns
sns.set(style="ticks", font_scale=1.5)
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import urllib.request
def download_data(station_name, station_code):
    url_daily = 'https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/access/'
    url_monthly = 'https://www.ncei.noaa.gov/data/gsom/access/'
#     download daily data - uncomment to make this work
    urllib.request.urlretrieve(url_daily + station_code + '.csv',
                               station_name + '_daily.csv')
    # download monthly data
    urllib.request.urlretrieve(url_monthly + station_code + '.csv',
                               station_name + '_monthly.csv')

Download daily rainfall data for Eilat, Israel:

download_data('Eilat', 'IS000009972')

Then load the data into a dataframe

df = pd.read_csv('Eilat_daily.csv', sep=",")
# make 'DATE' the dataframe index
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.set_index('DATE')
# IMPORTANT!! daily precipitation data is in tenths of mm, divide by 10 to get it in mm.
df['PRCP'] = df['PRCP'] / 10
df

Plot precipitation data ('PRCP' column) and see if everything is all right.

Based on what you see, you might want to exclude certain periods, e.g.:

last_date = '2018-08-01'
first_date = '1950-08-01'
df = df[((df.index < last_date) & (df.index > first_date))]

The rainfall data for Eilat is VERY seasonal, it's easy to see that there is no rainfall at all during the summer. We can assume a hydrological year starting on 1 August. If you're not sure, you can plot the monthly means (see last week's lecture) and find what date makes sense best.

Let's resample the data according to the hydrological year (1 August), and we'll keep the maximum value:

max_annual = (df['PRCP'].resample('A-JUL')
                        .max()
                        .to_frame()
             )

Make two graphs: a) the histogram for the annual maximum (pdf) b) the cumulative probability (cdf)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))

h=max_annual['PRCP'].values
ax1.hist(h, bins=np.arange(0,700,50), density=True)
ax2.hist(h, bins=np.arange(0,700,50), cumulative=1, density=True)

ax1.set(ylabel="pdf")
ax2.set(xlabel="annual max (mm)",
        ylabel="cdf",
        )

Compute the plotting position and return time. You'll need to order the data in ascending order:

max_annual = max_annual.sort_values(by=['PRCP'], ascending=True)

$P_m=$ plotting position, or probability of occurence for each event
$n=$ total number of events
$m=$ rank of each event, where $m=1$ is the lowest value, and $m=n$ is the highest

Weibull plotting position:

$$ P_m = \frac{m}{n+1} $$

Return period:

$$ \text{Return period} = T_r = \frac{1}{1-P_m} $$

Plot the annual maximum against $P_m$ or against $T_r$.

Plot the annual maximum against the exceedance probability ($1-P_m$), in a log-log scale. Use

ax.set_xscale("log")
ax.set_yscale("log")

See what data you'll want to use for a linear fit.

Let's make a linear fit. Attention! Our data is not annual_max and exceedance_prob, but their log.

We make a linear fit using:

slope, intercept = np.polyfit(xdata, ydata, 1) # the number 1 in the order of the polynomial = linear

Write a function that receives an exceedance probability and returns the corresponding rainfall depth.

Homework

Everything we did today was for 24h rainfall events. We might be interested in extreme events in longer or shorter time scales. Using the following code, calculate the return time for 3-day rainfall events:

number_of_days = 3
df2 = (df['PRCP'].rolling(number_of_days)
                 .sum()
                 .dropna()
      )

All the rest after that is the same...