import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from calendar import month_abbr
import seaborn as sns
sns.set(style="ticks", font_scale=1.5)
import urllib.request


## intra-annual variability

Go to NOAA's National Centers for Environmental Information (NCEI)
Climate Data Online: Dataset Discovery

Find station codes in this map. On the left, click on the little wrench next to "Global Summary of the Month", then click on "identify" on the panel that just opened, and click on a station (purple circle). You will see the station's name, it's ID, and the period of record. For example, for Ben-Gurion's Airport in Israel:
BEN GURION, IS
STATION ID: ISM00040180
Period of Record: 1951-01-01 to 2020-03-01

def download_data(station_name, station_code):
url_daily = 'https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/access/'
url_monthly = 'https://www.ncei.noaa.gov/data/gsom/access/'
# urllib.request.urlretrieve(url_daily + station_code + '.csv',
#                            station_name + '_daily.csv')
urllib.request.urlretrieve(url_monthly + station_code + '.csv',
station_name + '_monthly.csv')


Now, choose any station with a period of record longer than 30 years, and download its data:

download_data('BEN GURION', 'ISM00040180')


Load the data into a datafram, and before you continue with the analysis, plot the rainfall data, to see how it looks like.

df = pd.read_csv('BEN GURION_monthly.csv', sep=",")
# make 'DATE' the dataframe index
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.set_index('DATE')
plt.plot(df['PRCP'])


It doesn't look great for Ben-Gurion airport. You might need to choose another station...

download_data('BEER SHEVA', 'IS000051690')


We need to aggregate all data from each month, so we can calculate monthly averages. How to do that?

# choose only the precipitation column
df_month = df['PRCP']
# calculate monthly mean
monthly_mean = np.array([])  # empty array
month_numbers = np.arange(1,13)
month_names = [month_abbr[i] for i in month_numbers]
# THIS IS THE IMPORTANT PART !!!!
# write a loop that takes the same month from all years (e.g. all January), and stores the mean on monthly_mean


Double click this cell to see a possible realization of this loop. Don't do this right away! Try to write your own code first!

Now it is time to create a new dataframe with the monthly means:

df_beersheva = pd.DataFrame({'monthly rainfall (mm)':monthly_mean,
'month names':month_names,
'month number':month_numbers
})


Plot the data and see if it makes sense. Try to get a figure like this one:

• double click this markdown cell to reveal the code I used to produce the figure. Don't do this right away, try to go as far as you can!

Let's calculate now the Walsh and Lawler Seasonality Index.
Write a function that receives a dataframe like the one we have just created, and returns the seasonality index.
http://leddris.aegean.gr/ses-parameters/293-rainfall-seasonality.html#:~:text=Rainfall%20seasonality%20index%20is%20a,in%20relation%20to%20water%20availability

$R=$ mean annual precipitation
$m_i$ precipitation mean for month $i$

$$SI = \displaystyle \frac{1}{R} \sum_{n=1}^{n=12} \left| m_i - \frac{R}{12} \right|$$

SI Precipitation Regime
<0.19 Precipitation spread throughout the year
0.20-0.39 Precipitation spread throughout the year, but with a definite wetter season
0.40-0.59 Rather seasonal with a short dry season
0.60-0.79 Seasonal
0.80-0.99 Marked seasonal with a long dry season
1.00-1.19 Most precipitation in < 3 months
• double click this markdown cell to reveal the code I wrote for this function. Don't do this right away, try to go as far as you can!

# interannual variability

Plot monthly rainfall for your station.

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,7))
ax1.plot(df['PRCP'])
ax2.plot(df['PRCP']['2010-07-01':'2015-07-01'])


How to aggregate rainfall accoding to the hydrological year? We use the function resample.

https://pandas.pydata.org/pandas-docs/version/0.12.0/timeseries.html#offset-aliases

also, annual resampling can be anchored to the end of specific months: https://pandas.pydata.org/pandas-docs/version/0.12.0/timeseries.html#anchored-offsets

# annual frequency, anchored 31 December
df_year_all = df['PRCP'].resample('A').sum().to_frame()
# annual frequency, anchored 01 January
df_year_all = df['PRCP'].resample('AS').sum().to_frame()
# annual frequency, anchored end of September
df_year_all = df['PRCP'].resample('A-SEP').sum().to_frame()
# rename 'PRCP' column to 'rain (mm)'
df_year_all.columns = ['rain (mm)']
df_year_all


you might need to exclude the first or the last line, since their data might have less that 12 months. For example

# exclude 1st row
df_year = df_year_all.iloc[1:]
# exclude last row
df_year = df_year_all.iloc[:-1]
# exclude both 1st and last rows
df_year = df_year_all.iloc[1:-1]


Calculate the average annual rainfall. Plot annual rainfall for the whole range, together with the average. You should get something like this:

• double click this markdown cell to reveal the code I used to produce the figure. Don't do this right away, try to go as far as you can!

Plot a histogram of annual rainfall, with the mean and standard deviation. Calculate the coefficient of variation. Try to plot something like this:

• double click this markdown cell to reveal the code I used to produce the figure. Don't do this right away, try to go as far as you can!

Calculate the mean annual rainfall for various 30-year intervals

• double click this markdown cell to reveal the code I used to produce the figure. Don't do this right away, try to go as far as you can!

## homework

1. Download both daily and monthly data for London (LONDON HEATHROW, ID: UKM00003772). You should be aware that 'PRCP' for monthly data is in millimeters, while 'PRCP' for daily data is in tens of millimiters.
2. Aggregate daily data into monthly intervals using resample('MS').sum(). 'MS' means that the sum of all days in the month will be stored in the first day of the month. Supposedly both datasets are equal now.
3. Calculate the average annual rainfall, using each of these datasets.
4. Why is there such a big difference?