5 Motivation

5.1 Jerusalem, 2019

Data from the Israel Meteorological Service (IMS).

The graph shows the temperature at a weather station in Jerusalem for the whole of 2019. It is interactive: to zoom in, play with the bottom panel.

discussion
The temperature fluctuates on various time scales, from daily to yearly. Let’s think together about a few questions we’d like to ask about the data above.

Now let’s see precipitation data:

discussion
What would be interesting to know about precipitation?

We have not yet talked about what kind of data we have on our hands here. The CSV file provided by the IMS looks like this:

| | Station | Date & Time (Winter) | Diffused radiation (W/m^2) | Global radiation (W/m^2) | Direct radiation (W/m^2) | Relative humidity (%) | Temperature (°C) | Maximum temperature (°C) | Minimum temperature (°C) | Wind direction (°) | Gust wind direction (°) | Wind speed (m/s) | Maximum 1 minute wind speed (m/s) | Maximum 10 minutes wind speed (m/s) | Time ending maximum 10 minutes wind speed (hhmm) | Gust wind speed (m/s) | Standard deviation wind direction (°) | Rainfall (mm) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jerusalem Givat Ram | 01/01/2019 00:00 | 0.0 | 0.0 | 0.0 | 80.0 | 8.7 | 8.8 | 8.6 | 75.0 | 84.0 | 3.3 | 4.3 | 3.5 | 23:58 | 6.0 | 15.6 | 0.0 |
| 1 | Jerusalem Givat Ram | 01/01/2019 00:10 | 0.0 | 0.0 | 0.0 | 79.0 | 8.7 | 8.8 | 8.7 | 74.0 | 82.0 | 3.3 | 4.1 | 3.3 | 00:01 | 4.9 | 14.3 | 0.0 |
| 2 | Jerusalem Givat Ram | 01/01/2019 00:20 | 0.0 | 0.0 | 0.0 | 79.0 | 8.7 | 8.8 | 8.7 | 76.0 | 82.0 | 3.2 | 4.1 | 3.3 | 00:19 | 4.9 | 9.9 | 0.0 |
| 3 | Jerusalem Givat Ram | 01/01/2019 00:30 | 0.0 | 0.0 | 0.0 | 79.0 | 8.7 | 8.7 | 8.6 | 78.0 | 73.0 | 3.6 | 4.2 | 3.6 | 00:30 | 5.2 | 11.7 | 0.0 |
| 4 | Jerusalem Givat Ram | 01/01/2019 00:40 | 0.0 | 0.0 | 0.0 | 79.0 | 8.6 | 8.7 | 8.5 | 80.0 | 74.0 | 3.6 | 4.4 | 3.8 | 00:35 | 5.4 | 10.5 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 52549 | Jerusalem Givat Ram | 31/12/2019 22:20 | 0.0 | 0.0 | 1.0 | 81.0 | 7.4 | 7.6 | 7.3 | 222.0 | 255.0 | 0.5 | 0.9 | 1.0 | 22:11 | 1.0 | 47.9 | 0.0 |
| 52550 | Jerusalem Givat Ram | 31/12/2019 22:30 | 0.0 | 0.0 | 1.0 | 83.0 | 7.3 | 7.4 | 7.3 | 266.0 | 259.0 | 0.6 | 0.8 | 0.6 | 22:28 | 1.1 | 22.8 | 0.0 |
| 52551 | Jerusalem Givat Ram | 31/12/2019 22:40 | 0.0 | 0.0 | 1.0 | 83.0 | 7.5 | 7.6 | 7.3 | 331.0 | 317.0 | 0.5 | 0.8 | 0.6 | 22:35 | 1.0 | 31.6 | 0.0 |
| 52552 | Jerusalem Givat Ram | 31/12/2019 22:50 | 0.0 | 0.0 | 1.0 | 83.0 | 7.5 | 7.6 | 7.4 | 312.0 | 285.0 | 0.6 | 1.0 | 0.6 | 22:50 | 1.4 | 31.3 | 0.0 |
| 52553 | Jerusalem Givat Ram | 31/12/2019 23:00 | 0.0 | 0.0 | 1.0 | 83.0 | 7.6 | 7.7 | 7.4 | 315.0 | 321.0 | 0.7 | 1.0 | 0.8 | 22:54 | 1.3 | 23.5 | 0.0 |

52554 rows × 18 columns

We see that we have data points spaced out evenly every 10 minutes.
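The solutions in section 5.2 work on a DataFrame `df` that has a time index and the short column names `temperature` and `rain`. The page does not show how `df` is built, so the following is only a sketch under stated assumptions: a comma-separated file with the headers shown above and day-first timestamps. A three-row inline sample stands in for the real file, and the renaming to short names is an assumption made to match the solutions.

```python
import io
import pandas as pd

# Inline stand-in for the IMS file (only four of the 18 columns, values
# copied from the rows shown above). The comma-separated layout is an assumption.
sample_csv = io.StringIO(
    "Station,Date & Time (Winter),Temperature (°C),Rainfall (mm)\n"
    "Jerusalem Givat Ram,01/01/2019 00:00,8.7,0.0\n"
    "Jerusalem Givat Ram,01/01/2019 00:10,8.7,0.0\n"
    "Jerusalem Givat Ram,01/01/2019 00:20,8.7,0.0\n"
)

df = pd.read_csv(sample_csv)

# Parse the day-first timestamps and use them as the index.
df["date"] = pd.to_datetime(df["Date & Time (Winter)"], format="%d/%m/%Y %H:%M")
df = df.set_index("date").drop(columns="Date & Time (Winter)")

# Short names used by the solutions (an assumed renaming).
df = df.rename(columns={"Temperature (°C)": "temperature", "Rainfall (mm)": "rain"})

# Spacing between consecutive rows; a regular series yields a single value.
steps = df.index.to_series().diff().dropna().unique()
print(steps)
```

On the full file, more than one value in `steps` would point to gaps in the record.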

5.2 Challenges

Let’s try to answer the following questions:

First we have to divide temperature data by month, and then take the average for each month.

a possible solution
# 'MS' labels each monthly mean by the first day of its month
df_month = df['temperature'].resample('MS').mean()

This is a bit trickier.

  1. We need to find the maximum/minimum temperature for each day.
  2. Only then do we split the daily data by month and take the average.

a possible solution
# daily maximum first, then the monthly mean of those daily maxima
daily_max_temp = df['temperature'].resample('D').max()
monthly_mean_max_temp = daily_max_temp.resample('MS').mean()
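The two steps above apply equally to the daily minimum. As a sketch, both extremes can be computed in one pass with `agg`. The sine-wave temperatures below are made up for illustration and stand in for `df['temperature']`.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute temperatures for Jan-Feb 2019 (made-up data):
# a daily cycle of amplitude 5 around 10 degrees in January, 12 in February.
index = pd.date_range("2019-01-01", "2019-02-28 23:50", freq="10min")
hours = (index.hour + index.minute / 60).to_numpy()
base = np.where(index.month == 1, 10.0, 12.0)
temperature = pd.Series(base + 5 * np.sin(2 * np.pi * (hours - 9) / 24), index=index)

# Step 1: daily extremes. Step 2: monthly mean of each daily extreme.
daily = temperature.resample("D").agg(["min", "max"])
monthly = daily.resample("MS").mean()
print(monthly.round(1))
```

The result has one row per month and one column per statistic, so the mean daily minimum and maximum can be compared side by side.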

  1. We need to filter our data to contain only night times.
  2. We need to divide temperature data by seasons (3 months), and then take the mean for each season.

a possible solution
# filter only night data
df_night = df.loc[((df.index.hour < 6) | (df.index.hour >= 18))]
# 'QS' bins by calendar quarter (Jan-Mar, Apr-Jun, ...), labelled by the quarter's first day
season_average_night_temp = df_night['temperature'].resample('QS').mean()

another possible solution
# filter using between_time
df_night = df.between_time('18:00', '06:00', inclusive='left')
season_average_night_temp = df_night['temperature'].resample('QS').mean()
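The two night filters above should select exactly the same rows. The following sketch checks this on a made-up three-day series. It assumes a pandas version in which `between_time` accepts the `inclusive` argument (1.4 or later).

```python
import numpy as np
import pandas as pd

# Made-up 10-minute series covering three days.
index = pd.date_range("2019-01-01", "2019-01-03 23:50", freq="10min")
df = pd.DataFrame({"temperature": np.arange(len(index), dtype=float)}, index=index)

# Filter 1: boolean mask on the hour of day (18:00 to 05:59).
night_mask = df.loc[(df.index.hour < 6) | (df.index.hour >= 18)]

# Filter 2: between_time, keeping 18:00 and dropping 06:00.
night_between = df.between_time("18:00", "06:00", inclusive="left")

print(night_mask.equals(night_between))
```

With `inclusive='left'` the 18:00 row is kept and the 06:00 row is dropped, which matches the `>= 18` and `< 6` conditions of the mask.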

First we have to divide rain data by day, and then take the sum for each day.

a possible solution
daily_precipitation = df['rain'].resample('D').sum()

We have to divide rain data by month, and then sum the totals of each month.

a possible solution
monthly_precipitation = df['rain'].resample('MS').sum()

  1. We need to sum rain by day.
  2. We need to count how many days there are in each month where rain > 0.

a possible solution
daily_precipitation = df['rain'].resample('D').sum()
only_rainy_days = daily_precipitation.loc[daily_precipitation > 0]
rain_days_per_month = only_rainy_days.resample('MS').count()
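One detail of the solution above: because the dry days are filtered out before resampling, months before the first or after the last rainy day do not appear in the result at all. A possible variant, sketched here on made-up daily totals, counts the `True` values of the condition instead, so that every month is reported and dry months show 0.

```python
import pandas as pd

# Made-up daily rain totals (mm) for Jan-Apr 2019: rain on three days only.
days = pd.date_range("2019-01-01", "2019-04-30", freq="D")
daily_precipitation = pd.Series(0.0, index=days)
rainy = pd.to_datetime(["2019-01-05", "2019-01-20", "2019-03-02"])
daily_precipitation.loc[rainy] = [4.2, 0.6, 11.0]

# Filter first, then count: months after the last rainy day are absent.
only_rainy_days = daily_precipitation.loc[daily_precipitation > 0]
filtered_count = only_rainy_days.resample("MS").count()

# Count True values per month instead: every month is kept, dry ones as 0.
rain_days_per_month = (daily_precipitation > 0).resample("MS").sum()

print(filtered_count)
print(rain_days_per_month)
```

In this toy series April has no rain, so it is missing from `filtered_count` but present with a count of 0 in `rain_days_per_month`.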

  1. We need to divide our data into two: rainy_season_1 and rainy_season_2.
  2. We need to find the time of the last rain in rainy_season_1.
  3. We need to find the time of the first rain in rainy_season_2.
  4. We need to compute the time difference between the two dates.

a possible solution
split_date = '2019-08-01'
# label slicing is inclusive, so the split date itself lands in both halves
rainy_season_1 = df.loc[:split_date]  # everything up to and including the split date
rainy_season_2 = df.loc[split_date:]  # everything from the split date onwards
# last rain of the first season (malkosh) and first rain of the second (yoreh)
malkosh = rainy_season_1['rain'].loc[rainy_season_1['rain'] > 0].last_valid_index()
yoreh = rainy_season_2['rain'].loc[rainy_season_2['rain'] > 0].first_valid_index()
dry_period = yoreh - malkosh
# extracting days, hours, and minutes
days = dry_period.days
hours = dry_period.components.hours
minutes = dry_period.components.minutes
print(f'The dry period of 2019 was {days} days, {hours} hours and {minutes} minutes.')
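A note on the split used above: slicing a time index with a date string is inclusive on both sides, so the rows of the split date end up in both halves. The sketch below shows this on a made-up hourly series, using `.loc`, which slices a time index the same way. The overlap is harmless as long as no rain falls on the split date.

```python
import pandas as pd

# Made-up hourly rain series around the split date (all zeros).
index = pd.date_range("2019-07-31", "2019-08-02 23:00", freq="60min")
rain = pd.Series(0.0, index=index)
split_date = "2019-08-01"

before = rain.loc[:split_date]   # up to and including the end of 2019-08-01
after = rain.loc[split_date:]    # from the start of 2019-08-01 onwards

# Rows of the split date itself appear in both halves.
overlap = before.index.intersection(after.index)
print(len(overlap))
```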

  1. We need to filter our data to contain only morning times, taken here as 06:00 to 18:00.
  2. We need to sum rain by day.
  3. We need to find the day with the maximum value.

a possible solution
import matplotlib.pyplot as plt

# filter to only daytime data (06:00 to 18:00)
morning_df = df.loc[((df.index.hour >= 6) & (df.index.hour < 18))]
morning_rain = morning_df['rain'].resample('D').sum()
rainiest_morning = morning_rain.idxmax()
# plot
morning_rain.plot()
plt.axvline(rainiest_morning, c='r', alpha=0.5, linestyle='--')

bonus solution
import pandas as pd
import matplotlib.pyplot as plt

# filter to only night data
df_night = df.loc[((df.index.hour < 6) | (df.index.hour >= 18))]
# resampling night for each day is tricky because the date changes at midnight. We can do this trick:
# we shift the time back by 6 hours so all the data for the same night will have the same date.
# (tshift was removed in pandas 2.0; shift with a freq argument moves the index in the same way)
df_shifted = df_night.shift(periods=-6, freq=pd.Timedelta(hours=1))
night_rain = df_shifted['rain'].resample('D').sum()
rainiest_night = night_rain.idxmax()
# plot
night_rain.plot()
plt.axvline(rainiest_night, c='r', alpha=0.5, linestyle='--')
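To see what the six-hour shift does, here is a small sketch on made-up data: a single shower that spans midnight. It uses `shift` with a `freq` argument, which moves the timestamps rather than the values. Without the shift the shower is split across two calendar days. With it, the whole night is counted under the date of the evening on which it began.

```python
import pandas as pd

# Made-up 10-minute rain series: one shower spanning midnight
# between 1 and 2 January (23:00 to 00:50), 1 mm per step.
index = pd.date_range("2019-01-01", "2019-01-03 23:50", freq="10min")
rain = pd.Series(0.0, index=index)
rain.loc[pd.Timestamp("2019-01-01 23:00"):pd.Timestamp("2019-01-02 00:50")] = 1.0

night = rain.loc[(rain.index.hour < 6) | (rain.index.hour >= 18)]

# Without the shift, the shower is split between two calendar days.
split_totals = night.resample("D").sum()

# Moving timestamps 6 hours back maps 18:00-05:59 onto 12:00-23:59,
# so the whole night carries the date of the evening it started.
shifted = night.shift(periods=-6, freq=pd.Timedelta(hours=1))
night_totals = shifted.resample("D").sum()

print(split_totals)
print(night_totals)
```

Note that the shifted series begins one day earlier than the original, because the early-morning hours of the first day belong to the night that started on the previous evening.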

Note: this whole webpage is actually a Jupyter Notebook rendered as HTML. If you want to know how to make interactive graphs, go to the top of the page and click on “Code”.

Useful functions compatible with resample(), as well as the full list of resampling frequencies, can be found in the pandas documentation.