Introduction - some first exercises in Python
Let's have fun plotting some data 😀
download the data
- Go to the Faculty of Agriculture's weather station.
- Click on משיכת נתונים (Hebrew for "data retrieval") and download daily data (24h interval) from 1 September 2020 to 28 February 2021. Save it as data-sep2020-feb2021.csv
- Open the .csv file with Excel and see what it looks like.
- If you can't download the data, just click here.
import packages
We need to import this data into Python. First, we import some useful packages. Type (don't copy and paste) the following lines into the code cell below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks", font_scale=1.5)
df = pd.read_csv("data-sep2020-feb2021.csv", header=4)  # row 4 (counting from 0) holds the column names; the rows above it are skipped
df
df.columns = ['date', 'tmax', 'tmin', 'wind', 'rain24h', 'rain_cumulative']  # short column names that are easy to type
df
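The two steps above, reading the file with header=[4] (for a single row this is the same as header=4) and then overwriting df.columns, can be tried on a small made-up file held in memory. The information lines, the original column names and the numbers below are invented for illustration; the real station file will look different, so this is only a sketch of the mechanism.

```python
import io

import pandas as pd

# A made-up file: 4 lines of station information, then the header row, then data.
# (Invented content, only to show how header=4 works.)
fake_csv = """station info line 1
station info line 2
station info line 3
station info line 4
Date,Tmax,Tmin,Wind,Rain,RainCum
01/09/2020,31.2,21.0,25,0.0,0.0
02/09/2020,30.8,20.5,30,0.0,0.0
"""

# header=4: use row 4 (counting from 0) as the column names; rows 0-3 are skipped
df_demo = pd.read_csv(io.StringIO(fake_csv), header=4)
print(df_demo.columns.tolist())  # the original names from the file

# Assigning a list of the same length replaces all column names at once
df_demo.columns = ['date', 'tmax', 'tmin', 'wind', 'rain24h', 'rain_cumulative']
print(df_demo)
```

If the real file has a different number of information lines above the column names, the number passed to header has to change accordingly.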
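plt.plot(df['tmin'])  # the x-axis is just the row number (0, 1, 2, ...)
# convert the date strings to datetime values; dayfirst=True reads them as day/month/year
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df
# use the dates as the row labels (index) of the table
df = df.set_index('date')
df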
plt.plot(df['tmin'])
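Why do the two plots of df['tmin'] look different? A minimal sketch on two made-up rows (assuming the dates are written day/month/year, which is what dayfirst=True expects) shows what to_datetime and set_index change.

```python
import pandas as pd

# Two made-up rows with dates written day/month/year (assumed format;
# check how the dates look in your own file)
demo = pd.DataFrame({'date': ['01/09/2020', '02/09/2020'],
                     'tmin': [21.0, 20.5]})

# Without dayfirst, '02/09/2020' would be read as 9 February;
# dayfirst=True reads it as 2 September
demo['date'] = pd.to_datetime(demo['date'], dayfirst=True)
print(demo['date'].iloc[1])  # 2020-09-02 00:00:00

# After set_index, the dates become the row labels (a DatetimeIndex),
# so plotting a column puts the dates on the x-axis
demo = demo.set_index('date')
print(type(demo.index).__name__)  # DatetimeIndex
```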
# creates figure (the canvas) and the axis (rectangle where the plot sits)
fig, ax = plt.subplots(1, figsize=(10,7))
# two line plots
ax.plot(df['tmin'], color="red", label="Temp (min)")
ax.plot(df['tmax'], color="blue", label="Temp (max)")
# axes labels and figure title
ax.set_xlabel('date')
ax.set_ylabel('temperature (°C)')
ax.set_title('maximum and minimum temperatures')
# some ticks adjustments
ax.set_yticks([10,15,20,25]) # we can choose where to put ticks
ax.grid(axis='y') # makes horizontal lines
fig.autofmt_xdate() # rotates the date labels (slanted dates) so they don't overlap
# legend
ax.legend(loc='upper right')
# save png figure
plt.savefig("temp_max_min.png")
# creates figure (the canvas) and the axis (rectangle where the plot sits)
fig, ax = plt.subplots(1, figsize=(10,7))
# line and bar plots
ax.bar(df.index, df['rain24h'], color="blue", label="daily rainfall")
# there are many ways of calculating the cumulative rain
# method 1, use a for loop:
# rain = df['rain24h'].to_numpy()
# cumulative = np.zeros(len(rain))
# for i in range(len(rain)):
#     cumulative[i] = np.sum(rain[:i+1])  # rain[:i+1] includes day i itself
# df['cumulative1'] = cumulative
# method 2, use list comprehension:
# rain = df['rain24h'].to_numpy()
# cumulative = [np.sum(rain[:i+1]) for i in range(len(rain))]
# df['cumulative2'] = cumulative
# method 3, use existing functions:
df['cumulative3'] = np.cumsum(df['rain24h'])
ax.plot(df['cumulative3'], color="red", label="cumulative rainfall")
# compare our cumulative rainfall with the downloaded data
# ax.plot(df['rain_cumulative'], 'x')
# axes labels and figure title
ax.set_xlabel('date')
ax.set_ylabel('rainfall (mm)')
ax.set_title('daily and cumulative rainfall')
ax.set_xlim(['2020-11-01','2021-02-28'])
# some ticks adjustments
fig.autofmt_xdate() # rotates the date labels (slanted dates) so they don't overlap
# legend
ax.legend(loc='upper left')
# save png figure
plt.savefig("cumulative_rainfall.png")
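The three ways of computing the cumulative rainfall should give the same numbers. Here they are side by side on a made-up five-day record (the rainfall values are invented). The loop and the list comprehension have to sum rain[:i+1], so that day i itself is included; summing rain[:i] gives a curve that lags one day behind np.cumsum.

```python
import numpy as np

# A short, made-up record of daily rainfall (mm)
rain = np.array([0.0, 5.0, 0.0, 12.5, 3.0])

# method 1: for loop; rain[:i + 1] includes day i itself
cumulative1 = np.zeros(len(rain))
for i in range(len(rain)):
    cumulative1[i] = np.sum(rain[:i + 1])

# method 2: list comprehension (same logic, one line)
cumulative2 = np.array([np.sum(rain[:i + 1]) for i in range(len(rain))])

# method 3: built-in cumulative sum
cumulative3 = np.cumsum(rain)
print(cumulative3.tolist())  # [0.0, 5.0, 5.0, 17.5, 20.5]

# The off-by-one pitfall: rain[:i] stops *before* day i,
# so every value lags one day behind np.cumsum
lagged = np.array([np.sum(rain[:i]) for i in range(len(rain))])
print(lagged.tolist())  # [0.0, 0.0, 5.0, 5.0, 17.5]
```

To compare with the station's own column numerically rather than by eye, something like np.allclose(df['cumulative3'], df['rain_cumulative']) could be used, assuming the station's cumulative column starts counting on the same day as our data.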
# creates figure (the canvas) and the axis (rectangle where the plot sits)
fig, ax = plt.subplots(1, figsize=(10,7))
# define date range
start_date = '2021-01-01'
end_date = '2021-01-31'
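january = df.loc[start_date:end_date, 'tmax']  # both the start and the end date are included
# plots
ax.plot(january, color="red", label="daily max")
# january*0 + january.mean() has the same dates as january, with every value equal to the mean
ax.plot(january*0 + january.mean(), color="purple", linestyle="--", label="average daily max")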
# axes labels and figure title
ax.set_xlabel('date')
ax.set_ylabel('temperature (°C)')
ax.set_title('average daily maximum temperature for January 2021')
# some ticks adjustments
fig.autofmt_xdate() # rotates the date labels (slanted dates) so they don't overlap
# legend
ax.legend(loc='lower left')
# save png figure
plt.savefig("average_max_temp.png")
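Selecting January works because the table is indexed by dates: a slice of date strings picks out a range of rows, and unlike slicing a Python list, both the start and the end date are included. A sketch on made-up temperatures from December 2020 to February 2021 (the values are just 0, 1, 2, ... so the mean is easy to check):

```python
import numpy as np
import pandas as pd

# Made-up daily maximum temperatures, one value per day
dates = pd.date_range('2020-12-01', '2021-02-28', freq='D')
tmax = pd.Series(np.arange(len(dates), dtype=float), index=dates)

# Date strings select a range of rows; BOTH ends are included
january = tmax.loc['2021-01-01':'2021-01-31']
print(len(january))  # 31

# A whole month can also be selected with a partial date string
print(len(tmax.loc['2021-01']))  # 31

# January holds the values 31, 32, ..., 61, so the mean is 46
print(january.mean())  # 46.0
```

As an alternative to january*0 + january.mean(), matplotlib's ax.axhline(january.mean(), ...) draws a horizontal line across the whole width of the axes.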
# creates figure (the canvas) and the axis (rectangle where the plot sits)
fig, ax = plt.subplots(1, figsize=(10,7))
# histogram
b = np.arange(0, 56, 5) # bins from 0 to 55, width = 5
ax.hist(df['wind'], bins=b, density=True)  # density=True normalizes the bars so their total area is 1
# axes labels and figure title
ax.set_xlabel('max wind speed (km/h)')
ax.set_ylabel('probability density (per km/h)')
ax.set_title('distribution of maximum wind speed')
# save png figure
plt.savefig("wind-histogram.png")
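With density=True, ax.hist does not plot the number of days in each bin: it rescales the bars so that their total area (height × bin width) is 1, which makes the y-axis a probability density in units of 1/(km/h). The sketch below checks this with np.histogram, using the same bins, on ten made-up wind speeds.

```python
import numpy as np

# Made-up daily maximum wind speeds (km/h), all between 0 and 55
wind = np.array([12, 18, 22, 25, 27, 31, 33, 35, 41, 48], dtype=float)
bins = np.arange(0, 56, 5)  # same bins as above: 0, 5, ..., 55

counts, edges = np.histogram(wind, bins=bins)
density, _ = np.histogram(wind, bins=bins, density=True)

# counts: number of days in each bin
print(counts.sum())  # 10

# density = counts / (total count * bin width), so the bar AREAS add up to 1
widths = np.diff(edges)
print(np.isclose(np.sum(density * widths), 1.0))  # True
print(np.allclose(density, counts / (counts.sum() * widths)))  # True
```

If you would rather show the fraction of days in each bin, one option is to replace density=True with weights=np.ones(len(df)) / len(df), assuming there are no missing wind values.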
homework
Go back to the weather station website and download one year of 24h data, from 01.01.2020 to 31.12.2020, and save it as 1year.csv. If you can't download the data, just click here. Make a graph that shows:
- daily tmax and tmin
- smoothed data for tmax and tmin
To smooth the data with a 30-day window, use the rolling mean:
df['tmin'].rolling(30, center=True).mean()
This takes the mean over each window of 30 consecutive days and assigns the result to the day at the center of that window.
Play with this function and see what you can do with it. What happens when you change the size of the window? Why does the smoothed curve start later and end earlier than the original data? See the documentation for rolling to find more options.
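fig, ax = plt.subplots(figsize=(10,7))
# load the one-year file; as before, row 4 (counting from 0) holds the column names
df2 = pd.read_csv("1year.csv", header=4)
# assumes the same column layout as the Sep-Feb file, so we rename the columns the same way
df2.columns = ['date', 'tmax', 'tmin', 'wind', 'rain24h', 'rain_cumulative']
df2['date'] = pd.to_datetime(df2['date'], dayfirst=True)
df2 = df2.set_index('date')
# daily data
ax.plot(df2['tmax'], label='tmax', color="tab:red")
ax.plot(df2['tmin'], label='tmin', color="tab:blue")
# 30-day rolling mean, centered on each day
tmin_smooth = df2['tmin'].rolling(30, center=True).mean()
tmax_smooth = df2['tmax'].rolling(30, center=True).mean()
ax.plot(tmax_smooth, label='tmax smoothed', color="tab:pink", linestyle="--", linewidth=3)
ax.plot(tmin_smooth, label='tmin smoothed', color="tab:cyan", linestyle="--", linewidth=3)
# axes labels, legend and slanted dates
ax.set_xlabel('date')
ax.set_ylabel('temperature (°C)')
ax.legend()
fig.autofmt_xdate()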
plt.savefig("t_smoothed.png")
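The 30-day smoothing in the homework can be explored on made-up data. This sketch applies rolling(...).mean() first to a five-number series, where the result is easy to check by hand, and then to one year of random numbers standing in for real temperatures (an assumption for illustration only). It also shows min_periods, one of the options listed in the rolling documentation.

```python
import numpy as np
import pandas as pd

# A tiny made-up series to see exactly what rolling(...).mean() does
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# window of 3, result placed at the center of each window
print(s.rolling(3, center=True).mean().tolist())
# [nan, 2.0, 3.0, 4.0, nan]: no full window exists at the two ends

# A 30-day window on one year (2020 has 366 days) of random stand-in data
days = pd.date_range('2020-01-01', '2020-12-31', freq='D')
t = pd.Series(np.random.default_rng(0).normal(20, 5, len(days)), index=days)
smooth = t.rolling(30, center=True).mean()
print(smooth.isna().sum())  # 29: days near the start and end have no full 30-day window

# min_periods lets the window be shorter near the edges (fewer days averaged there)
smooth_edges = t.rolling(30, center=True, min_periods=1).mean()
print(smooth_edges.isna().sum())  # 0
```

With min_periods=1 the smoothed curve reaches both ends of the year, but the values there are averages over fewer days, so they are less smooth.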