7 Upsampling
In the previous chapter, we resampled from a fine temporal resolution to a coarser one, which is also called downsampling. Now we will learn upsampling: how to go from coarse data to a finer scale.
Sadly, there is no free lunch: we can't get data that was never measured. What can we do, then?
It’s best to consider a practical example.
7.1 Potential Evapotranspiration using Penman’s equation
We want to calculate the daily potential evapotranspiration using Penman's equation. Part of the calculation involves characterizing the energy budget at the soil surface. When direct solar radiation measurements are not available, we can estimate the energy balance from the "cloudless skies mean solar radiation", R_{so}: the amount of energy (MJ/m^2/d) that reaches the surface, assuming no clouds. This radiation depends on the season and on latitude. For Israel, located at latitude 32° N, we can use the tabulated monthly values for 30° N:
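import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
# R_so (MJ/m^2/d) at 30° N, one value on the first day of each month
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66, 30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
df = pd.DataFrame({'date': dates, 'radiation': values})
df = df.set_index('date')
df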
| date | radiation |
|---|---|
| 2021-01-01 | 17.46 |
| 2021-02-01 | 21.65 |
| 2021-03-01 | 25.96 |
| 2021-04-01 | 29.85 |
| 2021-05-01 | 32.11 |
| 2021-06-01 | 33.20 |
| 2021-07-01 | 32.66 |
| 2021-08-01 | 30.44 |
| 2021-09-01 | 26.67 |
| 2021-10-01 | 22.48 |
| 2021-11-01 | 18.30 |
| 2021-12-01 | 16.04 |
| 2022-01-01 | 17.46 |
fig, ax = plt.subplots()
ax.plot(df['radiation'], color='black', marker='d', linestyle='None')
ax.set(ylabel=r'radiation (MJ/m$^2$/d)',
title="cloudless skies mean solar radiation for latitude 30° N")
ax.xaxis.set_major_locator(mdates.MonthLocator())
date_form = DateFormatter("%b")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate() # makes slanted dates
We only have one value per month (the 13th row, 2022-01-01, repeats January to close the annual cycle), so we can't use this dataframe directly to compute daily ET. We need to upsample!
In the example below, we resample the monthly data into daily data and do nothing else. Pandas doesn't know what values to give the new dates, so it fills them with NaN.
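A minimal sketch of this step, assuming the upsampling is done with `resample('D').asfreq()`, which places each monthly value on its own date and leaves every new day empty. It rebuilds `df` so it runs on its own; `df_daily` is a name chosen for this sketch:

```python
import pandas as pd

# Monthly cloudless-skies radiation for 30° N (MJ/m^2/d), first day of each month
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66,
          30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
df = pd.DataFrame({'date': dates, 'radiation': values}).set_index('date')

# Upsample to a daily grid without filling anything: every new day becomes NaN
df_daily = df.resample('D').asfreq()

print(df_daily.head(3))
print(len(df_daily), "rows,", df_daily['radiation'].notna().sum(), "with a value")
```

The daily grid runs from 2021-01-01 to 2022-01-01 inclusive, and only the 13 original dates carry a value.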
| date | radiation |
|---|---|
| 2021-01-01 | 17.46 |
| 2021-01-02 | NaN |
| 2021-01-03 | NaN |
| … | … |
| 2021-01-30 | NaN |
| 2021-01-31 | NaN |
| 2021-02-01 | 21.65 |
| 2021-02-02 | NaN |
| … | … |
7.2 Forward/Backward fill
We can fill these NaNs by carrying the known values forward in time (forward fill) or backward in time (backward fill):
# forward fill: each new day repeats the most recent monthly value
df_forw = df.resample('D').ffill()
# backward fill: each new day takes the next monthly value
df_back = df.resample('D').bfill()
fig, ax = plt.subplots()
ax.plot(df['radiation'], color='black', marker='d', linestyle='None', label="original")
ax.plot(df_forw['radiation'], color='tab:blue', label="forward fill")
ax.plot(df_back['radiation'], color='tab:orange', label="backward fill")
ax.set(ylabel=r'radiation (MJ/m$^2$/d)',
       title="cloudless skies mean solar radiation for latitude 30° N")
ax.legend(frameon=False, fontsize=12)
ax.xaxis.set_major_locator(mdates.MonthLocator())
date_form = DateFormatter("%b")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate() # makes slanted dates
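To see the difference on a small scale, here is a minimal sketch using only the January and February values. The two-row `monthly` frame and the names `forw_demo`/`back_demo` are illustrative, chosen so they don't overwrite the full-year results above:

```python
import pandas as pd

# Only January and February, to keep the output short
monthly = pd.DataFrame(
    {'radiation': [17.46, 21.65]},
    index=pd.to_datetime(['2021-01-01', '2021-02-01']),
)

# Forward fill: every new day repeats the most recent earlier value
forw_demo = monthly.resample('D').ffill()
# Backward fill: every new day takes the next later value
back_demo = monthly.resample('D').bfill()

mid_january = pd.Timestamp('2021-01-15')
print(forw_demo.loc[mid_january, 'radiation'])  # January value carried forward
print(back_demo.loc[mid_january, 'radiation'])  # February value carried backward
```

Both results are step functions that jump once per month; they differ only in which neighbouring value fills each gap.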
This does the job, but the result is a step function, and I want something better: radiation should vary smoothly from day to day. Let's use interpolation.
7.3 Interpolation
# linear interpolation: straight lines between consecutive monthly values
df_linear = df.resample('D').asfreq().interpolate(method='linear')
# cubic interpolation: a smooth curve through the monthly values (requires SciPy)
df_cubic = df.resample('D').asfreq().interpolate(method='cubic')
fig, ax = plt.subplots()
ax.plot(df['radiation'], color='black', marker='d', linestyle='None', label="original")
ax.plot(df_linear['radiation'], color='tab:blue', label="linear interpolation")
ax.plot(df_cubic['radiation'], color='tab:orange', label="cubic interpolation")
ax.set(ylabel=r'radiation (MJ/m$^2$/d)',
       title="cloudless skies mean solar radiation for latitude 30° N")
ax.legend(frameon=False, fontsize=12)
ax.xaxis.set_major_locator(mdates.MonthLocator())
date_form = DateFormatter("%b")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate() # makes slanted dates
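Linear interpolation draws a straight line between consecutive known points, r(t) = r_0 + (r_1 - r_0)(t - t_0)/(t_1 - t_0). A minimal check on a two-month cut-down example (`monthly` and `linear_demo` are illustrative names, not from the chapter) compares pandas' value for 16 January with that formula computed by hand:

```python
import pandas as pd

# Only January and February, to keep the check small
monthly = pd.DataFrame(
    {'radiation': [17.46, 21.65]},
    index=pd.to_datetime(['2021-01-01', '2021-02-01']),
)

# Upsample to daily, then fill the NaNs by linear interpolation
linear_demo = monthly.resample('D').asfreq().interpolate(method='linear')

# Same value by hand: 16 January is 15 days after 1 January,
# and the two known points are 31 days apart
t = pd.Timestamp('2021-01-16')
t0, t1 = pd.Timestamp('2021-01-01'), pd.Timestamp('2021-02-01')
r0, r1 = 17.46, 21.65
by_hand = r0 + (r1 - r0) * (t - t0).days / (t1 - t0).days

print(round(linear_demo.loc[t, 'radiation'], 4), round(by_hand, 4))
```

`method='linear'` ignores the index and treats rows as equally spaced; on an evenly spaced daily grid that is the same as weighting by elapsed time, so the two numbers agree.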
There are many ways to fill NaNs and to interpolate. A nice detailed guide can be found here.