7 Upsampling

In the previous chapter, we resampled data from a fine temporal resolution to a coarser one, which is also called downsampling. Now we will learn about upsampling: how to go from coarse data to a finer scale.

Sadly, there is no free lunch: we simply can’t get data that was never measured. So what can we do?

It’s best to consider a practical example.

7.1 Potential Evapotranspiration using Penman’s equation

We want to calculate the daily potential evapotranspiration using Penman’s equation. Part of the calculation involves characterizing the energy budget at the soil surface. When direct solar radiation measurements are not available, we can estimate the energy balance from the “cloudless skies mean solar radiation”, R_{so}. This is the amount of energy (MJ/m^2/d) that reaches the surface assuming no clouds, and it depends on the season and on the latitude. Israel lies at about 32° N, so we can use the following monthly values for 30° N:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
import seaborn as sns
sns.set(style="ticks", font_scale=1.5)  # white graphs, with large and legible letters
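# 13 month starts, January 2021 through January 2022 ('MS' = month start)
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
# monthly R_so values for 30° N; the January value is repeated at the end
values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66, 30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
df = pd.DataFrame({'date': dates, 'radiation': values})
df = df.set_index('date')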
df
            radiation
date
2021-01-01      17.46
2021-02-01      21.65
2021-03-01      25.96
2021-04-01      29.85
2021-05-01      32.11
2021-06-01      33.20
2021-07-01      32.66
2021-08-01      30.44
2021-09-01      26.67
2021-10-01      22.48
2021-11-01      18.30
2021-12-01      16.04
2022-01-01      17.46
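The table repeats the January value on 2022-01-01. The chapter does not say why, but one plausible reason is that `resample` only creates timestamps up to the last one present in the data, so without this extra row a daily series would stop on 1 December. A minimal check, reusing the monthly values above (the names `s12`, `s13`, `daily12`, `daily13` are just for illustration):

```python
import pandas as pd

values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66,
          30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
s13 = pd.Series(values, index=dates, name='radiation')

# drop the repeated January 2022 row to see what happens without it
s12 = s13.iloc[:12]

# plain daily reindexing, no filling
daily12 = s12.resample('D').asfreq()
daily13 = s13.resample('D').asfreq()

print(daily12.index[-1].date(), len(daily12))  # 2021-12-01 335
print(daily13.index[-1].date(), len(daily13))  # 2022-01-01 366
```

With only 12 rows, the days from 2 to 31 December would be missing entirely, and no fill or interpolation method could recover them.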
fig, ax = plt.subplots()
ax.plot(df['radiation'], color='black', marker='d', linestyle='None')
ax.set(ylabel=r'radiation (MJ/m$^2$/d)',
       title="cloudless skies mean solar radiation for latitude 30° N")
ax.xaxis.set_major_locator(mdates.MonthLocator())
date_form = DateFormatter("%b")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate()  # makes slanted dates

We only have 12 values for the whole year, so we can’t use this dataframe directly to compute daily ET. We need to upsample!

In the example below, we resample the monthly data into daily data and do nothing else: `.asfreq()` simply reindexes the series to the new daily timestamps. Pandas doesn’t know what values to give the new points, so it fills them with NaN.

df_nan = df['radiation'].resample('D').asfreq().to_frame()
df_nan.head(33)
            radiation
date
2021-01-01      17.46
2021-01-02        NaN
2021-01-03        NaN
2021-01-04        NaN
2021-01-05        NaN
2021-01-06        NaN
2021-01-07        NaN
2021-01-08        NaN
2021-01-09        NaN
2021-01-10        NaN
2021-01-11        NaN
2021-01-12        NaN
2021-01-13        NaN
2021-01-14        NaN
2021-01-15        NaN
2021-01-16        NaN
2021-01-17        NaN
2021-01-18        NaN
2021-01-19        NaN
2021-01-20        NaN
2021-01-21        NaN
2021-01-22        NaN
2021-01-23        NaN
2021-01-24        NaN
2021-01-25        NaN
2021-01-26        NaN
2021-01-27        NaN
2021-01-28        NaN
2021-01-29        NaN
2021-01-30        NaN
2021-01-31        NaN
2021-02-01      21.65
2021-02-02        NaN
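Counting the rows makes the “no free lunch” point concrete: after upsampling, only the original monthly rows hold real values, and every other day is a NaN that some rule will have to fill. A short check, rebuilding `df` from the values above so the snippet runs on its own:

```python
import pandas as pd

# same monthly values as in the chapter (January repeated on 2022-01-01)
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66,
          30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
df = pd.DataFrame({'date': dates, 'radiation': values}).set_index('date')

df_nan = df['radiation'].resample('D').asfreq().to_frame()

n_total = len(df_nan)
n_known = int(df_nan['radiation'].notna().sum())    # original monthly values
n_missing = int(df_nan['radiation'].isna().sum())   # days we must invent
print(n_total, n_known, n_missing)  # 366 13 353
```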

7.2 Forward/Backward fill

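We can fill these NaNs by carrying the last known value forward (forward fill, `ffill`) or the next known value backward (backward fill, `bfill`):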

df_forw = df['radiation'].resample('D').ffill().to_frame()
df_back = df['radiation'].resample('D').bfill().to_frame()
fig, ax = plt.subplots()
ax.plot(df['radiation'], color='black', marker='d', linestyle='None', label="original")
ax.plot(df_forw['radiation'], color='tab:blue', label="forward fill")
ax.plot(df_back['radiation'], color='tab:orange', label="backward fill")
ax.set(ylabel=r'radiation (MJ/m$^2$/d)',
       title="cloudless skies mean solar radiation for latitude 30° N")
ax.legend(frameon=False, fontsize=12)
ax.xaxis.set_major_locator(mdates.MonthLocator())
date_form = DateFormatter("%b")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate()  # makes slanted dates
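To see exactly what the two fills do, here is a toy series (hypothetical values, not from the chapter) with two known points three days apart:

```python
import pandas as pd

# toy data: two known values, three days apart
s = pd.Series([10.0, 40.0],
              index=pd.to_datetime(['2021-01-01', '2021-01-04']))

forward = s.resample('D').ffill()   # carry the last known value forward
backward = s.resample('D').bfill()  # pull the next known value backward

print(forward.tolist())   # [10.0, 10.0, 10.0, 40.0]
print(backward.tolist())  # [10.0, 40.0, 40.0, 40.0]
```

With the monthly radiation data, forward fill gives every day of a month the value dated on that month’s first day, while backward fill gives it the next month’s value, which is why the two step curves in the plot are offset by one month.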

This does the job, but I want something better than step functions: the radiation should vary smoothly from day to day. Let’s use interpolation.

7.3 Interpolation

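# 'time': straight lines between known points, weighted by elapsed time
df_linear = df['radiation'].resample('D').interpolate(method='time').to_frame()
# 'cubic': a cubic spline through the monthly points (requires SciPy)
df_cubic = df['radiation'].resample('D').interpolate(method='cubic').to_frame()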
fig, ax = plt.subplots()
ax.plot(df['radiation'], color='black', marker='d', linestyle='None', label="original")
ax.plot(df_linear['radiation'], color='tab:blue', label="linear interpolation")
ax.plot(df_cubic['radiation'], color='tab:orange', label="cubic interpolation")
ax.set(ylabel=r'radiation (MJ/m$^2$/d)',
       title="cloudless skies mean solar radiation for latitude 30° N")
ax.legend(frameon=False, fontsize=12)
ax.xaxis.set_major_locator(mdates.MonthLocator())
date_form = DateFormatter("%b")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate()  # makes slanted dates
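On a daily grid, `method='time'` draws a straight line between neighbouring monthly values, weighted by the number of days elapsed. A sketch that reproduces it by hand with `np.interp`, reusing the chapter’s monthly values (rebuilt here so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66,
          30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
s = pd.Series(values, index=dates)

# pandas: time-weighted linear interpolation on a daily grid
daily = s.resample('D').interpolate(method='time')

# by hand: express every date as "days since 2021-01-01", then interpolate
t0 = dates[0]
x_known = (dates - t0).days.to_numpy()
x_daily = (daily.index - t0).days.to_numpy()
manual = np.interp(x_daily, x_known, s.to_numpy())

print(np.allclose(daily.to_numpy(), manual))  # True

# 16 January lies 15 of the 31 days between 1 January and 1 February
print(round(float(daily.loc[pd.Timestamp('2021-01-16')]), 3))  # 19.487
```

Because all the upsampled rows are evenly spaced days, `method='linear'` (which only looks at row positions) should give the same numbers here; `'time'` matters when the timestamps themselves are unevenly spaced.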

There are many ways to fill NaNs and to interpolate. A nice detailed guide can be found here.
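One alternative worth knowing for this dataset is a shape-preserving interpolator. A cubic spline can overshoot near the June peak, producing radiation values larger than any in the table. `method='pchip'` (piecewise cubic Hermite, also backed by SciPy) stays monotone between each pair of known points, so it cannot exceed the largest tabulated value. A sketch comparing the maxima, assuming SciPy is installed:

```python
import pandas as pd

values = [17.46, 21.65, 25.96, 29.85, 32.11, 33.20, 32.66,
          30.44, 26.67, 22.48, 18.30, 16.04, 17.46]
dates = pd.date_range(start='2021-01-01', periods=13, freq='MS')
s = pd.Series(values, index=dates)

# both methods call SciPy under the hood
cubic = s.resample('D').interpolate(method='cubic')
pchip = s.resample('D').interpolate(method='pchip')

print('largest tabulated value:', s.max())
print('largest cubic value:    ', round(cubic.max(), 2))
print('largest pchip value:    ', round(pchip.max(), 2))
```

The trade-off: PCHIP avoids spurious peaks but flattens the curve at every local extremum, while the cubic spline is smoother but may invent a slightly higher maximum.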