12.1 Tasks

  1. Download the paper
    Nassif, S. H., and E. M. Wilson, 1975, “The influence of slope and rain intensity on runoff and infiltration”, Hydrological Sciences Journal.

  2. Google the following: plot digitizer

  3. Load image “nassif-16percent-slope.png” (see below)

  4. Digitize data points from left to right. Create 4 csv files, one for each data set. Call them whatever you want.
    Legend:

     * white circle = 312 mm/h,
     * triangle = 234 mm/h,
     * x = 156 mm/h,
     * black circle = 78 mm/h.

The image is the second panel of Fig. 8, from the paper you downloaded.

If for any reason you’re having trouble digitizing the data from the graph, download here the csv files I have already prepared.

Import relevant packages

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="ticks", font_scale=1.5)
from scipy.optimize import curve_fit
import matplotlib.patches as patches

Load all four files you created, using numpy’s loadtxt function. Make sure that the first point in each table corresponds to the appropriate rainfall rate; if it does not, you can rescale the data.

d1 = np.loadtxt("input_rate_078mm_per_h_16percent_slope.csv", delimiter=',')
d2 = np.loadtxt("input_rate_156mm_per_h_16percent_slope.csv", delimiter=',')
d3 = np.loadtxt("input_rate_234mm_per_h_16percent_slope.csv", delimiter=',')
d4 = np.loadtxt("input_rate_312mm_per_h_16percent_slope.csv", delimiter=',')

"""
In the digitization process, not all points are necessarily
located in the "correct" location. Because we know the initial
infiltration rate for each data series, we can correct that and
rescale the data so that the first point is exactly where we
expect it to be
"""
d1[:,1] = d1[:,1] * 78  / d1[:,1].max()
d2[:,1] = d2[:,1] * 156 / d2[:,1].max()
d3[:,1] = d3[:,1] * 234 / d3[:,1].max()
d4[:,1] = d4[:,1] * 312 / d4[:,1].max()

Reproduce the original figure and make it look good, something like this:

fig, ax = plt.subplots(figsize=(10,7))
ax.plot(d4[:,0], d4[:,1], 'o', markerfacecolor="None", label=r"water input = 312 mm h$^{-1}$")
ax.plot(d3[:,0], d3[:,1], '^', label=r"water input = 234 mm h$^{-1}$")
ax.plot(d2[:,0], d2[:,1], 'x', label=r"water input = 156 mm h$^{-1}$")
ax.plot(d1[:,0], d1[:,1], 'o', label=r"water input = 78 mm h$^{-1}$")
ax.set(xlabel="time (min)",
       ylabel=r"infiltration rate (mm h$^{-1}$)")
ax.legend(loc="upper right");

12.2 Horton’s equation

f = f_c+(f_0-f_c)e^{-\beta t}

  • f: infiltration rate
  • f_c: infiltration capacity at large t
  • f_0: initial infiltration capacity
  • \beta: best fit empirical parameter

Write a function called horton, that receives time t and the three parameters, and returns the right-hand side of the equation above. Plot one of the data sets, together with a guess of the parameters that should roughly fit the data.

def horton(t, fc, f0, beta):
    return fc + (f0 - fc) * np.exp(-beta*t)

fig, ax = plt.subplots(figsize=(10,7))
t = d1[:,0]
t = t - t[0]
f = d1[:,1]
ax.plot(t, f, 'o', label="data")
param_guess = [35, 80, 0.5]
ax.plot(t, horton(t, *param_guess), '-', label="horton")
ax.set(xlabel="time (min)",
       ylabel="infiltration rate (mm/h)")
ax.legend(loc="upper right");

Find the best fit for the parameters f_c, f_0, \beta. Calculate the R^2 for each data set.

For the best fit, use scipy’s curve_fit. Write a function to compute the R-squared of your fit.

def best_fit(data):
    t = data[:,0]
    t = t - t[0]
    f = data[:,1]
    # best fit
    popt, pcov = curve_fit(f=horton,             # model function
                           xdata=t,              # x data
                           ydata=f,              # y data
                           p0=(130, 800, 0.5),   # initial guess of the parameters
                          )
    return [popt, pcov]

def calculate_r_squared(data, popt):
    t = data[:,0]
    t = t - t[0]
    f = data[:,1]
    # residuals between the data and the fitted model
    residuals = f - horton(t, *popt)
    # residual sum of squares
    ss_res = np.sum(residuals**2)
    # total sum of squares
    ss_tot = np.sum((f - np.mean(f))**2)
    # coefficient of determination
    r_squared = 1 - (ss_res / ss_tot)
    return r_squared

def plot_best_fit(data, axis, marker, markercolor):
    # calculate best fit parameters
    popt, pcov = best_fit(data)
    t = data[:,0]
    f = data[:,1]
    # plot data points
    axis.plot(t, f, marker, markerfacecolor=markercolor, markeredgecolor="black")
    # plot best fit line
    r_squared = calculate_r_squared(data, popt)
    labeltext = r"$f_c=$ {:.2f}, $f_0=$ {:.2f}, $\beta=$ {:.2f}, $R^2=$ {:.2f}".format(popt[0], popt[1], popt[2], r_squared)
    axis.plot(t, horton(t - t[0], *popt), color=markercolor, label=labeltext)

fig, ax = plt.subplots(figsize=(10,7))
plot_best_fit(d1, ax, 'o', "tab:red")
plot_best_fit(d2, ax, 'x', "tab:blue")
plot_best_fit(d3, ax, '^', "tab:orange")
plot_best_fit(d4, ax, 'd', "tab:green")
ax.set(xlabel="time (min)",
       ylabel="infiltration rate (mm/h)")
ax.legend();

Make a graph of the infiltration rate and of the runoff, as a function of time. Use any of the four data sets you have.

fig, ax = plt.subplots(figsize=(10,7))
data = d4
t = data[:, 0]
f = data[:, 1]
t = np.concatenate([ [0], t])
f = np.concatenate([ [f[0]], f])
runoff = f[0] - f
ax.plot(t, f*0 + f[0], ls="--", color="black", label="rainfall")
ax.plot(t, f, color="tab:blue", lw=3, label=r"infiltration")
ax.plot(t, runoff, color="tab:orange", lw=3, label=r"runoff")
ax.set(xlabel="Time (min)",
       ylabel=r"Rate (mm h$^{-1}$)")
ax.legend(loc="lower right");

12.3 What does fit really mean?

Let’s take as an example data (x, y) that appear to follow a linear trend.

plot synthetic data
N = 30
x_data = np.arange(N)
a = 2.0
b = 30.0
noise = 6 * (np.random.random(N)-0.5)
y_data = a * x_data + b + noise

fig, ax = plt.subplots(figsize=(10,7))
ax.plot(x_data, y_data, 'o', label="data")
ax.plot(x_data, -0.5*x_data + 40, label="first guess of a linear model")

ax.annotate("",
            xy=(x_data[20], y_data[20]), xycoords='data',
            xytext=(x_data[20], 29), textcoords='data',
            size=20,
            arrowprops=dict(arrowstyle="<->",
                            color="black",
                            connectionstyle="arc3"),
            )
ax.text(20.5, 50, "error")
ax.legend(frameon=False)
ax.set(xlabel="x",
       ylabel="y",
       ylim=[0,120]);

Our guess doesn’t look good. We used the model

y = ax + b

where a=-0.5 and b=40.

We will adjust these two parameters by minimizing the mean squared error between the data points and the model (a straight line). We need to minimize the following expression:

\begin{split} MSE &= \text{mean}(\text{error}^2) \\ MSE &= \frac{1}{N}\sum_{i=1}^N\text{error}_i^2 \\ MSE &= \frac{1}{N}\sum_{i=1}^N\big( \text{measured} - \text{modelled} \big)^2 \\ MSE &= \frac{1}{N}\sum_{i=1}^N \big[ y_i - (a\cdot x_i +b) \big]^2 \end{split}

The MSE is a function of a and b. For every set of values (a, b), MSE has a value, so you can imagine a surface over the (a, b) plane.

define useful functions
def compute_mse(y_measured, y_predicted):
    MSE = np.mean(
        (y_measured - y_predicted)**2
    )
    return MSE

def linear_model(x, a, b):
    return a*x + b

def gradient_descent(x, y, a, b, learning_rate):
    # linear model
    y_predicted = a * x + b
    error = y - y_predicted
    # partial derivatives
    da = np.mean(-2 * x * error)
    db = np.mean(-2 * error)
    # update a and b. minus sign because gradient points to direction
    # of maximal growth, we want to minimize MSE, not maximize
    a = a - da * learning_rate
    b = b - db * learning_rate
    mse = compute_mse(y, y_predicted)
    return a, b, mse
compute MSE landscape
A_vec = np.linspace(-1, 5, 100)
B_vec = np.linspace(0, 60, 100)
A_mesh, B_mesh = np.meshgrid(A_vec, B_vec)
MSE_mesh = np.zeros_like(A_mesh)
for i in range(100):
    for j in range(100):
        Y_predicted_mesh = linear_model(x_data,
                                        A_mesh[i, j],
                                        B_mesh[i, j])
        MSE_mesh[i, j] = compute_mse(y_data, Y_predicted_mesh)

MSE_mesh_log = np.log(MSE_mesh)
plot landscape
fig, ax = plt.subplots(figsize=(10,7))
A_min = A_vec[0]; A_max = A_vec[-1];
B_min = B_vec[0]; B_max = B_vec[-1];
asp = (A_max - A_min) / (B_max - B_min)
cax = ax.imshow(MSE_mesh_log, extent=[A_min, A_max, B_min, B_max],
                origin='lower', cmap='viridis', aspect = asp)
fig.colorbar(cax)
ax.set(xlabel=r"parameter $a$",
       ylabel=r"parameter $b$",
       title="log(MSE)");

The gradient of MSE is:

\nabla MSE = \frac{\partial}{\partial a}MSE\; \widehat{a} + \frac{\partial}{\partial b}MSE\; \widehat{b}

Taking the partial derivatives:

\begin{split} \frac{\partial}{\partial a}MSE &= \frac{1}{N}\sum_{i=1}^N (-2x_i)\big[ y_i - (a\cdot x_i +b)\big] \\ &= \text{mean}\big[ (-2x_i) \cdot \big( \text{measured} - \text{modelled} \big) \big] \\ \frac{\partial}{\partial b}MSE &= \frac{1}{N}\sum_{i=1}^N (-2) \big[y_i - (a\cdot x_i +b)\big] \\ &= \text{mean}\big[ (-2) \cdot \big( \text{measured} - \text{modelled} \big) \big] \end{split}
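As a quick sanity check (not part of the original assignment), we can compare these analytic derivatives with centered finite differences at the initial guess; the two should agree to several decimal places.

def mse_of(a, b, x, y):
    # MSE of the linear model for a given (a, b)
    return np.mean((y - (a * x + b))**2)

a0, b0, eps = -0.5, 40.0, 1e-6
# analytic partial derivatives, same expressions used inside gradient_descent
error = y_data - (a0 * x_data + b0)
da_analytic = np.mean(-2 * x_data * error)
db_analytic = np.mean(-2 * error)
# centered finite-difference approximations
da_numeric = (mse_of(a0 + eps, b0, x_data, y_data) - mse_of(a0 - eps, b0, x_data, y_data)) / (2 * eps)
db_numeric = (mse_of(a0, b0 + eps, x_data, y_data) - mse_of(a0, b0 - eps, x_data, y_data)) / (2 * eps)
print(f"d/da: analytic {da_analytic:.4f}, numeric {da_numeric:.4f}")
print(f"d/db: analytic {db_analytic:.4f}, numeric {db_numeric:.4f}")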

optimize parameters using gradient descent
lr = 1.0e-3
n_iterations = 30000
a_fit = np.zeros(n_iterations)
b_fit = np.zeros(n_iterations)
mse_fit = np.zeros(n_iterations)
a_fit[0] = -0.5
b_fit[0] = 40.0
for i in np.arange(1, n_iterations):
    a_fit[i], b_fit[i], mse_fit[i] = gradient_descent(x=x_data, y=y_data, a=a_fit[i-1], b=b_fit[i-1], learning_rate=lr)
print(f"best fit: a={a_fit[-1]:.2f}, b={b_fit[-1]:.2f}")
best fit: a=2.03, b=29.60
plot landscape and optimization path
fig, ax = plt.subplots(figsize=(10,7))
A_min = A_vec[0]; A_max = A_vec[-1];
B_min = B_vec[0]; B_max = B_vec[-1];
asp = (A_max - A_min) / (B_max - B_min)
cax = ax.imshow(MSE_mesh_log, extent=[A_min, A_max, B_min, B_max],
                origin='lower', cmap='viridis', aspect = asp)
fig.colorbar(cax)

ax.plot(a_fit, b_fit, color="xkcd:hot pink")
ax.plot(a_fit[0], b_fit[0], linestyle='None', marker='>', markerfacecolor='none', markeredgecolor='xkcd:hot pink', markersize=12)
ax.plot(a_fit[-1], b_fit[-1], linestyle='None', marker='s', markerfacecolor='none', markeredgecolor='xkcd:hot pink', markersize=12)

ax.set(xlabel=r"parameter $a$",
       ylabel=r"parameter $b$",
       title="log(MSE)");

plot incremental improvement of the model
fig, ax = plt.subplots(figsize=(10,7))
ax.plot(x_data, y_data, 'o', label="data")

iter_list = [1, 2, 10, 1000, 30000]
color_list = np.linspace(0.8, 0.1, len(iter_list))

for i in range(len(iter_list)):    
    ax.plot(x_data, a_fit[iter_list[i]-1] * x_data + b_fit[iter_list[i]-1],
            label=f"iteration={iter_list[i]:d}",
            color=[color_list[i]]*3
            )
ax.set(xlabel="x",
       ylabel="y")
ax.legend(frameon=False, fontsize=12);

12.3.1 learning rate

The learning rate multiplies the gradient (the partial derivatives in each parameter direction), so by playing with its value we can converge more slowly or more quickly to the optimal solution we are seeking. There is a tradeoff at play here. If the learning rate is too small, we will need to run many iterations to converge to the desired solution. On the other hand, if the learning rate is too large, the algorithm might not converge at all to the region of the parameter space with the lowest MSE.

optimize parameters using gradient descent
lr = 1.4e-3
n_iterations = 5000
a_fit_2 = np.zeros(n_iterations)
b_fit_2 = np.zeros(n_iterations)
mse_fit_2 = np.zeros(n_iterations)
a_fit_2[0] = -0.5
b_fit_2[0] = 40.0
for i in np.arange(1, n_iterations):
    a_fit_2[i], b_fit_2[i], mse_fit_2[i] = gradient_descent(x=x_data, y=y_data, a=a_fit_2[i-1], b=b_fit_2[i-1], learning_rate=lr)
# print(f"best fit: a={a_fit_2[-1]:.2f}, b={b_fit_2[-1]:.2f}")
best fit: a=2.01, b=29.87
optimize parameters using gradient descent
lr = 5e-4
n_iterations = 5000
a_fit_3 = np.zeros(n_iterations)
b_fit_3 = np.zeros(n_iterations)
mse_fit_3 = np.zeros(n_iterations)
a_fit_3[0] = -0.5
b_fit_3[0] = 40.0
for i in np.arange(1, n_iterations):
    a_fit_3[i], b_fit_3[i], mse_fit_3[i] = gradient_descent(x=x_data, y=y_data, a=a_fit_3[i-1], b=b_fit_3[i-1], learning_rate=lr)
# print(f"best fit: a={a_fit_3[-1]:.2f}, b={b_fit_3[-1]:.2f}")
best fit: a=1.88, b=32.44
optimize parameters using gradient descent
lr = 3.5e-3
n_iterations = 5000
a_fit_4 = np.zeros(n_iterations)
b_fit_4 = np.zeros(n_iterations)
mse_fit_4 = np.zeros(n_iterations)
a_fit_4[0] = -0.5
b_fit_4[0] = 40.0
for i in np.arange(1, n_iterations):
    a_fit_4[i], b_fit_4[i], mse_fit_4[i] = gradient_descent(x=x_data, y=y_data, a=a_fit_4[i-1], b=b_fit_4[i-1], learning_rate=lr)
# print(f"best fit: a={a_fit_4[-1]:.2f}, b={b_fit_4[-1]:.2f}")
best fit: a=1551.68, b=108.47
plot MSE convergence for the different learning rates
fig, ax = plt.subplots(figsize=(10,7))

cmap = plt.cm.Blues
colors = [cmap(0.9), cmap(0.6), cmap(0.3)]

ax.plot((mse_fit_3), label=f"learning rate = {5e-4:.2e}", color=colors[0])
ax.plot((mse_fit), label=f"learning rate = {1.0e-3:.2e}", color=colors[1])
ax.plot((mse_fit_2), label=f"learning rate = {1.4e-3:.2e}", color=colors[2])
ax.plot([], [], label=f"learning rate = {3.5e-3:.2e}", color="tab:red")

ax.set_yscale('log')
ax.set(ylim=[1, 50],   # lower limit must be positive on a log-scaled axis
       xlim=[0, 5000],
       xlabel="iteration",
       ylabel="MSE",
       title="larger learning rates converge faster!...\nuntil they explode")

ax2 = ax.twinx()
ax2.plot((mse_fit_4), color="red")
ax2.set_yscale('log')
ax2.tick_params(axis='y', labelcolor='tab:red')
ax.legend(frameon=False);

This way of finding the optimal set of parameters is the simplest possible method; much more sophisticated methods have been developed over the years. Still, this search for the optimal values of a list of parameters is the basic idea behind machine learning methods.
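As a taste of what such methods look like in practice, here is a minimal sketch (not part of the original assignment) that minimizes the same MSE with scipy.optimize.minimize, whose default quasi-Newton (BFGS) solver chooses its own step sizes instead of using a fixed learning rate:

from scipy.optimize import minimize

def mse_for_params(params, x, y):
    # unpack the parameters and return the MSE of the linear model
    a, b = params
    return np.mean((y - (a * x + b))**2)

result = minimize(mse_for_params, x0=[-0.5, 40.0], args=(x_data, y_data))
print(result.x)  # should land close to the gradient-descent result (a ~ 2, b ~ 30)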

12.3.2 local and global minima

The landscape we saw in the example above was well behaved, and we easily converged to a reasonable solution. Sometimes the landscape is more convoluted, and we might get stuck in a local minimum, depending on the initial values of the parameters. In that case, we need to help the algorithm by providing it with other initial guesses for the parameters, as in the sketch below.
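A minimal sketch of that idea (not part of the original assignment): run the same gradient descent from several starting points and keep the result with the lowest MSE. For our convex MSE landscape all starts end up at the same minimum, but the pattern helps on rougher landscapes. The initial guesses below are arbitrary.

# multi-start gradient descent from several (arbitrary) initial guesses
initial_guesses = [(-0.5, 40.0), (5.0, 0.0), (0.0, 60.0)]
best = None
for a, b in initial_guesses:
    for _ in range(30000):
        a, b, mse = gradient_descent(x=x_data, y=y_data, a=a, b=b, learning_rate=1e-3)
    # keep the run that ends with the lowest MSE
    if (best is None) or (mse < best[2]):
        best = (a, b, mse)
print(f"best of all starts: a={best[0]:.2f}, b={best[1]:.2f}")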

12.4 Green & Ampt

f = \frac{A}{F} + B

where

  • A = K_\text{sat}\cdot|\psi_f|\cdot \left( \phi - \theta_0 \right)
  • B= K_\text{sat}
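These expressions are just the Green & Ampt infiltration equation rearranged (a reminder, not spelled out above): K_\text{sat} is the saturated hydraulic conductivity, \psi_f the suction head at the wetting front, \phi the porosity, and \theta_0 the initial water content.

f = K_\text{sat}\left(1 + \frac{|\psi_f|\,(\phi - \theta_0)}{F}\right) = \frac{K_\text{sat}\,|\psi_f|\,(\phi - \theta_0)}{F} + K_\text{sat} = \frac{A}{F} + B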

Write a function that calculates the cumulative infiltration, i.e. the time integral of the infiltration rate.

F(t) = \int_0^t f(t) \text{ d}t

Use numpy’s trapz function, which implements the “trapezoidal rule”.

def cumulative_F(t, f):
    F = np.array([0])
    t = t/60 # convert minute to hour
    for i in np.arange(2,len(t)+1):
        area = np.trapz(f[:i], t[:i])
        F = np.concatenate([F, [area]])
    return F

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,7))
t, f = d1[:,0], d1[:,1]
F = cumulative_F(t, f)
ax1.plot(t, f, label="f, rate")
ax2.plot(t, F, label="F, cumulative")
ax1.set(xlabel="t (min)",
        ylabel="f (mm/h)")
ax2.set(xlabel="t (min)",
        ylabel="F (mm)")
ax2.yaxis.set_label_position("right")

Plot f as a function of F. Try to guess A and B that give reasonable results.

fig, ax = plt.subplots(figsize=(10,7))
t, f = d1[:,0], d1[:,1]
F = cumulative_F(t, f)
ax.plot(F, f)
A = 50; B = 30
ax.plot(F[1:], A/F[1:] + B, 'o')  # skip F[0] = 0 to avoid division by zero
ax.set(xlabel="F",
       ylabel="f");

Use curve_fit to find the optimal values of A and B.

def G_and_A(F, A, B):
    return A/F + B

popt, pcov = curve_fit(f=G_and_A,     # model function
                       xdata=F[1:],       # x data
                       ydata=f[1:],       # y data
                       p0=(50, 30),   # initial guess of the parameters
                      )

print(popt)

fig, ax = plt.subplots(figsize=(10,7))
ax.plot(F, f)
ax.plot(F[1:], popt[0]/F[1:] + popt[1], 'o')
ax.set(xlabel="F",
       ylabel="f");
[24.12368526 36.34242813]

12.5 Homework

Go to the Soil Texture Calculator and estimate the texture of the “standard soil” in Nassif & Wilson (1975).