7.1 Bilbao, Spain

7.2 Today

7.3 August 1983

On Friday, August 26, 1983, Bilbao was celebrating its Aste Nagusia or Great Week, the main annual festivity in the city, when it and other municipalities of the Basque Country, Burgos, and Cantabria suffered devastating flooding due to heavy rains. In 24 hours, the volume of water registered 600 liters per square meter. Across all the affected areas, the weather service recorded 1.5 billion tons of water. In areas of Bilbao, the water reached a height of 5 meters (15 feet). Transportation, electricity and gas services, drinking water, food, telephone, and many other basic services were severely affected. 32 people died in Biscay, 4 people died in Cantabria, 2 people died in Alava, and 2 people died Burgos. 5 more people went missing.

7.4 How often will such rainfall happen?

How often does it rain 50 mm in 1 day? What about 100 mm in 1 day? How big is a “once-in-a-century event”?

Let’s examine Bilbao’s daily rainfall (mm), between 1947 to 2021

On the week of 22-28 August 1983, Bilbao’s weather station measured 45 cm of rainfall!

Let’s analyze this data and find out how rare such events are. First we need to find the annual maximum for each hydrological year.

Show/hide the code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv('BILBAO_daily.csv', sep=",")
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.set_index('DATE')
# IMPORTANT!! daily precipitation data is in tenths of mm, divide by 10 to get it in mm.
df['PRCP'] = df['PRCP'] / 10

import altair as alt
alt.data_transformers.disable_max_rows()

# Altair only recognizes column data; it ignores index values.
# You can plot the index data by first resetting the index
# I know that I've just made 'DATE' the index, but I want to have this here nonetheless so I can refer to this in the future
df_new = df.reset_index()#.replace({0.0:np.nan})
source = df_new[['DATE', 'PRCP']]

brush = alt.selection(type='interval', encodings=['x'])

base = alt.Chart(source).mark_line().encode(
    x = 'DATE:T',
    y = 'PRCP:Q'
).properties(
    width=600,
    height=200
)

upper = base.encode(
    alt.X('DATE:T', scale=alt.Scale(domain=brush)),
    alt.Y('PRCP:Q', scale=alt.Scale(domain=(0,100)))
)

lower = base.properties(
    height=60
).add_selection(brush)

alt.vconcat(upper, lower)
/home/yairmau/miniforge3/lib/python3.10/site-packages/altair/utils/deprecation.py:65: AltairDeprecationWarning:

'selection' is deprecated.
   Use 'selection_point()' or 'selection_interval()' instead; these functions also include more helpful docstrings.

/home/yairmau/miniforge3/lib/python3.10/site-packages/altair/utils/deprecation.py:65: AltairDeprecationWarning:

'add_selection' is deprecated. Use 'add_params' instead.

We will consider a hydrological year starting on 1 August.

  • Top: Histogram of annual daily precipitation maximum events.
  • Bottom: The cumulative answers the question: “How many events can be found below a given threshold?”

  • Top: We can normalize the histogram such that the total area is 1. Now the histogram is called probability density function (pdf). The probability is NOT the pdf, but the area between two thresholds.
  • Bottom: The cumulative now becomes a probability between 0 and 1. It is now called cumulative density function (cdf). The cdf answers the question: “What is the probability to choose an event below a given threshold?”

We are interested in extreme events, and we want to estimate how many years, on average, do we have to wait to get an annual maximum above a given threshold?

This question is very similar to what we asked regarding the cdf. 🤔
We switched the word “below” for “above”. The complementary of the cumulative is called the exceedance (or survival) probability:

\text{exceedance, survival} = 1 - \text{cdf}

7.5 Return Period

We will follow Brutsaert (2005), page 513. He defines quantities a little different from what we did above.

F(x) is the CDF of the PDF f(x). F(x) indicates the non-exceedance probability, i.e., the probability that a certain event above x has not occurred (or that an event below x has occurred, same thing). Modifying the graph shown above, we have

1-F(x) is the probability that a certain event above x has occurred. It’s reciprocal is the return period:

T_r(x) = \frac{1}{1-F(x)}

This return period is the expected number of observations required until x is exceeded once. In our case, we can ask the question: how many years will pass (on average) until we see a rainfall event greater that that of 26 August 1983?

Let’s call p=F(x) the probability that we measured once and that an event greater than x has not occurred. What is the probability that a rainfall above x will occur only on year number k?

  • it hasn’t occurred on year 1 (probability p)
  • it hasn’t occurred on year 2 (probability p)
  • it hasn’t occurred on year 3 (probability p)
  • it has occurred on year k (probability 1-p)

P\{k \text{ trials until }X>x\} = p^{k-1}(1-p)

Every time the number k will be different. What will be k on average?

\bar{k} = \displaystyle\sum_{k=1}^{\infty} k P(k) = \displaystyle\sum_{k=1}^{\infty} k p^{k-1}(1-p)

Let’s open that up:

\begin{split} \bar{k} &= 1-p + 2p(1-p) + 3p^2(1-p) + 4p^3(1-p)+ \cdots\\ \bar{k} &= 1-p + 2p - 2p^2 + 3p^2 - 3p^4 + 4p^3 - 4p^4+ \cdots \\ \bar{k} &= 1 + p + p^2 + p^3 + p^4 + \cdots \end{split}

For p<1, the series converges to 1 + p + p^2 + p^3 + p^4 + \cdots = \frac{1}{1-p}, therefore \bar{k} = \frac{1}{1-p}.

We conclude that if we know the exceedance probability, we immediately can say what the return times are. We now need a way of estimating this exceedance probability.

7.6 Generalized extreme value distribution

This part of the lecture was heavily inspired by Alexandre Martinez’s excellent blog post.

The Generalized Extreme Value (GEV) distribution is the limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables (from Wikipedia).

It has three parameters: shape, location and scale. We can fit Bilbao’s annual daily maxima with a GEV distribution, yielding:

(shape=-0.18, location=52, shape=16)

We will use the fitted parameters to plot the cdf, and then we will find the return periods. How can we calculate the cdf numerically form the data?

7.6.1 cdf from data

Here is a plot of annual daily precipitation maxima for Bilbao.

There are 74 points here. Let’s order them from small to large:

Now, instead of having the order of the event on the horizontal axis, let’s make it a fraction from 0 to 1.

We are getting there! We can interpret this fraction as the probability of randomly choosing a point in the graph below the threshold.

Now we just need to flip the vertical and horizontal axes, and we’re done! We have our cdf!

Now that we have the cdf from data, we can plot on top of it the GEV cdf with the parameters we found before.

The highest data point in this graph goes only to 252 mm, corresponding to the highest event recorded in 74 years. We can use the GEV cdf to calculate return times for any desired levels, simply by converting the vertical axis (cdf) to return period, using the equation we found earlier. T_r(x) = \frac{1}{1-F(x)},

where T_r is the return period (in years), and F is the cdf.

The information contained in the last two graphs is exactly the same, but somehow this last graph looks much worse! Why is this so? Acording to the GEV distribution we got, the return time for a 252 mm event is about 700 years!

7.7 Extra: K-S test

to be completed…

7.8 Extra: confidence interval

to be completed…