Extreme cold weather events, such as the winter of 1962/63, the third coldest winter ever recorded in the Central England Temperature record, or, more recently, the winter of 2010/11, have significant consequences for society and the economy. This
paper assesses the probability of such extreme cold weather across the United
Kingdom (UK), as part of a probabilistic catastrophe model for insured losses
caused by the bursting of pipes. A statistical model is developed for the extremes of the Air Freezing Index (AFI), a common measure of the magnitude and duration of freezing temperatures. A novel approach to modelling the spatial dependence of the hazard is followed, taking advantage of the vine copula methodology. This method allows complex dependencies to be modelled, especially between the tails of the AFI distributions, which is important for assessing the extreme behaviour of such events. The influence of the North Atlantic Oscillation and of anthropogenic
climate change on the frequency of UK cold winters has also been taken into
account. According to the model, extreme cold events, such as the 1962/63 winter, have become approximately 2 times less frequent over the course of the 20th century as a result of anthropogenic climate change. Furthermore, the model predicts that such an event will become even less common, about 2 times less frequent, by the year 2030. Extreme cold spells in the UK
have been found to be heavily modulated by the North Atlantic Oscillation
(NAO) as well. A cold event is estimated to be

Extended periods of extreme cold weather can cause severe disruptions to human societies: in terms of human health, by exacerbating pre-existing medical conditions or by reducing the food supply, which can lead to famine and disease; in terms of agriculture, by devastating crops, particularly if the freeze occurs early or late in the growing season; and in terms of infrastructure, e.g. severe disruptions to the transport system or the bursting of residential or system water pipes

Of particular interest to the insurance industry are the economic losses that result from the bursting of pipes due to freeze events. Water pipes burst because the water inside them expands as it approaches freezing, which increases the pressure inside the pipe. Whether a pipe
will break or not depends on the water temperature (and consequently on the
air temperature), the freezing duration, the pipe diameter and composition,
the wind chill effect (due to wind and air leakage on water pipes), and the
presence of insulation

Insurance losses from burst pipes have a significant impact on the UK insurance industry. They have amounted to more than GBP 900 million over the last 10 years, representing around 10 % of the total insured losses (which are mainly due to flood events and windstorms) in the UK during the same period

Probabilistic catastrophe modelling is generally agreed to be the most
appropriate method to analyse such problems. The main goal of catastrophe
models is to estimate the full spectrum of probability of loss for a specific insurance portfolio (i.e. one comprising several residential, auto, commercial, or industrial risks). This requires the ability to extrapolate the possible losses for each risk to high return periods, which is usually achieved by simulating synthetic events that are likely to happen in the near future (typically a year). More importantly, it also requires considering how all risks relate to each other and how they may combine to create catastrophic losses. Such spatial
dependence between risks can result from various sources, for example, due to
the spatial structure of the hazard (e.g. the footprint in a windstorm or the
catchment area in a flood event) or due to similar building vulnerabilities
between risks in the same geographical area (e.g. due to common building
practices)

Modelling the spatial dependence of the hazard is usually achieved by taking
advantage of certain characteristic properties of the hazard footprint, like,
for example, the track path and the radius of maximum wind for windstorms or
the elevation in the case of floods. In the case of temperature, however,
such a property cannot be easily defined; an alternative solution is to use
multivariate copula models. Based on Sklar's theorem

However, one important difficulty is the limited choice of adequate copulas
for more than two dimensions. For example, standard multivariate copula
models such as the elliptical and Archimedean copulas do not allow for
different dependency models between pairs of variables. Vine copulas provide
a flexible solution to this problem based on a pairwise decomposition of a
multivariate model into bivariate copulas. This approach is very flexible, as
the bivariate copulas can be selected independently for each pair, from a
wide range of parametric families, which enables modelling of a wide range of
complex dependencies

In this paper, the vine copula methodology is used in a novel application to
develop a catastrophe model on insurance losses due to pipe bursts resulting
from freeze events in the United Kingdom. The focus here is on the hazard
component (Sect.

the North Atlantic Oscillation (NAO), a leading pattern of weather and climate variability over the Northern Hemisphere mid-latitudes, which accounts for more than half of the year-to-year variability in winter surface temperature over the UK;

anthropogenic climate change and its direct effects on the temperature profile of the UK.

Stochastic winter seasons are simulated, taking into account the correlation
of the hazard between all pairs of cells with the help of regular vine copulas
(Sect.

The hazard component of the catastrophe model is based on the European Centre
for Medium-Range Weather Forecasts (ECMWF) 20th century reanalysis
(ERA-20C) covering the entire 20th century from 1900 to 2010

The ERA-20C product provides daily 3 h forecasts (i.e. eight forecast steps
starting at 06:00 UTC each day) of minimum and maximum temperature at 2 m. These are used to compute daily minimum and maximum values at every
grid cell for the entire period. The daily average temperature is then computed as 0.5 (Tmin + Tmax).

The coarse horizontal resolution is expected to have a relatively small influence in most cases, given that winter climate anomalies are often
coherent across large parts of the UK as they are primarily associated with
large-scale atmospheric circulation patterns

For comparison purposes, the observed daily average temperature gridded data
set developed by the UK Met Office is also used

The NAO refers to a redistribution of atmospheric mass between the Arctic and
the subtropical Atlantic and swings from one phase to another, producing large changes in weather, in particular in surface air temperature, over the Atlantic and the adjacent continents

Interannual variation of

Increases in concentration of greenhouse gases, such as carbon dioxide
(

The daily temperature data are used to compute the AFI at each grid cell as the sum of the absolute values of the average daily temperature over all days with temperatures below 0 °C.
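The AFI computation described above can be sketched as follows (a minimal Python illustration with a synthetic temperature series; the study itself works on the ERA-20C grid):

```python
import numpy as np

def air_freezing_index(daily_mean_temp_c):
    """Air Freezing Index: sum of the absolute daily mean
    temperatures (degrees C) over all days below 0 degrees C."""
    t = np.asarray(daily_mean_temp_c, dtype=float)
    freezing = t < 0.0
    return float(np.abs(t[freezing]).sum())

# Example: a short synthetic winter with a five-day cold spell
temps = [2.1, 0.4, -1.5, -3.0, -4.2, -2.0, -0.5, 1.3]
afi = air_freezing_index(temps)  # 1.5 + 3.0 + 4.2 + 2.0 + 0.5 = 11.2
```

Days above freezing contribute nothing, so a winter with no frost days has an AFI of zero.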

Maps of AFI values from ERA-20C for the severe winters of 1946/47, 1962/63,
and 2009/10 are shown in Fig.

Based on the AFI, the 1962/63 winter season was the most severe winter of the 20th century and one of the coldest on record in the United Kingdom

Map of AFI values (in

After 1962/63, a long run of mild winters followed until late 1978 and early 1979. However, temperatures in 1978/79 were not as low, and the cold weather was frequently interrupted by brief periods of thaw

For the last 10 years of our study period (from 2000 to 2010), the mAFI seems to
be underestimated in the reanalysis data set (Fig.

As shown in Fig.

Since the historical data extend over only 110 years and our interest lies in very rare events (such as 1-in-200-year events), it is necessary to extrapolate by fitting an extreme value distribution. The generalized extreme value (GEV) family of distributions has been chosen, which includes the Gumbel, Fréchet, and Weibull distributions. An additional term was included, the
probability of no hazard (

There are various methods of parameter estimation for fitting the GEV
distribution, such as least squares estimation, maximum likelihood estimation
(MLE), and probability weighted moments. Traditional parameter
estimation techniques give equal weight to every observation in the data set.
However, the focus in catastrophe modelling is mainly on the extreme outcomes, and it is thus preferable to give more weight to observations at long return periods.
The tail-weighted maximum likelihood estimation (TWMLE) method developed by
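The idea of up-weighting the tail in the likelihood can be illustrated with a simple weighted GEV fit (a sketch only: the rank-based weights below are an assumed, simplified scheme, not the cited TWMLE method, and scipy's `genextreme` uses the shape convention c = -ξ relative to the usual GEV notation):

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Stand-in for 110 years of AFI data at one cell
data = genextreme.rvs(c=-0.1, loc=100.0, scale=30.0, size=110, random_state=rng)

def weighted_gev_nll(params, x, w):
    """Negative log-likelihood with per-observation weights."""
    c, loc, scale = params
    if scale <= 0:
        return np.inf
    logpdf = genextreme.logpdf(x, c, loc=loc, scale=scale)
    if not np.all(np.isfinite(logpdf)):
        return np.inf
    return -np.sum(w * logpdf)

# Simple rank-based weights that emphasise the largest observations
ranks = np.argsort(np.argsort(data)) + 1
weights = ranks / ranks.sum() * len(data)   # normalized to a mean weight of 1

start = genextreme.fit(data)                # plain MLE as starting point
res = minimize(weighted_gev_nll, start, args=(data, weights), method="Nelder-Mead")
c_hat, loc_hat, scale_hat = res.x
```

Setting all weights to 1 recovers ordinary MLE, which makes the effect of the weighting easy to inspect.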

Along with the TWMLE method described above, a second modification has been
implemented in order to geographically smooth the GEV parameters. The
smoothing is incorporated into the fitting process by minimizing the local
(ranked) log-likelihood. More precisely, the log-likelihood at each grid cell

The smoothing increases the effective sample size at each grid point, which leads to a more precise estimation of the parameters, especially the shape parameter, which is highly influential in estimating the hazard levels at high return periods. Because the data grid resolution is already coarse, a small
length-scale parameter

Finally, in order to avoid an overestimate of the positive value of the
shape parameter due to the small sample size

Model parameters for a single cell over London.

Estimates of

As an example, the GEV fit for a single cell over London is shown in Fig.

Maps of the fitted parameters are shown in Fig.

AFI return period curves for a single cell over London: empirical fit (black circles), GEV fitted with MLE (grey line), and GEV fitted with TWMLE and geographical smoothing (black line).

Maps showing the spatial distribution of the model fitted
parameters:

In stationary models, the distribution parameters are assumed to be constant over the period under consideration. However, such an assumption is not valid in the presence of atmospheric circulation patterns or anthropogenic changes. Regression approaches are often used to assess the
influence of climatic factors on hazards and covariates such as global mean
temperature, and

Despite the caveats,

The influence of NAO and of global warming is examined by exploring
improvements to the distribution fits, after incorporating linear covariates
on the distribution parameters, as follows:

Only non-stationarity with respect to

As before, the parameters of each cell are estimated, also taking into account its neighbouring cells weighted by their distance. The most pertinent model is
selected, for each cell, using the
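Such a covariate-dependent fit can be sketched for a single cell by letting the GEV location parameter vary linearly with, for example, a winter NAO index (an illustrative Python sketch with synthetic data and an assumed linear NAO term; not the study's exact model, and scipy's `genextreme` uses the shape convention c = -ξ):

```python
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 110
nao = rng.normal(0.0, 1.0, n)   # synthetic winter NAO index

# Synthetic AFI-like data: positive NAO (milder winters) lowers the location
x = genextreme.rvs(c=-0.1, loc=100.0 - 15.0 * nao, scale=30.0,
                   size=n, random_state=rng)

def nll(params):
    """Negative log-likelihood of a GEV with location mu0 + mu1 * NAO."""
    mu0, mu1, scale, c = params
    if scale <= 0:
        return np.inf
    lp = genextreme.logpdf(x, c, loc=mu0 + mu1 * nao, scale=scale)
    return np.inf if not np.all(np.isfinite(lp)) else -lp.sum()

res = minimize(nll, x0=[np.mean(x), 0.0, np.std(x), 0.0],
               method="Nelder-Mead", options={"maxiter": 5000})
mu0, mu1, scale, c = res.x   # mu1 should recover the negative NAO effect
```

Comparing the maximized likelihood of this model against the stationary fit (e.g. via an information criterion) is the kind of model-selection step described above.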

The spatial distribution of the parameters of the final model is shown in
Fig.

Maps showing the spatial distribution of the non-stationary model
parameters:

The stochastic behaviour of the hazard (i.e. the AFI) at each cell is fully
described by its corresponding GEV probability distribution, as described in
Sect.

In this study, the joint multivariate hazard distribution of the AFI across
all the model domain (67 cells) is decomposed as a product of marginal and
pair-copula probability density functions (pdfs). The pair-copulas are fitted
using the R (

Percentage of family types used for the first five trees of the R-vine model.

The copula family types for each selected pair in the first tree are
determined using the Akaike information criterion
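The AIC-based family selection can be illustrated for a single pair with two candidate families, a Clayton and a Gaussian copula, with parameters obtained by inverting Kendall's tau (an assumed, simplified sketch in Python; the study uses the R VineCopula package with a much wider set of families and maximum-likelihood fits):

```python
import numpy as np
from scipy.stats import kendalltau, norm

rng = np.random.default_rng(2)

# Simulate from a Clayton copula (lower-tail dependent) by conditional sampling
theta_true, n = 3.0, 2000
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (u ** -theta_true * (w ** (-theta_true / (1.0 + theta_true)) - 1.0)
     + 1.0) ** (-1.0 / theta_true)

tau = kendalltau(u, v)[0]

# Parameter estimates by inversion of Kendall's tau
theta = 2.0 * tau / (1.0 - tau)      # Clayton
rho = np.sin(np.pi * tau / 2.0)      # Gaussian

def clayton_loglik(u, v, theta):
    return np.sum(np.log1p(theta)
                  - (1.0 + theta) * (np.log(u) + np.log(v))
                  - (2.0 + 1.0 / theta) * np.log(u ** -theta + v ** -theta - 1.0))

def gaussian_loglik(u, v, rho):
    x, y = norm.ppf(u), norm.ppf(v)
    return np.sum(-0.5 * np.log(1.0 - rho ** 2)
                  + (2.0 * rho * x * y - rho ** 2 * (x ** 2 + y ** 2))
                  / (2.0 * (1.0 - rho ** 2)))

# One parameter per family, so AIC = 2k - 2 log L with k = 1
aic = {"clayton": 2.0 - 2.0 * clayton_loglik(u, v, theta),
       "gaussian": 2.0 - 2.0 * gaussian_loglik(u, v, rho)}
best = min(aic, key=aic.get)   # the family with the lowest AIC is selected
```

Because the Clayton copula captures the lower-tail dependence in these data while the Gaussian copula cannot, the AIC comparison favours the Clayton family here.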

The percentage of family types used for the first few trees of the selected
R-vine model (RVM) is shown in Table

The small sample size (110 years of data), in conjunction with the high dimension of the modelled pdf (67), is of concern in this study, since it can lead to large uncertainties in the resulting pdf, which can also propagate into the estimated return periods. The impact of the short sample
size on the uncertainties in the results is quantified using a bootstrap
technique, as described in Sect.

Goodness of fit (GOF) is calculated using the Cramér–von Mises test, which compares the final selected RVM with the empirical copula. The RVineGofTest algorithm of the same R package implements different methods to compute the test; these, however, usually perform poorly for small sample sizes and high dimensions, as is the case in this work

Goodness-of-fit values for the Cramér–von Mises (CvM) statistic based on the empirical copula process (ECP) and based on the combination of the probability integral transform and empirical copula process (ECP2) as implemented in the VineCopula R package.

In the case of the stationary model, the vine copula is employed to model the
entire spatial dependence of the AFI in the UK. On the other hand, the
spatial AFI structure in the case of the non-stationary model is modelled in
two ways: (a) by quantifying the dependence on NAO or

The pdf is used to simulate 100 000 years of winter seasons in the UK. For each
year, the simulated AFI values at each grid cell depend on the other cells
based on the fitted RVM. Long simulations are needed to obtain numerically
converged results, i.e. convergence to the “true” return period. Our focus
here is the 200-year RP, which is commonly associated with capital and
regulatory requirements. By repeating the simulation several times, it has
been verified that 100 000 years of winter seasons is long enough for the Monte Carlo uncertainty to be negligible. The stationary model is used to generate a
stochastic set which corresponds to the current hazard experience. The
non-stationary model permits us to create additional stochastic sets that
represent different climate conditions. In order to assess the influence of
climate change on UK cold spells, three separate stochastic sets, of 100 000
years each, have been created as follows:

pre-industrial climate (

current climate (

future climate (

The choice of the year 2030 ensures a relatively near time horizon, which is more relevant for the insurance industry

The small sample size used in this study (110 years of data) together with
the high dimensions of the modelled pdf (67) can lead to large uncertainties
in the estimated return periods. Following

A simulation with the same length as the observed data (i.e. 110 years) is repeated for

For each of these

For each of the resulting

The uncertainty in the return levels is estimated by identifying the 95 % confidence interval (i.e. the range 2.5 %–97.5 %) from these 500 return level curves.
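For a single cell, the bootstrap procedure can be sketched as follows (a simplified Python illustration of the marginal part only, with synthetic data standing in for the observations; the full procedure in the text also refits the vine copula in each repetition, and the repetition count is reduced here for speed):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(4)

# Stand-in for 110 years of observed AFI at one cell
obs = genextreme.rvs(-0.1, loc=50.0, scale=25.0, size=110, random_state=rng)

def return_level(sample, rp=200):
    """Fit a GEV by MLE and return the rp-year return level."""
    c, loc, scale = genextreme.fit(sample)
    return genextreme.ppf(1.0 - 1.0 / rp, c, loc=loc, scale=scale)

n_boot = 200   # the study uses 500 repetitions
levels = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(obs, size=obs.size, replace=True)
    levels[i] = return_level(resample)

lo, hi = np.percentile(levels, [2.5, 97.5])   # 95 % confidence interval
```

The spread between `lo` and `hi` illustrates how strongly a 110-year record constrains (or fails to constrain) a 200-year return level.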

Histogram of the NAOI and the pdf of the fitted Gaussian distribution (red line).

Due to computational constraints, confidence intervals are only computed for the stationary model. In addition, the simulation length has been reduced to 10 000 years (instead of 100 000), which implies that part of the calculated uncertainty is due to Monte Carlo sampling variability. In order to investigate the sources of this uncertainty further, the uncertainty associated with the RVM alone is separated from that of the full model (i.e. of the joint pdf) by calculating confidence intervals with the same approach as described above but using the same marginal pdfs in each bootstrap repetition.

The obtained stochastic sets (see Sect.

Return period maps at higher return periods (100, 200, and 500 years) for the
pre-industrial, current, and future climate stochastic sets are shown in
Fig.

Maps of stochastic AFI values (in

The vine copula methodology permits the estimation of the hazard return
periods over aggregated regions in the UK. Since our focus is mainly on
inhabited areas, for each simulation year (

Return period curves of the wAFI (in

The stationary model is utilized to analyse the uncertainty in the model
results and investigate its sources. Figure

Return period curves for the stochastic sets under pre-industrial, current,
and future climate conditions are shown in Fig.

The non-stationary model suggests that under current climate conditions, such
an extreme event is approximately 2 times less likely to occur than in the
1960s. This agrees with

Return period estimates (in years) for the 1962/63 winter freeze event, based on wAFI.

By the year 2030, an event of the same severity as 1962/63 is predicted to
become almost 2 times less frequent, with a return period of 788 years.
Figure

The profound effect of the NAO on the winter surface temperature over the UK has been
reported by several studies

Return period curves of the wAFI (in

As already mentioned, the effect of the NAO or

This paper presents a probabilistic model of extreme cold winters in the United Kingdom. The hazard is modelled using the Air Freezing Index, an index which accounts for both the magnitude and the duration of air temperature below freezing and is calculated from the ERA-20C reanalysis temperature data covering the period from 1900 to 2010. Extreme value theory has been applied in order to estimate the probability of extreme cold winters spatially across the UK. More importantly, the spatial dependence between regions in the UK has been assessed through a novel approach which takes advantage of the vine copula methodology. This approach allows the modelling of concurrent high AFI values across the country, which is necessary in order to assess the extreme behaviour of freeze events reliably.

Recognizing the non-stationary nature of climate extremes, the model also
incorporates the NAO and climate change effects as predictors. Stochastic sets of
100 000 years representing different climate conditions (i.e. pre-industrial,
current, or future climate and positive or negative NAO) have been generated,
and the return periods of extreme cold winters in the UK, such as the Big Freeze
of 1962/63, have been estimated. According to the model, such an event is estimated to have become approximately 2 times less frequent during the course of the 20th century as a result of anthropogenic climate change. The model further predicts that by the 2030s, extreme cold winters will become even less common, occurring about 2 times less frequently under the influence of increasing

However, considerable uncertainty exists in these estimates, which should be interpreted with caution. The 110-year reanalysis record used in this study is relatively short, and the level of uncertainty in extremal estimates at long return periods is correspondingly high. Additional uncertainty may be introduced by possible spurious trends in the reanalysis data set. A longer record of temperature data would be necessary to reduce this uncertainty, and high-quality long-term reanalysis products with multiple ensemble members could help in this direction.

The ERA-20C data are available under a Creative Commons
Attribution 4.0 International License at

According to Sklar's theorem, the joint multivariate distribution of a set of
d random variables can be fully specified by the separate marginal
distributions and by their (
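In standard notation, Sklar's theorem can be written as follows (for continuous margins, in which case the copula C is unique):

```latex
F(x_1,\dots,x_d) = C\big(F_1(x_1),\dots,F_d(x_d)\big),
\qquad
f(x_1,\dots,x_d) = c\big(F_1(x_1),\dots,F_d(x_d)\big)\,\prod_{i=1}^{d} f_i(x_i),
```

where F_i and f_i are the marginal distribution and density functions and c is the copula density.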

The probability density function (pdf) of

Expression (

To quantify the dependence between variables, different measures have been
defined, addressing different aspects of dependence. A common measure of
overall dependence is the Kendall rank correlation coefficient, commonly
referred to as Kendall's
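As a concrete illustration (in Python with scipy, rather than the R tooling used in the study), Kendall's tau can be estimated empirically; being rank-based, it is invariant under strictly increasing transformations of the margins:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.5, size=200)   # positively dependent pair

tau, p_value = kendalltau(x, y)

# Rank-based: a strictly increasing transform leaves tau unchanged
tau_exp, _ = kendalltau(np.exp(x), y)
```

This margin-invariance is exactly why tau is a natural dependence measure in copula modelling, where margins and dependence are treated separately.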

One important complication is that identifying the appropriate

Vine copulas provide a flexible solution to this problem based on a pairwise
decomposition of a multivariate model into bivariate (conditional and
unconditional) copulas, whereby each pair-copula can be chosen independently
from the others. In particular, asymmetries and tail dependence can be taken
into account as well as (conditional) independence to build more parsimonious
models. Vines thus combine the advantages of multivariate copula modelling, that is, the separation of marginal and dependence modelling, with the flexibility of bivariate copulas

Similar to Fig.

As an example, in a four-dimensional case, the joint pdf can be decomposed as
a product of six pair-copulas (three unconditional and three conditional) and
four marginal pdfs, as shown in Eq. (
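One such decomposition (the decomposition is not unique; this particular form corresponds to a D-vine with the ordering 1–2–3–4, while the R-vine trees in the figure may differ) is

```latex
f(x_1,x_2,x_3,x_4) =
  \Big[\prod_{i=1}^{4} f_i(x_i)\Big]\,
  \underbrace{c_{12}\,c_{23}\,c_{34}}_{\text{unconditional}}\;
  \underbrace{c_{13|2}\,c_{24|3}\,c_{14|23}}_{\text{conditional}},
```

where each pair-copula density is evaluated at the corresponding (conditional) marginal distribution values, e.g. c_{12} = c_{12}(F_1(x_1), F_2(x_2)) and c_{13|2} = c_{13|2}(F_{1|2}(x_1|x_2), F_{3|2}(x_3|x_2)).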

Example of four-dimensional R-vine trees corresponding to the
decomposition shown in Eq. (

The above decomposition is not unique, and

SK conceived the presented idea, performed the computations, discussed the results, and wrote the manuscript.

This research is sponsored by the author's employer, Guy Carpenter, and may lead to the development of products that may be licensed to Guy Carpenter clients.

The author would like to thank the editor and the two anonymous reviewers for their comments, which helped to greatly improve the quality of this article. I would also like to thank my colleagues at Guy Carpenter for reviewing the content and providing valuable feedback.

Edited by: Piero Lionello
Reviewed by: two anonymous referees