Winter storms are the most costly natural hazard for European residential
property. We compare four distinct storm damage functions with respect to
their forecast accuracy and variability, with particular regard to the most
severe winter storms. The analysis focuses on daily loss estimates under
differing spatial aggregation, ranging from district to country level. We
discuss the broad and heavily skewed distribution of insured losses, which poses
difficulties for both the calibration and the evaluation of damage functions.
From theoretical considerations, we provide a synthesis between the
frequently discussed cubic wind–damage relationship and recent studies that
report much steeper damage functions for European winter storms. The
performance of the storm loss models is evaluated for two sources of wind
gust data, direct observations by the German Weather Service and ERA-Interim
reanalysis data. While the choice of gust data has little impact on the
evaluation of German storm loss, spatially resolved coefficients of variation
reveal dependence between model and data choice. The comparison shows that
the probabilistic models by

As a major contributor to natural-hazard damages, windstorms are responsible
for an average of 39 % of world-wide economic losses during
1980–2011

Recent climatological studies by

A storm damage function describes the relation between the intensity of a
storm and the typical monetary damage caused. While on the continental scale storm
intensity can be best described by complex storm severity indices

The work in hand tackles this issue by providing a model intercomparison of storm damage functions for the residential sector in the context of European winter storms.

In the discussion of storm damage functions, it is often assumed that loss
should increase as the square or cube of the maximum wind (gust) speed. These
presumptions originate from the following:

the consideration of wind loads, which are approximately proportional to
the exerted pressure and, hence, to the square of the wind speed

the concept of proportionality between structural damage and the dissipation
rate of the wind kinetic energy that scales with the third power of wind speed

We reason that the apparent contradiction results from the neglect of a potential loss threshold due to insurance deductibles or similar economic effects. Thus, we schematically demonstrate the transition from a very steep loss increase to a more modest cubic power law.

The comparison of storm damage models is generally impeded by inconsistencies
for reasons of (i) differing temporal or spatial resolution of meteorological
data, (ii) deviating building codes and enforcement practices, and (iii) differing
insurance policies and claims settlement practices

In order to circumvent such inconsistencies, three recently developed damage
functions

The theoretical foundations and the implications of each model are discussed in order to streamline the terminology and conceptual structure of storm damage functions. Quantitative results are obtained from numerical estimation and allow a direct comparison of model performance under varied spatial aggregation, relating either to daily loss or to particular major storms. During summer months, the employed loss data inseparably include both wind and hail damages. Since the employed damage functions concern wind damage only, we limit the work in hand to days within the winter half-year (abbreviated as WH), comprising the months October through March.

We address the validation of countrywide loss estimates by applying a novel pairwise binomial test metric in conjunction with the relative metrics mean percentage error (MPE) and mean absolute percentage error (MAPE). Furthermore, a coefficient of variation is employed to assess the predictive uncertainty on district level at daily resolution.

The overall model estimation is based on annual cross validation, an
iterative procedure for the sampling of the training data, which ensures that
loss estimates within any given year are obtained from independent training
samples. We furthermore assess model robustness by employing
a

In the following section, we give overviews of the employed wind gust and insurance
data sets and of the model estimation procedure. In Sect.

In this work, the employed damage functions are calibrated against detailed insurance loss data obtained for storm damages to residential buildings. The German Insurance Association (GDV) provided loss data relating to the “comprehensive insurance on buildings” line of business resolved for 439 German administrative districts (as of 2006).

The data set comprises the magnitude of absolute losses and insured values as
well as the number of claims for the years 1997 to 2007 on a daily basis.
With its high spatiotemporal resolution and countrywide coverage, the GDV
data set has been successfully applied for the calibration of different damage
functions (e.g.

In order to eliminate price effects and time-varying insurance market
penetration, we consider relative figures for the amount of loss and claims
throughout. The following definitions are applied:

loss ratio (LR): the amount of insured loss per day and district, divided by the corresponding sum of insured value;

claim ratio (CR): the number of affected insurance contracts per day and district, divided by the corresponding total number of insurance contracts.
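
As a minimal sketch with purely hypothetical figures, the two ratios are simple quotients:

```python
def loss_ratio(insured_loss, sum_insured):
    """Insured loss per day and district, divided by the sum of insured value."""
    return insured_loss / sum_insured

def claim_ratio(n_claims, n_contracts):
    """Affected contracts per day and district, divided by all contracts."""
    return n_claims / n_contracts

# Hypothetical district figures for a single storm day
lr = loss_ratio(insured_loss=2.5e5, sum_insured=1.0e9)  # 0.025 % of insured value
cr = claim_ratio(n_claims=120, n_contracts=40000)       # 0.3 % of contracts
```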

These definitions are based on the assumption that insured buildings are
randomly distributed in each district and are representative of the overall
residential building stock. With data coverage of up to 13.4 million insured
buildings and in excess of 90 % market coverage

The highly skewed and heavy-tailed distribution of daily losses during the
winter half-year is illustrated in Fig.

The vast number of days exhibiting negligible insured loss appears to be due
to a random scattering of small losses across time and districts. Supporting
the attribution to noise,

The three loss classes defined for the winter half-year. Given are the number of observations, the related quantiles, and the accumulated loss share for the period 1997 to 2007.

Two sets of meteorological data were employed. The first set comprises daily
maxima of the 3 s wind gust measured by the German weather service
DWD

Data available at:

Missing values may not exceed 20 days for each year.

Average missing days per year may not exceed 10 for the period 1996 to 2008.

Mountainous stations above 1400 m a.s.l. are excluded.

Inhomogeneities in meteorological time series can be identified by finding
an optimal solution to the multiple breakpoint problem. Standard methods are
available, in particular, for finding inhomogeneities in monthly climatic
time series

In the case of daily block maxima of climatic data, the relatively small
change at the breakpoint as compared to the data's variance and the presence
of long-term persistence adversely affect the capacity to identify
breakpoints correctly. With a low signal-to-noise ratio, the presence of
long-term correlation can lead to false identification of breakpoints

We attempt to avoid over-detection by applying a conservative testing scheme
based on multiple cross-comparison of neighbouring stations and the
examination of metadata, e.g. about relocation of stations. The testing
scheme employs the R implementation of the PMFred algorithm developed by

To begin with, we chose a control group of 39 stations whose individual time
series showed no significant inhomogeneities in the test algorithm.
Subsequently, we paired each of the 85 stations with the 10 closest of the
control group and performed the PMFred algorithm on the time series of their
differences. If, within a 60 day window, at least three pairwise tests indicated
a breakpoint that could be backed by metadata, the inhomogeneity was
corrected. Furthermore, if all 10 pairwise comparisons suggested
a significant and otherwise undocumented breakpoint it was also corrected.
All corrections were performed using a quantile-matching algorithm

Overall, we took a conservative stance on artificial manipulations of the raw time series and corrected only three significant breakpoints in total, two of which were documented in metadata.

The second wind gust data set was obtained from the ERA-Interim reanalysis
project

ERA-Interim data were obtained from:

Both sets of wind gust data, DWD and ERA-Interim, require a downscaling to
match the resolution of the insurance data.

The analysis of daily insurance loss data of the winter half-year reveals an
extremely broad and strongly skewed loss distribution. Relating loss and wind
gust data, a pronounced heteroscedasticity is revealed

Since damage functions are typically employed as predictive models, it is of key
importance how accurately they perform in practice. Beyond choosing
the optimal model, there is the risk of overfitting to training data that may
not represent the high variability of weather extremes. In order to
assess the predictive performance of the employed models, a k-fold cross
validation scheme

For annual cross validation, the 11-year data set is partitioned into annual subsamples. Iteratively, each individual subsample is retained for evaluation, while the model is trained on the remaining 10 years. This process ensures that each year is used exactly once for evaluation.
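
The leave-one-year-out scheme can be sketched as follows; the toy `fit` and `predict` callables merely stand in for any of the damage functions discussed here:

```python
def annual_cross_validation(years, fit, predict):
    """Leave-one-year-out cross validation: each year is predicted by a model
    trained only on the remaining years."""
    estimates = {}
    for test_year in years:
        train_years = [y for y in years if y != test_year]
        model = fit(train_years)
        estimates[test_year] = predict(model, test_year)
    return estimates

# Toy data: one hypothetical annual loss figure per year, 1997-2007
data = {year: float(year % 5) for year in range(1997, 2008)}
preds = annual_cross_validation(
    list(data),
    fit=lambda train: sum(data[y] for y in train) / len(train),  # "model" = mean
    predict=lambda model, year: model,
)
```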

The employed cross validation enables out-of-sample prediction for each day and allows for the assessment of the model fit with regard to the range of frequently occurring losses.

However, for very scarce extreme events the evaluation of model robustness requires additional resampling of the training data. The resampling is performed via a jackknife procedure, where each individual annual subsample is excluded consecutively from the 10-year training sample.
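
The jackknife resampling of a 10-year training sample can be sketched as:

```python
def jackknife_subsamples(train_years):
    """Yield every 9-year subsample obtained by leaving out one training
    year in turn (jackknife resampling of the training data)."""
    for left_out in train_years:
        yield [y for y in train_years if y != left_out]

train = list(range(1997, 2007))  # a hypothetical 10-year training sample
subsamples = list(jackknife_subsamples(train))
```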

For the joint analysis of deterministic and probabilistic models, two different schemes for loss aggregation are employed. Generally, we consider the daily district-wise loss estimates as mutually independent random variables that depend only on the maximum gust speed. In the case of deterministic models, the model estimates are interpreted as expected values and are simply summed over time or space. For the probabilistic models, we employ a Monte Carlo approach, in which the results of 1000 independent random realisations are aggregated. The expected value and distribution quantiles are then calculated from the distribution of Monte Carlo estimates at the desired level of aggregation.
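
The Monte Carlo aggregation can be sketched as follows, with a placeholder log-normal district loss (illustrative parameters) standing in for an actual probabilistic damage function:

```python
import random

def aggregate_probabilistic(draw_district_loss, n_districts, n_mc=1000):
    """Aggregate district-wise random losses: each Monte Carlo run draws one
    realisation per district and sums them; the expected value and quantiles
    are taken from the distribution of the aggregated sums."""
    totals = sorted(
        sum(draw_district_loss(d) for d in range(n_districts))
        for _ in range(n_mc)
    )
    expected = sum(totals) / n_mc
    q05 = totals[int(0.05 * (n_mc - 1))]
    q95 = totals[int(0.95 * (n_mc - 1))]
    return expected, (q05, q95)

random.seed(1)
# Placeholder district loss: log-normal with hypothetical parameters
exp_loss, (q05, q95) = aggregate_probabilistic(
    lambda district: random.lognormvariate(0.0, 1.0), n_districts=439
)
```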

The broadness and skew of the loss distribution also play a role
in the validation of model estimates, as they have a significant impact on the
applicability of evaluation metrics. Heteroscedastic dependence between
prediction error and loss magnitude invalidates traditional moment-based
metrics, such as

In order to eliminate the effects of scale of the loss distribution for model comparison, we propose a simple pairwise statistical test based on binomial statistics. The null hypothesis is that both models have equal predictive skill and, hence, that their predictions are equally likely to be closest to the true observations. Successes (i.e. closer predictions) can be represented by independent Bernoulli trials with success probability 0.5. In a one-tailed test, the binomial distribution then expresses the probability of a given success rate.
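
Under the null hypothesis, the one-tailed p-value follows directly from the binomial distribution; a minimal implementation with a hypothetical win count:

```python
from math import comb

def binomial_pvalue(successes, trials):
    """One-tailed probability of observing at least `successes` closer
    predictions out of `trials` paired comparisons under equal skill (p = 0.5)."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

# Hypothetical count: model A is closer to the observation on 60 of 80 loss days
p_value = binomial_pvalue(60, 80)
```

With 60 wins out of 80 comparisons, equal predictive skill is clearly rejected at the 5 % level, whereas 40 wins out of 80 would be entirely consistent with the null hypothesis.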

In order to apply the binomial test, the share of predictions where one or
the other comes closer to the observation is estimated for each pairing of
models. Significance is obtained from the binomial distribution with
probability 0.5 and

As the binomial test itself does not disclose why any specific model
outperforms a competitor, we interpret the results of each model in
conjunction with traditional relative metrics relating to a multiplicative
error. For the employed data,

The employed multiplicative metrics are the mean absolute percentage error (MAPE, i.e. the mean of the moduli of deviations between model estimates and observations in percent) and the mean percentage error (MPE, i.e. the mean of the deviations between model estimates and observations in percent). While MAPE gives an estimate of the variability of model results, MPE provides an indication for systematic bias.
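
Both metrics can be written in a few lines (synthetic numbers for illustration):

```python
def mpe(estimates, observations):
    """Mean percentage error: signed, indicates systematic bias."""
    return 100.0 * sum((e - o) / o for e, o in zip(estimates, observations)) \
        / len(observations)

def mape(estimates, observations):
    """Mean absolute percentage error: unsigned, indicates variability."""
    return 100.0 * sum(abs(e - o) / o for e, o in zip(estimates, observations)) \
        / len(observations)

obs = [1.0, 2.0, 4.0]
est = [1.1, 1.8, 4.0]  # +10 %, -10 %, 0 % deviations
bias = mpe(est, obs)     # signed errors cancel: no systematic bias
spread = mape(est, obs)  # about 6.7 % average absolute deviation
```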

A damage function describes the relation between the intensity of a specific
hazard and the typical monetary damage caused with respect to either a single
structure (

Microscale models can be empirical (i.e. statistically derived from data),
engineering-based, or a mixture of both. On the macro scale, damages may be
either aggregated from microscale models or obtained from statistical
relationships based on empirical data

Due to the minimum resolution of our data (i.e. districts), our analysis is constrained to the macroscale models of the latter kind. Nonetheless, some of the damage functions under scrutiny contain assumptions on the nature of microscale damage. As there are no publicly available engineering-based models for our region of interest, only statistical models are considered.

For a general overview of modelling approaches, both statistical and
engineering-based, we refer the reader to

The choice for an exponential damage function is motivated by empirical
observation, showing quasi-linear increase of the logarithm of the loss ratio
versus maximum wind (gust) speed over a wide range

It is a non-physical damage function in the sense that it does not saturate with increasing wind gust speed and thus ignores an upper limit of physical damage. However, average loss levels reached during European winter storms typically range below or around a few tenths of a percent of insured value, such that loss saturation does not become an issue.
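
As an illustration only (the parametrisation below is assumed, not the calibrated form used in this work), an exponential damage function makes the log of the loss ratio linear in the gust speed and never saturates:

```python
import math

def exponential_loss_ratio(v_gust, alpha=0.4, v_ref=25.0, lr_ref=1e-6):
    """Illustrative exponential damage function: the log of the loss ratio
    grows linearly with the maximum gust speed. alpha, v_ref and lr_ref
    are hypothetical parameters, not fitted values."""
    return lr_ref * math.exp(alpha * (v_gust - v_ref))

# Each additional 5 m/s multiplies the loss ratio by exp(0.4 * 5) ~= 7.4,
# regardless of the absolute gust level (i.e. no saturation)
growth = exponential_loss_ratio(35.0) / exponential_loss_ratio(30.0)
```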

Example of model predictions for a single district obtained from DWD data for
the training period 1997–2006 and set in contrast to year 2007 empirical
data, all limited to the winter half-year.

The damage function relates the loss ratio

variations of scale due to mismatches in altitude or location of the geographical reference of the gust data and the building portfolio

loss being dependent on a differing wind predictor with approximate proportionality to the maximum gust speed

systematic bias caused by the interpolation of wind gust data.

The exponential damage function focuses on wind-dependent losses only.
Typically, these are large losses within the upper tail of the loss
distribution. For the employed insurance data, small losses that occurred on
days with maximum gust speed beneath the 95th percentile show a predominantly
random behaviour not captured by Eq. (

Further details about the calibration of the damage function are given in
Sect.

In the literature, there are several proponents for power-law-based storm
damage functions (e.g.

For winter storms affecting Germany,

For an arbitrary district, Fig.

The model can be simplified for large wind gust speeds. In this case, the expected
value of loss

The exponent

The original model published by

Please refer to Sect.

At the core of the damage function is the definition of a damage proxy

The damage function is calibrated by performing a linear regression of loss observations against the damage proxy, thus involving two regression parameters (a scaling coefficient and an offset). In the upper limit, the damage function increases without bounds and hence ignores damage saturation at high gust speed.
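
A sketch of this two-parameter calibration, using an assumed cubic excess-over-percentile proxy (the exact proxy definition from the text is not reproduced here):

```python
def damage_proxy(v_gust, v98):
    """Assumed proxy: cubic excess of the gust speed over the 98th percentile."""
    return max((v_gust / v98) ** 3 - 1.0, 0.0)

def fit_scale_offset(proxies, losses):
    """Least-squares fit of loss = scale * proxy + offset
    (the two regression parameters)."""
    n = len(proxies)
    mx, my = sum(proxies) / n, sum(losses) / n
    scale = sum((x - mx) * (y - my) for x, y in zip(proxies, losses)) \
        / sum((x - mx) ** 2 for x in proxies)
    return scale, my - scale * mx

# Synthetic, exactly linear loss data recover the two regression parameters
xs = [damage_proxy(v, v98=25.0) for v in (26.0, 28.0, 30.0, 32.0)]
ys = [2.0 * x + 0.1 for x in xs]
scale, offset = fit_scale_offset(xs, ys)
```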

The scaled damage proxy is shown exemplarily for an arbitrary district in
Fig.

The employed wind gust percentile was empirically found by

That is, structures are reinforced to withstand frequent low-impact events, while adapting to the rare extremes may be too costly. A balance between the individually perceived (monetary) risk and the tolerable adaptation cost is maintained.

to the wind climate and hence argue for the applicability of a wind percentile as a proxy for such adaptation.

The cubic relationship of the damage function has been repeatedly put into
context with the advection of kinetic energy

In Sect.

Although

Within their theoretical framework, a building damage occurs if
a critical wind gust speed, particular to that building, is exceeded. A continuous
probability density function is employed to describe the probability of
critical gust speeds within the overall building stock. For modelling
purposes,

If an individual building

In contrast to the other discussed damage functions, model fitting and loss estimation require numerical integration, which makes the application of the damage function computationally more demanding. It was found that the model could not be reliably calibrated on loss data alone, necessitating the use of additional data for the number of claims per region and day. Given the additional information from the claims data, the damage function would be expected to perform as well as or better than the competing models.

Due to its probabilistic description of the building stock, the damage function naturally incorporates an upper limit to the claim and loss ratio and may be applicable to a wide range of losses.

The model requires the calibration of four parameters, describing the wind
gust speed at which half of the building stock is damaged and its associated
standard deviation, the standard deviation of critical wind gust speeds, and the
gust range over which building damages reach complete destruction. Further
description of the mathematical details and the three-step calibration
procedure is given in Sect.

For an exemplary district, Fig.

Bringing together the four different models, the two wind gust data sources, and the
modelling procedure (Sect.

Due to the high level of detail, the presentation of results is focused on three distinct aggregation levels: (i) daily loss per district, (ii) daily countrywide losses, and (iii) countrywide losses caused by the six most severe storm events during their entire passage duration.

In case of models

Comparing two deterministic and two probabilistic models requires the choice of a common metric. The output of the deterministic models is hence considered equivalent to an expected value obtained from the probabilistic models and forms the basis of the model intercomparison.

While temporal or spatial aggregation generally leads to a convergence of model estimates and observations, strong variability is expected for daily storm loss estimates on the fine district scale.

On the basis of root-mean-square error, we define a coefficient of variation
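
The exact normalisation is elided above; assuming the common convention of RMSE divided by the mean observed loss, a sketch reads:

```python
from math import sqrt

def cv_rmse(estimates, observations):
    """Coefficient of variation based on RMSE, normalised here by the mean
    observed loss (an assumed convention; the definition above is elided)."""
    n = len(observations)
    rmse = sqrt(sum((e - o) ** 2 for e, o in zip(estimates, observations)) / n)
    return rmse / (sum(observations) / n)

obs = [1.0, 2.0, 3.0]
est = [1.0, 2.0, 5.0]  # one district badly overestimated
cv = cv_rmse(est, obs)
```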

Table

Because the district resolution exceeds the resolution of the sampling
points of the wind field, a strong influence of the choice of gust
data is expected. Figure

Coefficients of variation of the root mean squared error per district,
evaluated for the entire 11-year modelling period. Depicted is the minimum
value of CV

On country level, the predicted daily loss ratio (expected value) for each
model is plotted versus observed losses using a double-logarithmic scale.

Our second appraisal of the model performance is based upon countrywide daily losses. The spatial aggregation has the beneficial effects of reducing loss variability and yielding a high number of otherwise spatially separated loss events.

Figure

First of all, the loss predictions from all models exhibit a very high variability spanning a few orders of magnitude. Since the variability cannot be significantly reduced by model choice, it may be a consequence of other aspects, such as the stochastic nature of building damage, measurement error of the gust speed, or the omission of further explanatory parameters. Secondly, the model variability appears nearly symmetric on the log scale, indicating a strongly skewed distribution on the linear scale. In this case, expected values may be significantly lower than loss observations that fall into the upper tail of the uncertainty distribution.

Two models,

Spatial averages of the coefficient of variation (RMSE) for each model. For ease of comparison, values are sorted in ascending order. The respective model is indicated by the colour code. The spatial extent is defined by the four geographic regions (north, east, south, west) depicted in the map inset.

When considering the binned loss ratios (black circles) in
Fig.

For ERA-Interim-driven simulations, Fig.

The similarity of results drawn from DWD and ERA-Interim wind gust data prevails for all further model results; we hence focus the subsequent discussion on DWD-based model estimates. The quality (performance) of wind gust data in the context of storm damages is beyond the scope of the work in hand. For the interested reader, we provide the corresponding ERA-Interim results in the Supplement.

From an economic (or insurance) point of view, weaker performance for small
and mid-range damages is acceptable if better performance is achieved for
large loss events. In our further analysis
we accommodate this aspect by applying the loss categories defined in Table

In order to compare model results over different loss ranges, we apply
a simple scale-independent pairwise statistical test based on binomial
statistics. For each pair of models, Table

As the binomial test itself does not disclose why any specific model
outperforms a competitor, we interpret the results of each model in
conjunction with the MAPE (mean absolute percentage error) and the MPE (mean
percentage error). Table

Results from a binomial test for prediction accuracy of the
different models based on daily loss estimates calculated from DWD wind gust
data. The model of each column is tested against each row of competing models
and across loss classes (as defined in Table

Estimates of the mean absolute percentage error (MAPE) and mean
percentage error (MPE) for each of the competing models and across loss
classes (as defined in Table

Dates of the six most severe winter storms during the period
1997–2007

For extreme losses in loss class I the binomial test gives prevalence to the
model

Considering loss class II, all models show a strong tendency to overestimate
large losses. Here, the smallest bias is produced by

In contrast, moderate losses in class III reveal a completely different
behaviour. The biggest change arises for

All above metrics were based on model estimates obtained from DWD wind gusts
(cf. Fig.

Having so far considered only single loss days, Fig.

In addition to the expected value obtained from the full training sample,
estimates of the expected value obtained from the jackknife
resampling give an indication of the robustness of the model fit. A large
spread of jackknife estimates, e.g. as seen for the model

Robustness is of particular concern, since the short training period may not always contain very severe storms, and, hence, the storm damage function must reliably extrapolate beyond its support. Empirically, this aspect is illustrated most prominently for winter storms Jeanett and Kyrill, both affecting approximately the same geographical region.

In the case of model

With the exception of winter storm Lothar, model

Model estimates for the six most severe winter storms in the period 1997–2007
based on DWD data. Red circles indicate the expected value obtained from
models trained on the full 10-year data, while the red dots represent
expected values from the 9-year resampled (jackknife) training
periods. For models

A similarly robust behaviour is shown by model

The least constrained model

Finally, Fig.

All of the four different damage functions discussed herein exhibit a loss increase that is much more rapid than a cubic power law derived from physical considerations about the kinetic energy of the wind mass. In this section, we propose a simple mechanism to reconcile the steep loss increase with a cubic power law. With our hypothesis we intend to expedite the discussion on the overall shape of the damage curve, since its behaviour beyond the support has strong implications for the extrapolation of loss.

Figure

We make the hypothesis that the steep loss increase that is observed from the
GDV data may be a consequence of the presence of such a loss threshold.
Mathematically, when applying a threshold
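
The effect can be demonstrated numerically with a purely cubic gross loss and a deductible-like threshold (all parameters hypothetical): just above the threshold, the local log-log slope of the insured loss is far steeper than 3, and it relaxes back towards the cubic exponent only at high gust speeds.

```python
import math

def insured_loss(v, threshold=1.0, scale=1e-3):
    """Cubic gross loss with a deductible-like threshold: only the excess
    over the threshold is insured (illustrative parameters)."""
    return max(scale * v ** 3 - threshold, 0.0)

def local_exponent(v, dv=1e-4):
    """Local log-log slope d(ln L)/d(ln v) of the insured loss."""
    l0, l1 = insured_loss(v), insured_loss(v + dv)
    return (math.log(l1) - math.log(l0)) / (math.log(v + dv) - math.log(v))

steep = local_exponent(10.5)   # just above the threshold (reached at v = 10)
cubic = local_exponent(100.0)  # far above the threshold
```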

Assuming a log-normal uncertainty distribution, Fig.

To be consistent, both LR curves given in Fig.

The non-linear processes behind wind and non-wind damage, as well as the
effects of cascading failure of structural components, entail that
reduced-form approaches as discussed here may only approximate the actual
storm damage characteristics. In order to assess the robustness and quality
of macroscale storm damage functions, we have analysed and compared the
results of four different models applicable to the European winter storm
season. As a growing body of climatological research indicates, an increase
in future storm intensity

Before we discuss the detailed results of the comparison, it is important to acknowledge the effect of deductibles on the shape of damage functions derived from insurance data. Care must be taken as to what extent physical damage concepts, such as a cubic wind–damage relationship, may be applied to insured storm loss. In this regard, all four compared damage functions exhibit a much stronger increase of loss, which is in good agreement with the GDV data employed herein. However, by introducing a simple loss threshold we could demonstrate how such a steep damage function for winter storm loss could be reconciled with a purely cubic wind–damage relationship. If, as climatological research suggests, future storm intensities increase beyond current levels, the overall shape of the damage function plays a crucial role for the extrapolation of future losses. With our threshold hypothesis we intend to expedite the discussion on the validity of damage functions beyond their original data support.

Storm-related insured losses generally exhibit a very broad distribution with a high dynamic range that spans several orders of magnitude. The loss distribution is highly skewed with very few extreme loss events dominating total annual loss. These two aspects pose severe difficulties for both the calibration and the evaluation of damage functions.

Ranking of the four damage functions according to their prediction quality, variability, and applicability.

With a focus on the level of extreme losses, least-squares curve fitting has
often been employed to calibrate damage curves to loss data. The combination
of the skewed loss distribution and the heteroscedastic variance seen for the
GDV data suggests a violation of the basic assumptions of least-squares
fitting and potentially leads to biased results. Due to the high dynamic
range, even temporally or spatially aggregated loss figures, as used in the
cubic excess-over-threshold damage function by

The optimal curve-fitting procedure remains a matter of discussion. Relying
on the assumption of a general damage relation valid for a large range of
losses, the probabilistic power law damage function by

As was seen in Fig.

Transferability is one of the biggest challenges of empirical damage
functions. All of the discussed damage functions require substantial
calibration to loss data. On the one hand,

From a practical point of view, model

In order to assess the countrywide performance of the different models,
a simple binomial test was devised. In conjunction with the more traditional
metrics MAPE and MPE, it was shown that models

The applicability of model

Overall, similar behaviour is found for ERA-Interim-based results which are
given in the Supplement. A peculiar difference is that for the class of
extreme loss days model

Generally, the obtained results were largely independent of the choice of DWD or ERA-Interim wind gust data. Not surprisingly, ERA-Interim-based results showed greater variance than those based on direct wind gust observations. Interestingly, at the district level, the estimated coefficients of variation reveal a marked increase of model variance from the west to the east of Germany.

Further analysis of the coefficient of variation emphasized the importance of
the interplay between damage function and the particular wind gust
distribution (from either DWD or ERA-Interim). Strong interdependence was
seen for model

It is worthwhile to note that the coefficient of variation indicates a strong
level of

In our comparison, it would not be meaningful to draw a single conclusion on the suitability of
each model, as the performance may crucially depend on the purpose for which
it is applied. In the light of this limitation, the exponential modelling approach was found
less adequate for the modelling of extremes. In contrast, model

Both probabilistic models provided good results over a wide range of loss
(moderate to extreme), with their model differences being much smaller than
the general variability of losses. On the regional level, they yielded
smaller coefficients of variation than the two deterministic models. While
models

The assumption of an exponential damage relationship is not uncommon in the
related literature

Mathematically, the damage function comprises a simple exponential term for the loss ratio,

Due to the high dynamic range of the loss data and their inherent
heteroscedasticity, the damage function cannot be calibrated directly via
least-squares. Similarly to the approach for model

Complementary to the loss magnitude, the probability of loss occurrence
(i.e. of receiving one or more loss claims) is given by the relationship

In conjunction, loss occurrence and loss magnitude yield the stochastic
expression for the loss ratio

For high wind gust speeds

Equation (

Both components of the damage function are calibrated separately. The log-normally distributed loss magnitude is fitted via maximum likelihood to the empirical losses. A least-squares approach is used to fit the loss occurrence term against empirical occurrence rates derived from binned data, enforcing parameter constraints such that the loss occurrence probability is bounded within the interval [0, 1].
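
For the magnitude component, the maximum-likelihood fit is analytic; a sketch with synthetic losses (the occurrence term and its constrained least-squares fit are omitted here):

```python
from math import exp, log, sqrt

def fit_lognormal_mle(losses):
    """MLE of a log-normal loss magnitude: mu and sigma are simply the mean
    and (population) standard deviation of the log-losses."""
    logs = [log(x) for x in losses]
    mu = sum(logs) / len(logs)
    sigma = sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma

# Synthetic losses whose logarithms are exactly 1 and 3
mu, sigma = fit_lognormal_mle([exp(1.0), exp(3.0)])
```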

Keeping in mind the high dynamic range of loss claims with few dominating
extreme losses, the linear regression implicitly puts a strong emphasis on
extreme losses ensuring that these are closely matched (cf. Fig.

The shape of the damage function is determined by the power law term, which is influenced only by the 98th wind gust percentile. We chose to determine the 98th percentile from the same training sample as used for calibration of the remaining parameters.

The value of this threshold is of particular interest, as it controls the
shape and with it the steepness of the damage function. To clarify this
statement, we relate the cubed power law term of the damage function with
a tangent based on a simple power law without threshold. For every gust speed

Equation (

Hence, the steepness of the model depends on the wind gust data source,
which may affect the portability of the damage function.
Additionally, the high local exponents around 10 indicate consistency with
other models that report exponents of similar magnitude, e.g.

For data-scarce applications, it may be opportune to resolve regional
portfolio differences via population density as a proxy for (insured) value
and obtain a global parametrisation via regression on the national level

Finally,

The fundamental concept of model

Comparison of the parameter values obtained for the federal state of
Baden-Württemberg with those published by

For a portfolio of buildings, each with individual critical threshold,
a specific density distribution for

The loss ratio

Finally, uncertainty is introduced by assuming a Gaussian distribution

For calibration,

For the work in hand, these problems were solved via a three-step fitting procedure. In order to exclude the effect of noise, data below the 95th wind gust percentile were discarded during the fitting procedure.

In the first step, Eq. (

In the second step, the above described fitting procedure is used to
calibrate Eq. (

Thirdly, the parameters of the normal distribution describing the random
fluctuation of

Due to the strong deviation from the original least-squares fitting employed
by

While we report only those results that relate to the best performing model
setup, results from applying the Baden-Württemberg calibration to entire
Germany

We appreciate valuable discussions with U. Ulbrich and M. Boettle. We thank the German Insurance Association (GDV), the German Weather Service (DWD), and the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing the data. This work was supported by the European Community's Seventh Framework Programme under Grant Agreement No. 308497 (Project RAMSES). Edited by: I. Didenkulova Reviewed by: three anonymous referees