This paper describes an approach to derive probabilistic predictions of local winter storm damage occurrences from a global medium-range ensemble prediction system (EPS). Predictions of storm damage occurrences are subject to large uncertainty due to meteorological forecast uncertainty (typically addressed by means of ensemble predictions) and uncertainties in modelling weather impacts. The latter uncertainty arises from the fact that local vulnerabilities are not known in sufficient detail to allow for a deterministic prediction of damages, even if the forecasted gust wind speed contains no uncertainty. Thus, to estimate the damage model uncertainty, a statistical model based on logistic regression analysis is employed, relating meteorological analyses to historical damage records. A quantification of the two individual contributions (meteorological and damage model uncertainty) to the total forecast uncertainty is achieved by neglecting individual uncertainty sources and analysing resulting predictions. Results show an increase in forecast skill measured by means of a reduced Brier score if both meteorological and damage model uncertainties are taken into account. It is demonstrated that skilful predictions on district level (dividing the area of Germany into 439 administrative districts) are possible on lead times of several days. Skill is increased through the application of a proper ensemble calibration method, extending the range of lead times for which skilful damage predictions can be made.

Severe weather events, and in particular severe winter storm events, cause a major share of economic losses due to natural disasters in Europe and in Germany (Munich Re; 2007, 2012, 2013) and regularly cause a number of human fatalities. To prevent human fatalities and reduce property losses caused by natural disasters, national and regional civil protection agencies need to be supported by effective weather warning systems. Within the Sendai Framework For Disaster Risk Reduction (UNISDR, 2015), it has been stated that for an effective disaster risk reduction an understanding of natural risks and their impacts is needed. This includes all aspects of disasters, such as vulnerability, capacity and exposure. With such understanding, and if possible, the ability to model the impacts of severe weather events, improved warning systems could be designed, supporting decision-making processes for civil protection agencies.

The modelling of winter storm damages in Germany has been carried out in a number of recent studies, including both deterministic approaches (Klawa and Ulbrich, 2003; Heneka and Ruck, 2008; Donat et al., 2011) and probabilistic approaches (Heneka and Hofherr, 2010; Prahl et al., 2012). These storm damage models provide means to translate observed or modelled gust wind speeds into local damage or loss ratios (i.e. losses normalized with the local sum of insured values). Depending on data availability, these models include a regionally specific parameter estimation to describe differences in local vulnerabilities resulting, for example, from local differences in building characteristics (compare e.g. Donat et al., 2011). Rather than aiming at a quantitative model for predictions of loss ratios, here we employ a simple logistic regression model, aiming at the prediction of exceedance probabilities for defined loss thresholds. This model is similar to the first modelling step of the damage model described in Prahl et al. (2012).

In giving an estimate of the inherent uncertainty in the relationship between the maximum wind gust and damage, the statistical model uncertainties arising in the damage modelling step can be quantified. The second major source of uncertainty in storm impact predictions arises from meteorological forecast uncertainties. The latter uncertainty is commonly addressed by means of ensemble prediction systems (Palmer, 2000; Leutbecher and Palmer, 2008; Slingo and Palmer, 2011), which is why we base our study on the medium range ensemble prediction system operationally run at the European Centre for Medium-Range Weather Forecasts (ECMWF; Palmer et al., 2007).

Our approach thus allows us to address and quantitatively compare the two main uncertainties arising in the modelling chain: meteorological forecast uncertainty and damage model uncertainty. In particular, we study the effect of neglecting uncertainty information, as is commonly done when interpreting the ensemble mean of a forecast ensemble or applying a simple deterministic damage model neglecting the respective uncertainty.

The aim of this paper is to demonstrate the benefit of a fully probabilistic approach when predicting storm damages, which can form the basis for the design of risk-based warning tools. We furthermore aim at demonstrating the benefit (in terms of forecast skill) of an explicit and full treatment of the involved uncertainties within the modelling chain.

We structured the paper as follows. Section 2 describes the utilized data sources. The methodology, particularly the full modelling chain, is described in Sect. 3, including the verification methodology applied. Verification results are presented in Sect. 4, followed by discussion and conclusion in Sect. 5.

Insurance data on losses to residential buildings were provided by the
German insurance association, Gesamtverband der Deutschen
Versicherungswirtschaft e.V. (GDV). These comprise daily data on
administrative district level, with areas ranging from about
40 km

For training of the probabilistic storm damage model, analyses from the operational assimilation cycle for the COSMO-EU model (Schulz and Schättler, 2014) are employed. As a specific configuration of the non-hydrostatic COSMO-Model (Rockel et al., 2008; Doms, 2011), COSMO-EU is operationally run at the German Weather Service (DWD), covering the European domain at a resolution of 7 km and using 40 vertical levels with the lowest level 10 m above the ground. Forecasts are operationally initialized every 6 h (00:00, 06:00, 12:00 and 18:00 UTC) and performed for up to 78 h. The COSMO-EU assimilation scheme (based on a nudging methodology) is performed every 3 h (00:00, 03:00, 06:00, …, 21:00 UTC) and analysis files are written every hour. Here we use hourly 10 m wind gusts, which are extracted for each hour from the latest available analysis run. These are finally used to calculate daily maximum 10 m wind gusts. The COSMO-EU analyses are available for the period 2006–2011.

ECMWF has operationally run its Ensemble Prediction System (EPS) since 1992
(Molteni et al., 1996). This system is based on the same numerical weather
prediction (NWP) model that is used for the deterministic weather forecast,
the Integrated Forecasting System (IFS). However, in ensemble prediction
mode it is employed with a coarser vertical and horizontal resolution. The
latter has been successively increased from an initial resolution of

For the current study we use the 6-hourly output of instantaneous 10 m wind
speed of the 50 perturbed ensemble members operationally produced between
November 2000 and January 2010 (in

According to the data availability, the different modelling steps described in the following section are performed for different time periods. The statistical downscaling (compare Sect. 3.1) is developed on the basis of a set of 181 simulations for individual storm events during the period 1959–2010. The ensemble post-processing (compare Sect. 3.2) is performed for the years 2006–2009, for which both COSMO-EU analyses and ECMWF forecasts are available. The training of the probabilistic damage model (compare Sect. 3.3) is performed for the years 2006–2011, for which both damage data and COSMO-EU analyses are available. Assessment of forecast skill is done for the period 2001–2009, for which ECMWF forecasts and damage data are available.

Within the COSMO-EU domain, the global ECMWF-EPS forecasts were
statistically downscaled to the fine COSMO-EU resolution of approx. 7 km,
following the approach developed by Kruschke (2015). The basic concept of
this downscaling procedure is a multiple linear regression approach
quantifying the relationship of fine-scale surface gusts to the coarse scale
(instantaneous) surface winds given by the respective ECMWF-EPS forecast.
For each COSMO-EU grid-box (436 905 in total) an individual statistical
model, i.e. a regression equation, is established. This is done by
objectively choosing skilful predictors from a given set of potential
predictors. Essentially, these potential predictors are the EPS surface-wind
components and wind magnitudes scaled by the respective climatological
98th percentile (to achieve homogenisation with respect to orographic
effects) and subsequently interpolated (first-order conservative) to the
coarser of the analysed EPS resolutions, that is

The training of this statistical downscaling procedure and its evaluation (by three-fold cross-validation and several MSE-related metrics) is based on dynamical regionalization of 181 European winter storm episodes that was done by employing the numerical weather prediction model chain (global model GME and regional model COSMO-EU) of the German Weather Service (DWD). A comprehensive description of this statistical downscaling approach, as well as its development and evaluation is given by Kruschke (2015). This includes testing various other combinations of potential predictors and demonstrating that this approach outperforms (measured with respect to the mean squared error of wind gusts) a similar approach described by Haas and Pinto (2012), which is also based on multiple linear regressions. Kruschke (2015) additionally provided an effective quantification of uncertainties of the statistically modelled gusts. However, these uncertainty estimates are not used in the course of the current study.

The benefit of using ensemble prediction systems instead of single deterministic forecasts is the possibility to estimate the forecast uncertainty, which can differ for each meteorological situation. In practice, ensembles often systematically under- or overestimate this uncertainty, which is referred to as under- or overdispersion. At the ECMWF, the method of singular vectors is used to generate a set of initial conditions that are used to calculate several members of a forecast ensemble with the intention to produce an optimal spread. It should be noted that the ECMWF-EPS has been constructed so that its spread is optimized for medium-range forecasts, i.e. for forecasts of 3–5 days. Despite such sophisticated perturbation techniques, ensemble forecasts still often tend to be underdispersive. This means that the spread of the ensemble members (the members being discrete random draws of the forecasted probability density function) may be too small and may not reflect the full uncertainty inherent to the forecast. “Calibrating” the ensemble spread, which is part of sophisticated post-processing techniques, can thus help address such underdispersion of ensemble forecasts (see Bröcker and Smith, 2008). Several methods exist to calibrate a forecast ensemble, partly depending on the ensemble type (single-model, multi-model or lagged-averaged-forecasts). An overview of calibration techniques for medium-range forecasts can be found in Gneiting (2014). For this study, we apply the approach of Bröcker and Smith (2008). This method is a so-called ensemble dressing approach, whose purpose is to estimate the probability density function (PDF) of the ensemble, and it can be used to adjust the spread. The chosen method has the advantage that it can represent different methods of ensemble dressing depending on the selected parameter set.
It transforms the discrete members (50 in our case) to a continuous distribution function by combining kernel functions for each individual member. The ECMWF-EPS is a single-model ensemble and all of the members are indistinguishable. For this reason, all members are dressed by using the same Gaussian kernel. However, ensemble post-processing is performed for each grid cell separately. Aside from depending on the specific forecast situation, the actual size of the Gaussian kernel is thus determined individually for each grid cell. The dressing is done using an affine-transformed version of the original ensemble (Bröcker and Smith, 2008). While the dressing is used to transform the discrete members to a distribution function, the affine transformation is used to eliminate biases from the raw forecasts. Parameters for the transformation as well as for the Gaussian kernel are estimated by minimizing the continuous ranked probability score (CRPS; compare Gneiting and Raftery, 2007). The CRPS is a measure that describes the performance of an ensemble in its entirety by comparing the forecast and observation cumulative distribution functions (CDFs).
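The dressing and CRPS minimization can be illustrated with a minimal sketch. It assumes a strongly simplified form of the method: each member x_i is shifted to a * x_i + b and dressed with one Gaussian kernel of fixed width sigma, so that the predictive distribution is an equally weighted Gaussian mixture. The parameter names (a, b, sigma), the synthetic data and the coarse grid search are illustrative assumptions only. The approach of Bröcker and Smith (2008) uses a richer parameterization and would normally be fitted with a numerical optimizer.

```python
import math
import random


def abs_mean_normal(mu, sd):
    """Return E|X| for X ~ N(mu, sd**2)."""
    z = mu / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * sd * pdf + mu * (2.0 * cdf - 1.0)


def crps_dressed(members, obs, a, b, sigma):
    """CRPS of an equally weighted Gaussian mixture centred on a * x_i + b."""
    centres = [a * x + b for x in members]
    m = len(centres)
    # E|X - obs| under the dressed (mixture) forecast distribution
    term_obs = sum(abs_mean_normal(c - obs, sigma) for c in centres) / m
    # 0.5 * E|X - X'| for two independent draws from the mixture
    sd_pair = sigma * math.sqrt(2.0)
    term_pair = sum(abs_mean_normal(ci - cj, sd_pair)
                    for ci in centres for cj in centres) / (2.0 * m * m)
    return term_obs - term_pair


def mean_crps(forecasts, observations, a, b, sigma):
    """Average CRPS over all forecast cases."""
    total = sum(crps_dressed(f, o, a, b, sigma)
                for f, o in zip(forecasts, observations))
    return total / len(forecasts)


def fit_dressing(forecasts, observations, a_grid, b_grid, sigma_grid):
    """Coarse grid search for the parameter set with the lowest mean CRPS."""
    best = None
    for a in a_grid:
        for b in b_grid:
            for sigma in sigma_grid:
                score = mean_crps(forecasts, observations, a, b, sigma)
                if best is None or score < best[0]:
                    best = (score, a, b, sigma)
    return best


# Synthetic gusts (m/s): ensembles with a low bias and too little spread.
rng = random.Random(0)
forecasts, observations = [], []
for _ in range(100):
    truth = rng.uniform(5.0, 25.0)
    centre = truth - 2.0 + rng.gauss(0.0, 2.0)
    forecasts.append([centre + rng.gauss(0.0, 0.5) for _ in range(10)])
    observations.append(truth)

# "Raw" ensemble: no shift and a very narrow kernel.
raw_score = mean_crps(forecasts, observations, 1.0, 0.0, 0.1)
best_score, a_hat, b_hat, sigma_hat = fit_dressing(
    forecasts, observations,
    a_grid=[1.0, 1.1],
    b_grid=[0.0, 1.0, 2.0, 3.0],
    sigma_grid=[0.1, 0.5, 1.0, 2.0, 3.0],
)
print("raw CRPS:", round(raw_score, 3), "dressed CRPS:", round(best_score, 3))
print("fitted a, b, sigma:", a_hat, b_hat, sigma_hat)
```

In this toy setting the fitted shift and kernel width compensate the imposed bias and underdispersion, which lowers the mean CRPS relative to the raw ensemble. In the study the parameters are estimated per grid cell, which corresponds to calling the fitting routine once for each cell.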

In general, the aim of the method is the estimation of the entire PDF of forecasts, based on the 50 ensemble members. However, in our case, we are interested in deriving a corrected 50-member forecast ensemble, which is representative of this full PDF. This can simply be accomplished by randomly sampling the 50 members from the calibrated PDF. However, the calibration should not be interpreted for these individual members, since the method is designed to calibrate the ensemble properties (such as ensemble bias and dispersion) rather than the individual members' properties.
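A minimal sketch of this resampling step is given below, assuming the calibrated PDF is an equally weighted mixture of Gaussian kernels centred on the affine-transformed members. The function name and the parameter values (a, b, sigma) are hypothetical placeholders, not values from this study.

```python
import random


def sample_dressed_ensemble(members, a, b, sigma, n_samples, rng):
    """Draw a new ensemble from a Gaussian-kernel-dressed forecast PDF.

    Each draw first picks one kernel at random (all kernels are equally
    likely) and then samples from that Gaussian kernel.
    """
    centres = [a * x + b for x in members]
    return [rng.gauss(rng.choice(centres), sigma) for _ in range(n_samples)]


rng = random.Random(42)
raw_members = [15.0 + 0.1 * i for i in range(50)]  # toy 50-member gust forecast (m/s)
calibrated = sample_dressed_ensemble(raw_members, a=1.0, b=1.5, sigma=2.0,
                                     n_samples=50, rng=rng)
print(len(calibrated), round(min(calibrated), 2), round(max(calibrated), 2))
```

Because the draws are random, a resampled member has no one-to-one correspondence with a raw member; only ensemble-level properties such as mean and spread carry over.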

In the last step, the forecasts of near-surface maximum gusts are translated into probabilistic estimates for the exceedance of specified loss ratio thresholds (“damage occurrences”). Due to insufficient information about meteorological conditions on sub-grid scales (e.g. turbulent gusts induced through localized orographic features), as well as lack of knowledge on individual building characteristics, it is impossible to model damage occurrences on individual entity level in a deterministic manner. Instead, a statistical relation, valid for the total stock of buildings within a district, is derived, which enables the specification of probability estimates to express these uncertainties. To do so, logistic regression analysis is performed for each district. Damage occurrences, defined as the exceedance of a certain loss ratio threshold, are derived from the observed loss ratio time series. The resulting time series are then related to daily maxima of near-surface wind gusts from the COSMO-EU analyses to train the logistic regression curve. For each district, wind gusts at the grid point closest to the centre of the district are used.
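As an illustration of this training step, the following sketch fits a one-predictor logistic regression for a single district by Newton-Raphson iteration. The data are synthetic, because the insured-loss records are not public, and the "true" coefficients, the sample size and all function names are assumptions made for the example only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(gusts, damage, n_iter=25):
    """Fit P(damage | gust) = sigmoid(b0 + b1 * gust) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for v, y in zip(gusts, damage):
            p = sigmoid(b0 + b1 * v)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * v
            h00 += w               # negative Hessian (information matrix)
            h01 += w * v
            h11 += w * v * v
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:           # degenerate data, stop updating
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic daily maximum gusts (m/s) and damage occurrences (0/1).
rng = random.Random(1)
TRUE_B0, TRUE_B1 = -6.0, 0.3       # assumed values, 50 % probability at 20 m/s
gusts = [rng.uniform(5.0, 35.0) for _ in range(2000)]
damage = [1 if rng.random() < sigmoid(TRUE_B0 + TRUE_B1 * v) else 0
          for v in gusts]

b0, b1 = fit_logistic(gusts, damage)
critical_gust = -b0 / b1           # gust at which the fitted probability is 0.5
print("b0, b1:", round(b0, 2), round(b1, 3))
print("gust with 50 % damage probability:", round(critical_gust, 1), "m/s")
```

In the study, one such curve is trained per district and per loss ratio threshold.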

To investigate the influence of the individual uncertainty sources (meteorological forecast uncertainty and damage modelling uncertainty), different probability forecasts are set up. Specifically, four different setups result from (i) treating no uncertainty, resulting in deterministic forecasts, (ii) treating only meteorological forecast uncertainty, (iii) treating only damage-modelling uncertainty and (iv) treating both uncertainty sources.

The derivation of probability forecasts for damage occurrences is straightforward in the case of individual (single) member forecasts, which is done simply by applying the logistic regression function (described in Sect. 3.3) to calculate a probability estimate for the given forecasted wind gust. Similarly, the logistic regression function can be applied to the ensemble mean. Resulting probability estimates include damage-modelling uncertainty, while neglecting meteorological uncertainties (setup iii). Additionally, meteorological forecast uncertainty information is taken into account by applying the transfer function to each ensemble member. Assuming the members to be equally likely, probability forecasts can then be calculated as the ensemble mean of the damage-occurrence probabilities derived for the individual ensemble member forecasts (setup iv). Similar to neglecting meteorological forecast uncertainties, the statistical uncertainty from the damage-modelling step can be neglected by assuming a stepwise function instead of the logistic regression curve (compare Fig. 1, top panel). This is done by assuming a probability of one in case the forecasted gust wind speed exceeds a critical threshold and a probability of zero otherwise. Though not restricted to this choice, we choose this critical threshold to correspond to the gust wind speed for which the probability from the logistic regression analysis is 0.5. No treatment of uncertainty is accomplished when applying this “deterministic” damage occurrence function to the ensemble mean forecast (setup i). Finally, probability forecasts can be generated by applying the “deterministic” damage occurrence function to individual ensemble member forecasts. Probability estimates are then again calculated by averaging over the resulting individual member probability (setup ii). 
Since each member probability is either one or zero in the deterministic case, this average equals the fraction of members exceeding the critical threshold for the gust wind speed.
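The four setups can be summarized in a short sketch. The regression coefficients and the 10-member toy ensemble below are hypothetical values chosen for illustration; only the logic of combining ensemble members with the logistic or step-wise damage function follows the description above.

```python
import math


def logistic_damage(gust, b0, b1):
    """Probabilistic damage function: logistic regression curve."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * gust)))


def step_damage(gust, critical_gust):
    """'Deterministic' damage function: one above the critical gust, else zero."""
    return 1.0 if gust > critical_gust else 0.0


def four_setups(members, b0, b1):
    """Damage probabilities for setups (i) to (iv), given ensemble gusts."""
    critical_gust = -b0 / b1       # gust at which the logistic curve equals 0.5
    n = len(members)
    mean_gust = sum(members) / n
    return {
        "i_no_uncertainty": step_damage(mean_gust, critical_gust),
        "ii_meteorological_only": sum(step_damage(v, critical_gust)
                                      for v in members) / n,
        "iii_damage_model_only": logistic_damage(mean_gust, b0, b1),
        "iv_both": sum(logistic_damage(v, b0, b1) for v in members) / n,
    }


# Toy 10-member gust forecast (m/s) and assumed coefficients (critical gust 25 m/s).
members = [14.0, 16.0, 17.0, 18.0, 19.0, 19.0, 21.0, 22.0, 24.0, 26.0]
setups = four_setups(members, b0=-7.5, b1=0.3)
for name, prob in setups.items():
    print(name, round(prob, 3))
```

In this example the ensemble mean stays below the critical gust, so setup (i) forecasts no event, while the setups that keep uncertainty information yield non-zero probabilities. Setup (iv) exceeds setup (iii) here because most members lie on the convex part of the logistic curve, where the mean of the member probabilities is larger than the probability at the mean gust.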

The statistically downscaled wind gust ensemble forecasts are investigated
on grid-point basis by means of Talagrand diagrams (see e.g. Jolliffe and
Stephenson, 2003; Wilks, 2011). A Talagrand (or rank) histogram can be used
to illustrate model biases as well as an under- or overdispersion of the
ensemble. To construct the Talagrand diagram, the ensemble members are
ordered according to their rank for each time step and for each grid cell in
ascending order. The frequency of observations falling in between these
ranked ensemble members is counted. In a perfect ensemble, each rank would
be equally populated, meaning that each ensemble member is equally likely.
An asymmetry shows a bias, as too often the ranks of the weakest or the
strongest members are populated. If the Talagrand diagram has a
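A minimal sketch of the rank counting behind a Talagrand diagram is given below. The synthetic ensemble is deliberately constructed with too little spread, and ties between members and observation are ignored, which is an assumption suited to continuous variables such as gust speed. All names and numbers are illustrative.

```python
import random


def rank_histogram(forecasts, observations):
    """Relative frequency of the observation's rank within each ensemble.

    Rank 0 means the observation lies below all members, rank M (for an
    M-member ensemble) means it lies above all members.
    """
    n_members = len(forecasts[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(forecasts, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Synthetic example: observation error is larger than the ensemble spread.
rng = random.Random(7)
forecasts, observations = [], []
for _ in range(2000):
    centre = rng.gauss(0.0, 1.0)
    forecasts.append([centre + rng.gauss(0.0, 0.3) for _ in range(10)])
    observations.append(centre + rng.gauss(0.0, 1.0))

hist = rank_histogram(forecasts, observations)
print([round(h, 3) for h in hist])
```

For this synthetic ensemble the two outermost ranks collect most of the cases, far more than the share of 2/11 expected from a flat histogram.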

Illustration of the methodology to derive probabilistic impact prediction from ensemble-forecasted gust wind speed. Top panel: probabilistic storm damage function – logistic regression curve – relating the forecasted gust wind speed to a probability of damage occurrence. The dashed line indicates the deterministic version of such a damage function being zero below the critical threshold for gust winds and one above it respectively. Bottom panel: illustration of gust winds forecasted by a 10-member ensemble in solid lines. Dashed line indicates the ensemble mean.

Forecast quality of derived daily probability estimates for damages on
district level are assessed by means of the Brier score (Wilks, 2011), which
is the mean quadratic error of the probability forecast
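For illustration, the standard definitions of the Brier score and of the Brier skill score with a climatological reference can be sketched as follows. The reference here is the event frequency of the verification sample itself, which is an assumption of this example, and the six forecast-observation pairs are invented.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """BSS = 1 - BS / BS_ref, with the sample event frequency as reference."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


probs = [0.9, 0.1, 0.7, 0.2, 0.0, 0.6]   # forecast damage probabilities
outcomes = [1, 0, 1, 0, 0, 1]            # observed damage occurrences
bs = brier_score(probs, outcomes)
bss = brier_skill_score(probs, outcomes)
print(round(bs, 4), round(bss, 4))
```

A positive skill score indicates an improvement over the climatological forecast, a score of one a perfect forecast. Deterministic forecasts enter the same formula with probabilities of exactly zero or one.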

Left panel: Talagrand diagram of statistically downscaled EPS forecasts, lead time 1 day (red), 5 days (green) and 9 days (blue) from January 2006 to January 2010. Right panel: Talagrand diagram of statistically downscaled and post-processed EPS forecasts, lead times 1, 5 and 9, from January 2006 to January 2010.

Confidence intervals on derived Brier scores are calculated by means of a
bootstrap method, randomly generating 10 000 BS
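A percentile bootstrap for the Brier score can be sketched as follows. The sketch resamples individual forecast-observation pairs with replacement, which is an assumption of this example and ignores any spatial or temporal dependence between cases. The synthetic data and the reduced number of resamples (1000 instead of 10 000, to keep the example fast) are likewise illustrative.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0/1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def bootstrap_brier_ci(probs, outcomes, n_boot, level, rng):
    """Percentile confidence interval for the Brier score."""
    n = len(probs)
    sq_err = [(p - o) ** 2 for p, o in zip(probs, outcomes)]
    scores = []
    for _ in range(n_boot):
        # resample n forecast-observation pairs with replacement
        scores.append(sum(sq_err[rng.randrange(n)] for _ in range(n)) / n)
    scores.sort()
    lower = scores[int(n_boot * (1.0 - level) / 2.0)]
    upper = scores[min(n_boot - 1, int(n_boot * (1.0 + level) / 2.0))]
    return lower, upper


# Synthetic, perfectly reliable forecasts: events occur with the stated probability.
rng = random.Random(3)
probs = [rng.random() for _ in range(300)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

bs = brier_score(probs, outcomes)
ci_low, ci_high = bootstrap_brier_ci(probs, outcomes, n_boot=1000, level=0.9,
                                     rng=rng)
print(round(ci_low, 4), round(bs, 4), round(ci_high, 4))
```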

To assess the reliability of probabilistic forecasts, reliability diagrams –
relating forecasted probabilities to observed event frequencies – are
employed. In case of a perfectly reliable forecast, an event should be
expected in

To address “false alarms” and “missed events” in the case of
probabilistic forecasts, ROC (relative operating characteristics) curves are
considered. In case of the deterministic forecasts (no uncertainty
treatment), the hit rate

In a first step, the statistically downscaled ensemble forecasts were
verified against the COSMO analyses by means of the rank histogram statistics
described in Sect. 3.5. The resulting Talagrand diagrams for forecast lead
times of 1, 3 and 9 days (red, green and blue respectively) are shown in
Fig. 2 (left panel). First note that there is an asymmetry to the right-hand
side. For 1 day forecast lead time it is found that in nearly 40 % of the
cases, the observation is equal to or above the largest value of the
ensemble. At first sight, such frequency bias appears to be rather critical.
However, the absolute bias of the downscaled ensemble forecasts' (not shown)
range is only between 0.1 and 0.5 m s

Observed occurrences and forecasted probabilities for loss ratios
exceeding 0.0001 ‰ for 31 October 2006 (winter storm “Britta”).
(© GeoBasis-DE/BKG 2008)

To correct both bias and underdispersion, the ensemble post-processing technique after Bröcker and Smith (2008) was applied to the data. The Talagrand diagrams for the post-processed forecasts (Fig. 2, right panel) show nearly equally populated ranks. Slightly higher populations are found for the lowest and highest ranks. In case of a forecast lead time of one day (red), the lowest and highest ranks are populated with a frequency of about 0.05, which is roughly twice the frequency found for the intermediate ranks. In only 4 % of all forecasts, the observation falls below the lowest or above the highest value of the ensemble forecast members. Thus, the underdispersion is largely removed by post-processing. For increasing lead time the remaining underdispersion further declines. Also, the Talagrand diagrams for the post-processed ensemble (Fig. 2, right panel) show no considerable asymmetry, indicating that the bias found for the downscaled forecasts is removed.

The four different settings (as described in Sect. 3.4) are used to forecast storm damage occurrences from the statistically downscaled EPS forecasts. As an illustrative example, resulting forecasts on district level are visualized in Fig. 3 for 31 October 2006 (winter storm “Britta”). In about half of all 439 districts, the observed loss ratio exceeded the threshold of 0.0001 ‰. For a lead time of 1 day (forecasts initialized at 12:00 UTC of the previous day) the deterministic setup (no uncertainty treatment) forecasts such exceedance in considerably fewer districts. With a treatment of meteorological uncertainty only, non-zero probabilities are derived in a number of districts for which the deterministic model does not forecast a threshold exceedance. However, large areas which had been affected by damages feature only probabilities below 10 %. The treatment of the uncertainty on damage occurrences in the case of winter storm “Britta” yields a rather different picture. Now probabilities of 20 % or higher are derived for most northern regions that recorded damages. Particularly for the dressed ensemble forecasts, forecasts applying a treatment of both uncertainties feature probabilities higher than 40 % in most regions affected, while probabilities of 10–20 % are featured in southern regions where only a few individual districts recorded damages.

For longer lead times, treating both uncertainties (particularly by means of the dressed ensemble) appears to be advantageous compared to the methods disregarding uncertainty information. In this example, considering both uncertainty sources even 9 days in advance yields probabilities of 10–20 % in most of the areas affected, while neglecting the uncertainty information does not yield any signal with respect to damage occurrences.

Reliability diagrams (left panel) and ROC curves (right panel) for the forecasts (2006–2009) with lead time 3 days for the high loss threshold (0.001 ‰). The climatological event frequency is indicated as a dashed horizontal/vertical black line in the reliability diagram (left panel). Forecasts considering only the meteorological (damage model) uncertainty are shown in green (yellow). Forecasts with treatment of both uncertainty sources using the undressed (dressed) ensemble are shown in blue (red).

Of course, the quality of probabilistic forecasts cannot be judged by means
of single forecasts or single storm situations. Instead, a systematic
evaluation of forecast quality is performed by means of Brier score and Brier
skill score, which are objective measures for the quality of probabilistic
forecasts. By means of reliability diagrams, further insight is gained into
the calibration characteristics of the probabilistic forecasts. Additionally,
ROC curves are considered to systematically evaluate the potential forecast
quality in terms of “false alarms” or “misses”. Verification of damage
occurrence forecasts is performed for exceedances of a low threshold (loss
ratio

By means of the reliability diagrams (exemplarily shown for high impact
events in Fig. 4, left panel) it can be found that considering the
uncertainties inherent to the forecasts improves the reliability of
probabilistic forecasts significantly. In the case of the deterministic
forecasts (black circles) they show that in about 3 % of all cases for
which the forecast reads “no event” a loss event has actually been
observed. Similarly, in about 97 % of the cases for which an event is
forecasted a loss event actually occurred. Considering the probabilistic
forecasts, it is found that if forecasted probabilities are low (

Lead time dependent Brier skill score (BSS; employing climatology as the reference forecast) for events with a loss ratio exceeding low threshold (0.0001 ‰) (left panel) and loss events with a loss ratio exceeding high threshold (0.001 ‰) (right panel) for the period 2006–2009. Shown in black symbols are verification results for the four different set-ups, red triangles show verification results using the ensemble dressing post processing method. 90 % confidence intervals from a bootstrapping method are shown as shaded areas.

Considering the example of winter storm “Britta” presented in Fig. 3, it
may be argued that by treating additional uncertainty sources the probability
estimates increase, which may lead to an increase in false alarms. However, an
analysis using ROC curves (exemplarily shown for high impact events in
Fig. 4, right panel) shows that this is not the case. They show that using
the probabilistic forecasts, the hit rate (

Considering the Brier skill score (as described in Sect. 3.5) with the
climatology as a reference forecast it is confirmed that the deterministic
forecasts of damage occurrences only yield very low skill on the first
forecast day (compare circles in Fig. 5). Considering meteorological
uncertainties for low-impact events (loss ratio

The situation is different in case of high impact events (loss
ratio

The spatial stratification by districts shows that forecast skill is not homogeneous over German districts (Fig. 6). In general, higher skill is found in northern regions. It can be assumed that this higher skill in northern regions is due to the flatter orography. Over complex terrain, predictability of wind gusts can generally be assumed to be lower, which is consistent with the spatial differences with respect to the predictability of damage occurrences. Additionally, the differences in skill might be influenced by the fact that the frequency of events with loss ratios exceeding the threshold is not constant throughout Germany. Since loss events are more frequent in the northern regions, skill might be larger in these regions. Furthermore, the spatial stratification also shows that skilful forecasts throughout Germany are only achieved through a treatment of the damage model uncertainty (Fig. 6), even for the shortest lead time of 1 day. Further improvement is achieved by full treatment of uncertainty, which has been quantified in the previous paragraphs.

Brier skill score (employing climatology as reference forecast) for
events with loss ratio exceeding low threshold (0.0001 ‰) in the
period 2006–2009 (© GeoBasis-DE/BKG 2008).

A probabilistic approach to forecast local occurrences of damages due to winter storms was presented. The approach is based on a logistic regression analysis, relating daily maxima of near-surface gust wind speeds from meteorological analysis data to damage occurrences for individual districts within Germany, defined through the exceedance of the loss ratio over a specified threshold. Due to unknown meteorological conditions on subgrid scales as well as unknown details on individual housing characteristics, it is impossible to model damage occurrences on an individual building level in a deterministic manner. Instead, only a statistical relation valid for a certain stock of buildings within a district can be derived. The probability estimates for specific gust wind speeds then reflect the damage model uncertainty arising from unknown details on unresolved spatial scales. Another uncertainty in the relation between gust wind and damage probability arises from the fact that hail-induced damages cannot be distinguished from wind-related damages in the dataset we use. According to the provider of the dataset (GDV), winter months are dominated by windstorm damages while summer is dominated by hail-induced damages. However, in rare cases of severe winter storm events, hail damages may occur. For example, it is known that damaging hail occurred during the frontal passage of storm Kyrill in 2007 (Fink et al., 2009). The occurrence of hail and resulting damages could be taken into account by means of additional predictors such as the “convective available potential energy” (CAPE) and “convective inhibition” (CIN). Based on a logistic regression model with multiple predictors, both the individual effect of hail and the contribution of hail in the case of winter storms could be quantified. It can be assumed that the probability of hail will increase in case of the most severe winter storm events.
Thus, for high gust winds the damage probability forecasts (which neglect the effects of hail) might be underestimated. Considering the reliability diagrams for the probabilistic forecasts (exemplarily shown in Fig. 4, left panel) we do find such underestimation of the probability forecasts. However, a more in depth analysis is needed to clearly attribute this to effects due to hail. This has not been the scope of this paper but we plan to address this in further research.

When forecasting winter storm damages, further uncertainty arises due to meteorological forecast uncertainties. In this study, these uncertainties were addressed by applying the storm damage model to the operational EPS of the ECMWF. Since the resolution of the ECMWF-EPS is too coarse, a statistical downscaling was applied to obtain near-surface wind gusts on the COSMO-EU grid (7 km).

In a first step, the statistically downscaled gust winds were verified against meteorological analyses, indicating a bias of the ensemble predictions towards lower gust wind speeds. In addition, the ensemble predictions were found to be under-dispersive, thus showing too little ensemble spread, which indicates an underestimation of uncertainty by the ensemble. By applying the probabilistic storm-damage model to the ensemble forecasts the influence of the individual uncertainty sources (meteorological forecast uncertainty and damage-model uncertainty) has been investigated. Results show that neglecting the statistical uncertainty arising within the damage model leads to rather poor forecast skill. Particularly for low-impact events and for short lead times, the damage model uncertainty is found to dominate the overall uncertainty. This reflects the fact that meteorological forecast uncertainties are smaller at short lead times and particularly in the case of low-impact (low wind) situations where basically an ensemble mean forecast or even a single deterministic forecast is sufficient to derive reasonable forecasts.

With longer lead times, meteorological forecast uncertainties naturally play an increasing role. Particularly for high-impact situations (due to severe wind gusts) it was shown that meteorological forecast uncertainties cannot be neglected without severe deficiency in skill. This means that an explicit treatment of both uncertainties leads to considerable improvement in forecast skill. The reason for this can be found in the non-linearity of the relation between the meteorological parameter wind and the resulting impact or impact probability. Such a nonlinear relation implies the necessity of weighting ensemble members in a more complex fashion compared to simply calculating the ensemble mean of gust wind speeds. This nonlinear weighting is taken into account by the impact modelling step and subsequent ensemble averaging for the forecast quantity of interest (in this case impact probability). Thus, in such a situation an explicit treatment of uncertainty through the complete modelling chain is highly beneficial.

For short lead times and low-impact situations the benefit of treating both uncertainties is negligible. For long lead times (up to 9 days) this benefit corresponds to a gain of one day in forecast lead time. For high-impact situations the benefit is even larger, corresponding to a gain of 2–3 days lead time. Both the bias and the underdispersion of the ensemble gust wind forecasts have been treated by applying an ensemble post-processing method (ensemble dressing), which is found to effectively compensate for both shortcomings. Using the ensemble-dressed gust winds as the basis for the damage occurrence forecasts yields additional forecast skill corresponding to a gain of 1–2 days lead time. This gain is particularly large at shorter lead times of a few days, for which a greater bias as well as a larger underdispersion in forecasted gusts has been found.
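The general idea of ensemble dressing can be sketched as follows. This is a generic Gaussian kernel dressing under stated assumptions, not necessarily the exact variant used in this study: the bias (forecast minus analysis) and the kernel standard deviation are assumed to have been estimated beforehand from past forecast-analysis pairs, and the values, the gust ensemble and the function name `dress_ensemble` are purely illustrative.

```python
import numpy as np

def dress_ensemble(members, bias, kernel_sd, n_draws=200, seed=0):
    """Gaussian kernel dressing of an ensemble (generic sketch).

    members   : raw ensemble gusts (m/s)
    bias      : mean forecast-minus-analysis error (m/s), assumed known
    kernel_sd : standard deviation of the dressing kernel (m/s), assumed known
    Returns a flat array of n_draws * len(members) dressed samples.
    """
    rng = np.random.default_rng(seed)
    corrected = np.asarray(members, dtype=float) - bias  # remove the mean bias
    noise = rng.normal(0.0, kernel_sd, size=(n_draws, corrected.size))
    return (corrected + noise).ravel()  # each member is spread out by the kernel

# Hypothetical raw ensemble with a low bias of 1.5 m/s and too little spread.
raw = np.array([18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 27.0, 30.0, 35.0])
dressed = dress_ensemble(raw, bias=-1.5, kernel_sd=2.0)

print(f"raw     mean/std: {raw.mean():.2f} / {raw.std():.2f}")
print(f"dressed mean/std: {dressed.mean():.2f} / {dressed.std():.2f}")
```

The dressed sample is shifted towards higher gusts and has a larger spread than the raw ensemble, which addresses the two shortcomings named above. A damage model would then be applied to the dressed samples instead of the raw members before averaging.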

Overall, this study shows that, in the case of winter storm damages, skilful predictions of storm loss occurrences at lead times of several days can be made using the presented (fully probabilistic) framework, which integrates meteorological forecast uncertainties and uncertainties resulting from a downstream impact model. Such a quantification of both the potential impacts of severe weather and their respective likelihood forms the basis for developing risk-based warning systems. Because quantified impacts and their likelihood are particularly relevant to recipients, the acceptance of weather warnings might be strongly enhanced. As one of the first national weather services, the UK Met Office has recently moved to a risk-based warning system (Neal et al., 2013). The basis of such a warning system is formed by the risk matrix, composed of the two dimensions impact and likelihood. By quantifying both of these dimensions, the presented framework can thus directly feed into such a warning system.

The data set on insured losses is property of the Gesamtverband der Deutschen Versicherungswirtschaft e.V. (GDV) and is not available to the public. Inquiries concerning data usage should be directed to GDV.

Information on the availability and accessibility of the operational COSMO-EU analyses can be found in Schulz and Schättler (2014). Inquiries about data usage should be directed to Deutscher Wetterdienst.

Operational ECMWF forecast data are described in Palmer et al. (2007) and are accessible for authorized users via the ECMWF (ECMWF, 2016).

The statistically downscaled gust forecasts and resulting damage probabilities, generated as part of this work, are intellectual property of Freie Universität Berlin and are not available to the public. Researchers interested in scientific collaboration and data usage are asked to contact the authors.

This research was carried out in the Hans Ertel Centre for Weather Research. This research network of universities, research institutes and the Deutscher Wetterdienst is funded by the BMVI (Federal Ministry of Transport and Digital Infrastructure). Furthermore, contributions to this work have been funded by the Federal Ministry of Education and Research in Germany (BMBF) through the research program MiKlip (FKZ: 01LP1104A) and by Munich Re. We wish to thank the Gesamtverband der Deutschen Versicherungswirtschaft e.V. (GDV) for providing the loss data. We are also grateful to the German Weather Service (DWD) and the ECMWF for providing access to the EPS data. We would like to thank the editor Sven Fuchs and the two anonymous reviewers, whose constructive comments helped to improve the article.

Edited by: S. Fuchs

Reviewed by: two anonymous referees