The XWS open access catalogue of extreme European windstorms from 1979 to 2012

The XWS (eXtreme WindStorms) catalogue consists of storm tracks and model-generated maximum 3 s wind-gust footprints for 50 of the most extreme winter windstorms to hit Europe in the period 1979–2012. The catalogue is intended to be a valuable resource for both academia and industries such as (re)insurance, for example allowing users to characterise extreme European storms and to validate climate and catastrophe models. Several storm severity indices were investigated to find which could best represent a list of known high-loss (severe) storms. The best-performing index was $S_{ft}$, which is a combination of storm area calculated from the storm footprint and maximum 925 hPa wind speed from the storm track. All the listed severe storms are included in the catalogue, and the remaining ones were selected using $S_{ft}$. A comparison of the model footprint to station observations revealed that


Introduction
European windstorms are extratropical cyclones with very strong winds or violent gusts that are capable of producing devastating socioeconomic impacts. They can lead to structural damage, power outages affecting millions of people, and closed transport networks, resulting in severe disruption and even loss of life. For example, the windstorms Anatol, Lothar and Martin, which struck in December 1999, inflicted approximately USD 13.5 billion (indexed to 2012) worth of damage and led to over 150 fatalities (Sigma, 2007, 2013). By cataloguing these events, the intensity, location and frequency of historical windstorms can be studied. This is crucial for understanding the factors that influence their development (such as the North Atlantic jet stream or the North Atlantic Oscillation), and for evaluating and improving the predictions of weather and climate models.
Publicly available historical storm catalogues, such as HURDAT (Landsea et al., 2004) and IBTRACS (Levinson et al., 2010), are widely used in the tropical cyclone community. These catalogues provide quantitative information about historical tropical cyclones, including observed tracks of storm position and intensity. Tropical cyclone catalogues are an essential resource for the scientific community and are used to understand how climate variability modulates the development and activity of tropical cyclones (e.g. Ventrice et al., 2012) and for evaluating climate models (e.g. Strachan et al., 2013; Manganello et al., 2012). These catalogues are also widely used within the insurance and reinsurance industry to assess risks associated with intense tropical cyclones.

Published by Copernicus Publications on behalf of the European Geosciences Union.
Despite the utility of tropical cyclone catalogues, no comparable catalogue of European windstorms currently exists. One of the last major freely available catalogues was that of Lamb (1991). This catalogue has not been digitised and is now long out of date. More recent catalogues only contain information on storm intensity (Della-Marta et al., 2009), only pertain to a specific country (e.g. Bessemoulin, 2002), or are not publicly available. The XWS catalogue, available at www.europeanwindstorms.org, aims to address this gap, by producing a publicly available catalogue of the 50 most extreme European winter windstorms. The catalogue consists of tracks and model-generated maps of maximum 3 s wind gusts at each model grid point over a 72 h period for each storm (hereafter the maps are referred to as the storm footprints, and 3 s wind gusts as gusts).
In order to create the catalogue, several scientific questions had to be addressed:
1. What is the best method for defining extreme European windstorms?
2. How well do the model storm footprints compare with observations, and what are the reasons for any biases?
3. What is the best way to recalibrate the footprints given the observations?
This paper describes how the above questions were addressed to produce the XWS catalogue. The paper is structured as follows: Sect. 2 describes the data and methods used to generate the storm tracks and footprints, and Sect. 3 describes the method used to select the 50 most extreme storms. Section 4 evaluates the storm footprints using weather station data, and Sect. 5 describes the recalibration method. Conclusions and future research directions are discussed in Sect. 6.

Data
This section describes the data sets and models used to produce the data for the 50 extreme European windstorms in the XWS catalogue, which consists of:
- tracks of the 3-hourly locations of the maximum T42 850 hPa relative vorticity, minimum mean sea level pressure (MSLP) and maximum 925 hPa wind speed over continental European and Scandinavian land within a 3° radius of the vorticity maximum, from the ERA Interim reanalysis, identified by an automated cyclone tracking algorithm (Hodges, 1995, 1999);
- maximum 3 s gust footprints over a 72 h period using the ERA Interim reanalysis dynamically downscaled using the Met Office Unified Model;
- recalibrated maximum 3 s gust footprints using Met Office Integrated Data Archive System (MIDAS) weather station observations.
The details of the storm tracks and the modelled footprints will be described below. The details of the recalibration will be discussed in Sect. 5.

Storm tracks
Storms are tracked in the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Reanalysis (ERA Interim) data set over 33 extended winters (October-March 1979/80-2011). The identification and tracking of the cyclones is performed following the approach used in Hoskins and Hodges (2002), based on the Hodges (1995, 1999) tracking algorithm. This uses 850 hPa relative vorticity to identify and track the cyclones. Previous studies (Hodges et al., 2011) have used 6-hourly reanalysis data, but here 3-hourly data are used to produce more reliable tracks, since some extreme European windstorms have very fast propagation speeds. In addition to producing 6-hourly reanalyses, the ERA Interim suite produces two 10-day forecasts initialised at 00Z and 12Z every day. To create the 3-hourly data set, the outputs valid at 03Z and 09Z from the forecast initialised at 00Z and the outputs valid at 15Z and 21Z from the forecast initialised at 12Z were combined with the 6-hourly analyses. Before identification and tracking proceed, the data are smoothed to T42 and the large-scale background is removed as described in Hoskins and Hodges (2002), reducing the inherent noisiness of the vorticity and making tracking more reliable. The cyclones are identified by determining the vorticity maxima by steepest ascent maximisation in the filtered data, as described in Hodges (1995). These are linked together, initially using a nearest-neighbour search, and then refined by minimising a cost function for track smoothness (Hodges, 1995) subject to adaptive constraints on the displacement distance and track smoothness (Hodges, 1999). These constraints have been modified from those used for 6-hourly data to be suitable for the 3-hourly data. Storms that last longer than 2 days are retained for further analysis.
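The merging of analyses and short-range forecasts into one 3-hourly series can be sketched as below. This is a minimal illustration of the time bookkeeping only; the function name `three_hourly_sources` and the source labels are hypothetical and are not taken from the ERA Interim archive or the tracking code.

```python
from datetime import datetime, timedelta


def three_hourly_sources(day):
    """Return (valid_time, source) pairs for one day of the merged 3-hourly series.

    `day` is a datetime at 00Z. The labels are illustrative only.
    """
    steps = []
    for hour in range(0, 24, 3):
        valid = day + timedelta(hours=hour)
        if hour % 6 == 0:
            source = "analysis"           # 00Z, 06Z, 12Z and 18Z analyses
        elif hour < 12:
            source = "forecast from 00Z"  # outputs valid at 03Z and 09Z
        else:
            source = "forecast from 12Z"  # outputs valid at 15Z and 21Z
        steps.append((valid, source))
    return steps


for valid, source in three_hourly_sources(datetime(1999, 12, 26)):
    print(valid.strftime("%HZ"), source)
```

Each day therefore contributes four analysis times and four forecast times, with the forecast lead time never exceeding 9 h.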
The algorithm identified 5730 storms over the 33 yr period in a European domain defined as 15° W to 25° E in longitude and 35 to 70° N in latitude; 50 of these storms were selected for the catalogue as described in Sect. 3.
The MSLP and maximum 925 hPa wind speed associated with the vorticity maxima are found in post-processing. This is done by searching for a minimum/maximum within a certain radius of the vorticity maximum. A radius of 6° is used for the MSLP. For the 925 hPa wind speed, radii of 3°, 6° and 10° were tested, but only the results for 3° are given as this was found to be the best indicator of storm severity (see Sect. 3.1). For the MSLP, the location of the minimum is only given if it is a true minimum. If not, the MSLP value given is that at the vorticity centre.
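The radius search described above can be illustrated with a short sketch. It assumes that the radius is a great-circle (angular) distance on the sphere, which the text does not state explicitly, and it ignores the land mask; the function names and the `(lat, lon, value)` input layout are hypothetical.

```python
import math


def angular_distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in degrees of arc."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_d = (math.sin(p1) * math.sin(p2)
             + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def max_within_radius(points, centre_lat, centre_lon, radius_deg):
    """Largest value among (lat, lon, value) points within radius_deg of the centre.

    Returns None if no point falls inside the search radius.
    """
    inside = [value for lat, lon, value in points
              if angular_distance_deg(lat, lon, centre_lat, centre_lon) <= radius_deg]
    return max(inside) if inside else None


# Hypothetical 925 hPa wind speeds (m/s) around a vorticity centre at 50N, 0E.
winds = [(50.0, 0.0, 20.0), (52.0, 0.0, 30.0), (60.0, 0.0, 40.0)]
print(max_within_radius(winds, 50.0, 0.0, 3.0))
```

The same routine with `min` in place of `max` and a 6° radius would give the corresponding MSLP search, subject to the same assumptions.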

Dynamical downscaling
To achieve the best storm representation, the highest-resolution hindcast data set available at the Met Office at the time of starting the project was selected. This data set was generated by dynamically downscaling ERA Interim (T255 ∼ 0.7°) to a horizontal resolution of 0.22° (equivalent to ∼ 24 km at the model's equator). The 0.22° resolution data set covers the entire ERA Interim period (at the time of making this storm catalogue). The atmospheric model used to perform the downscaling is the Met Office Unified Model (MetUM) version 7.4 (Davies et al., 2005). The model's non-hydrostatic dynamical equations are solved using semi-Lagrangian advection and semi-implicit time stepping. There are 70 (irregularly spaced) vertical levels, with the model top at 80 km.
The downscaled region covers western Europe and the eastern North Atlantic (hereafter referred to as the "WEuro" region), and is shown in Fig. 1. The 0.22° MetUM grid uses a rotated pole at a longitude of 177.5° and latitude 37.5° so that the grid spacing does not vary substantially over the domain.
To create the data set, the 0.22° MetUM is initialised every day at 18Z using full-field initial conditions from the reconfigured ERA Interim analysis at that time. The 0.22° MetUM runs for 30 model hours, using lateral boundary conditions generated from the ERA Interim 6-hourly analyses. The first 6 h of model output are disregarded due to spin-up, allowing the model to adjust from the ECMWF IFS (the ECMWF Integrated Forecast System, ECMWF, 2006) initial conditions, leaving data for 00Z to 00Z on each day. This results in daily 24 h forecasts which are combined to create a new, higher-resolution data set for the entire ERA Interim period. By re-initialising the 0.22° MetUM runs every 24 h, deviations from ERA Interim in the centre of the model domain should be minimised.
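The timing of one downscaling cycle can be checked with a few lines of date arithmetic. This is only a sketch of the schedule stated above (18Z start, 30 h run, first 6 h discarded); the function name `retained_window` is hypothetical.

```python
from datetime import datetime, timedelta

RUN_LENGTH_H = 30  # length of each MetUM run, as stated in the text
SPIN_UP_H = 6      # initial hours discarded as spin-up, as stated in the text


def retained_window(init_time):
    """Return (start, end) of the output kept from a run initialised at init_time."""
    start = init_time + timedelta(hours=SPIN_UP_H)
    end = init_time + timedelta(hours=RUN_LENGTH_H)
    return start, end


start, end = retained_window(datetime(2007, 1, 17, 18))
print(start, "to", end)
```

A run initialised at 18Z on one day therefore supplies the 24 h from 00Z to 00Z of the following day, and consecutive runs tile the period without gaps.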

Creating the windstorm footprints
For this catalogue the footprint of a windstorm is defined as the maximum 3 s gust at each grid point over a 72 h period during which the storm passes through the domain. The 72 h period was centred on the time at which the tracking algorithm identified the maximum 925 hPa wind speed over land within a 3° radius of the track centre. The 72 h duration was chosen because it is commonly used in the insurance industry (Haylock, 2011), although lifetimes of windstorms can be longer than this. However, by centring the 72 h period at the time mentioned above, the footprints should capture the storms during their most damaging phase. The 3 s gust has been shown to have a robust relationship with storm damage (Klawa and Ulbrich, 2003), and is commonly used in catastrophe models currently used by the insurance industry.
Maximum 3 s gusts at a height of 10 m from the 0.22° MetUM data set, which are output every 6 h and give the maximum gust achieved over the preceding 6 h period, are used to create the footprints.
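The footprint definition amounts to a grid-point maximum over the 6-hourly gust fields that fall in the 72 h window. The sketch below assumes that a field valid at time t (covering the preceding 6 h) is included when t lies in the half-open interval (centre - 36 h, centre + 36 h]; the text does not specify how the window is aligned with the 6-hourly output, so this convention, the function name `footprint` and the flat-list field layout are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def footprint(gust_fields, centre_time, half_window_h=36):
    """Grid-point maximum of 6-hourly maximum-gust fields in a 72 h window.

    gust_fields: list of (valid_time, values), where values is a flat list of
    gusts (one per grid point) giving the maximum over the preceding 6 h.
    """
    start = centre_time - timedelta(hours=half_window_h)
    end = centre_time + timedelta(hours=half_window_h)
    selected = [values for t, values in gust_fields if start < t <= end]
    if not selected:
        raise ValueError("no gust fields fall inside the window")
    # zip(*selected) groups the values of each grid point across time steps.
    return [max(point_values) for point_values in zip(*selected)]


# Two grid points, hypothetical gusts every 6 h from 00Z 16 January 2007.
t0 = datetime(2007, 1, 16)
fields = [(t0 + timedelta(hours=6 * k), [k, 100 - k]) for k in range(19)]
print(footprint(fields, datetime(2007, 1, 18, 12)))
```

With this convention a window centred on a 6-hourly output time uses exactly twelve fields.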
In the MetUM the gusts are estimated using the relationship $U_{\mathrm{gust}} = U_{10\,\mathrm{m}} + C\sigma$, where $U_{10\,\mathrm{m}}$ is the 10 m wind speed and $\sigma$ is the standard deviation of the horizontal wind, estimated from the friction velocity using the similarity relation of Panofsky et al. (1977). $C$ is a constant (although it is modified over rough terrain) determined from universal turbulence spectra. The value of $C$ is set so that there is a 25 % chance that the resulting 3 s wind gust will be exceeded (Lock and Edwards, 2013; Beljaars, 1987).
It should be noted that there are several other techniques available for estimating wind gusts, as described in Sheridan (2011), and Born et al. (2012) showed that different parameterisation schemes can sometimes lead to differences of up to 10-20 m s$^{-1}$ in the estimated gust at a particular site. A commonly used alternative method for predicting gusts is to use the maximum wind speed at the vertical levels from which momentum may be transported to the surface (e.g. Brasseur, 2001). This method is argued to be more physically based, although it is not clear whether it significantly improves the gust estimates (Sheridan, 2011). Biases arising from the gust parameterisation are discussed further in Sect. "Underprediction of high gusts for low-altitude stations".
Footprints were created for each of the 5730 storms identified by the tracking algorithm applied to ERA Interim (see Sect. 2.1).

Footprint contamination
The European extratropical cyclones identified by the tracking algorithm are relatively frequent events. Over the 33 extended winters that have been tracked, on average 2.5 events pass through the domain in any given 72 h period. Furthermore, extratropical cyclones exhibit temporal clustering (Mailier et al., 2006), which could result in days with even more storms. The highest number of cyclones passing through in a single 72 h period is 8, for the period starting at 00Z on 6 February 1985.
Footprints are therefore likely to include gusts from two or more cyclones. This can create problems when trying to attribute damage to a particular event. To attempt to isolate the footprint to a particular cyclone, all the gusts within a 1000 km radius of the track position at each 6 h time step are assumed to be associated with that particular cyclone and all other data are rejected. The "decontaminated" footprint is then derived by taking the maximum of these gusts at all grid points where there are data remaining, within the 72 h period of the cyclone.
[…] 18Z 19 January 2007 respectively, derived by taking maximum gusts at every individual grid point in the whole domain (the contaminated footprints). The footprints are almost indistinguishable, and are dominated by one large event. Figure 3 shows the footprints for the same cyclones, but created using the decontamination method described above. The new footprints show that cyclone 4769 was in fact a very weak event over southern France and the Mediterranean Sea, cyclones 4773 and 4774 are northern storms which did not make much impact on land, and the dominant event is cyclone 4782, centred on 15Z 18 January 2007, which is the famous storm Kyrill (January 2007).
The uncontaminated footprints are used for the calculation of the storm severity indices (see Sect. 3), although in the catalogue both the contaminated and uncontaminated footprints are available.
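The decontamination step described in this section can be sketched as a distance mask applied at each 6 h step before taking the maximum. The sketch assumes great-circle distances on a spherical Earth of radius 6371 km and a flat list of grid points; these choices, and the names `great_circle_km` and `decontaminated_footprint`, are illustrative assumptions rather than details given in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def decontaminated_footprint(grid, steps, radius_km=1000.0):
    """Maximum gust per grid point, using only gusts near the track.

    grid:  list of (lat, lon) grid points.
    steps: list of ((track_lat, track_lon), gusts) per 6 h step, with gusts
           aligned with grid.
    Points never within radius_km of the track are returned as None.
    """
    result = [None] * len(grid)
    for (track_lat, track_lon), gusts in steps:
        for i, (lat, lon) in enumerate(grid):
            if great_circle_km(lat, lon, track_lat, track_lon) <= radius_km:
                if result[i] is None or gusts[i] > result[i]:
                    result[i] = gusts[i]
    return result


grid = [(50.0, 0.0), (50.0, 5.0), (40.0, 20.0)]
steps = [((50.0, 0.0), [20.0, 30.0, 45.0]),
         ((50.0, 4.0), [25.0, 28.0, 50.0])]
print(decontaminated_footprint(grid, steps))
```

In this toy case the third grid point carries the strongest gusts but is never within 1000 km of the track, so it is excluded from the decontaminated footprint.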

How to select extreme windstorms
Fifty of the most extreme storms of the 5730 identified by the tracking algorithm (Sect. 2.1) have been selected for the XWS catalogue. The challenge was to define an index to quantify the "extremeness" of a storm, on which to base the storm selection.
A storm can be defined as extreme in many ways, for example in terms of a meteorological index, or in terms of extreme values of insured losses, i.e. a severe event (Stephenson, 2008). Note that here severity is defined in terms of total insurance loss; however, other measures are possible, such as human mortality, ecosystem damage, etc. The aim here was to find an optimal objective meteorological index that selects storms that were both meteorologically extreme and severe. Expert elicitation with individuals in the insurance industry led to the identification of 23 severe storms in the period 1979-2012 (Table 1) which would be expected to be included if considering insured loss only, over the whole European domain. The most successful meteorological index is considered to be the one that ranks most of these 23 severe storms as extreme (defined as category C storms in Sect. 3.2, Fig. 4a).
[Table caption fragment: … Sigma (2004, 2006, 2007, 2009, 2011, 2012, 2013). Losses have been converted to be indexed to 2012 values.]

Possible meteorological indices
Meteorological indices from both the track and footprint of the storm were investigated. Of the track indices (maximum T42 850 hPa relative vorticity, minimum MSLP, and maximum intensity of the storm, $U_{\max}$, defined as the maximum 925 hPa wind speed over European and Scandinavian land within a 3° radius of the vorticity maximum), $U_{\max}$ was found to perform the best, giving the most storms in category C. Radii of 6° and 10° were considered, both resulting in a slightly poorer performance by the index (fewer storms in category C). From the footprint, the size of the storm ($N$) was considered, defined as the area of the (uncontaminated) footprint that exceeds 25 m s$^{-1}$ over continental European and Scandinavian land. A threshold of 25 m s$^{-1}$ was used as it is recognised as being the wind speed at which damage starts to occur. In Lamb (1991) it was noted that wind speeds of 38-44 knots (19.5-22.6 m s$^{-1}$) damage chimney pots and branches of trees, and wind speeds of 45-52 knots (23.1-26.8 m s$^{-1}$) uproot trees and cause severe damage to buildings.
Indices $U_{\max}$ and $N$ can be combined to form a storm severity index (SSI). Numerous SSIs have been developed, with their uses ranging from the estimation of the return period of windstorms over Europe (Della-Marta et al., 2009) to understanding how windstorms will change under anthropogenic climate change (Leckebusch et al., 2008). An SSI was used to rank the storms in the catalogue of extreme storms over the North Sea, British Isles and Northwest Europe by Lamb (1991). The SSI used by Lamb (1991) is based on the greatest observed wind speed over land ($V_{\max}$), the area affected by damaging winds ($A$) and the overall duration of occurrence of damaging winds ($D$). Damaging winds were defined as those in excess of 50 knots (25.7 m s$^{-1}$):
$$\mathrm{SSI} = V_{\max}^{3}\, A\, D$$
(Lamb, 1991, p. 7).
[Figure caption fragment: The number of storms in category C ($n_C$) for the top $n_B + n_C$ storms, for index $S_{ft}$.]
A similar SSI can be derived by combining the track index $U_{\max}^{3}$ (intensity) and footprint index $N$ (area), and assuming that the duration of all storms is 72 h in accordance with the insurance industry definition of an event (Haylock, 2011):
$$S_{ft} = U_{\max}^{3}\, N.$$
An alternative to $U_{\max}$ can be calculated from the footprint rather than the track, by taking the mean of the excess gust speed cubed at grid points over European and Scandinavian land. Combining with index $N$, this gives an SSI calculated from the footprint only:
$$S_{f} = \sum_{\text{land grid points}} \left[\max\left(0,\; u_i - 25\right)\right]^{3},$$
where $u_i$ is the maximum gust at grid point $i$ in the footprint. A relative local 98th percentile threshold can be used as an alternative to the fixed threshold, as in Klawa and Ulbrich (2003). This threshold implies that at any location storm damages are assumed to occur on 2 % of all days. This adaptation to wind climate can also be expected to affect the degree to which damage increases with growing wind speed in excess of the threshold value, hence the normalised rather than absolute winds are used. In Klawa and Ulbrich (2003) weather station data are used, but an equivalent SSI can be calculated from the footprint to quantify the advantage of using a relative threshold when predicting severity:
$$S_{f98} = \sum_{\text{land grid points}} \left[\max\left(0,\; \frac{u_i}{u_{98,i}} - 1\right)\right]^{3},$$
where $u_{98,i}$ is the 98 % quantile of maximum gust speeds during the period 1979-2012 at grid point $i$. In Klawa and Ulbrich (2003) the summand in Eq. (1) is multiplied by population density to calculate a loss index, but here the aim is to find a purely meteorological index for storm severity.
It should be noted that all of the indices investigated here are a function of gust or wind speed and area only. Duration of high winds and gusts may also relate to storm damage, so incorporation of this into the indices could be investigated in the future.
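The indices discussed in this section can be illustrated with a short sketch. The functional forms used here are assumptions inferred from the verbal descriptions: intensity cubed times area for $S_{ft}$, a sum of cubed gust excesses over 25 m s$^{-1}$ for $S_f$, and a sum of cubed normalised excesses over the local 98 % quantile for $S_{f98}$. The function names, the per-cell area input and the restriction to land points being done by the caller are all hypothetical.

```python
def storm_size(gusts, cell_areas, threshold=25.0):
    """Index N: total area of land cells whose footprint gust exceeds the threshold."""
    return sum(area for u, area in zip(gusts, cell_areas) if u > threshold)


def s_ft(u_max, n):
    """Track intensity cubed times footprint area (assumed form)."""
    return u_max ** 3 * n


def s_f(gusts, threshold=25.0):
    """Sum of cubed gust excesses over a fixed threshold (assumed form)."""
    return sum((u - threshold) ** 3 for u in gusts if u > threshold)


def s_f98(gusts, u98):
    """Sum of cubed normalised excesses over the local 98 % quantile (assumed form)."""
    return sum((u / q - 1.0) ** 3 for u, q in zip(gusts, u98) if u > q)


# Hypothetical footprint gusts (m/s) at three land grid cells.
gusts = [20.0, 30.0, 35.0]
areas = [1.0, 1.0, 2.0]      # relative cell areas, arbitrary units
u98 = [25.0, 25.0, 28.0]     # hypothetical local 98 % quantiles

n = storm_size(gusts, areas)
print(n, s_ft(40.0, n), s_f(gusts), s_f98(gusts, u98))
```

Because all three severity indices cube the wind term, a modest increase in gust speed above the threshold changes the index far more than a comparable relative increase in area.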

Results
The indices presented in Fig. 5 are related to one another. A positive association exists between $U_{\max}$ and $N$ and between $S_{ft}$ and $S_{f98}$. $S_{ft}$ has a stronger dependence on $N$ than on $U_{\max}$, and the strongest extremal dependence exists between $S_{ft}$ and $N$. The 23 most severe storms are in the top 18 %, 7 %, 5 %, 10 % and 16 % of storms when ranked according to $U_{\max}$, $N$, $S_{ft}$, $S_f$ (not shown) and $S_{f98}$ respectively, hence severe storms are best characterised by a high value of $S_{ft}$ (Fig. 5e-h).
The catalogue comprises the 23 severe storms and 27 storms which are extreme in the optimal meteorological index. The intercept of the number of the 23 severe storms in the top $n_B + n_C$ storms and $y = x - 27$ identifies the number of storms in category C such that 50 storms are selected for the catalogue (Fig. 4b). $U_{\max}$, $N$, $S_{ft}$, $S_f$ and $S_{f98}$ give 15, 15, 17, 10 and 13 storms in category C respectively (Table 2). The use of the relative threshold index $S_{f98}$ results in more storms in category C than the fixed threshold equivalent, $S_f$. The index $S_{ft}$, however, maximises the number of storms in category C and therefore is the most successful index at identifying both meteorologically extreme and severe storms.
The locations of the 50 storms selected by $U_{\max}$, $N$ and $S_{ft}$ are broadly similar, concentrated around the UK and Northern Europe (Fig. 6a-c). These are Atlantic storms which are strong and well represented in the reanalysis data. Indices $S_f$ and $S_{f98}$ select very similarly located storms (Fig. 6d). They both select "meteorologically extreme and not severe" (category B) storms that are located in the Mediterranean, which are generally weaker and less well represented by the reanalysis data due to their small scale (Cavicchia et al., 2013). Of the 27 category B storms selected by $S_{ft}$, 10 are also selected by $U_{\max}$, 9 by $N$ and 4 by both $U_{\max}$ and $N$, demonstrating that $S_{ft}$ selects an almost even combination of large-area and intense wind speed storms.

Nat. Hazards Earth Syst. Sci., 14, 2487-2501, 2014, www.nat-hazards-earth-syst-sci.net/14/2487/2014/
Indices $U_{\max}$, $N$, $S_{ft}$, $S_f$ (not shown) and $S_{f98}$ select events from 29, 26, 27, 30 and 30 yr out of the 33 yr period 1979-2012 respectively, hence all indices represent the period spanned by the XWS catalogue well (Fig. 7). A similar trend exists within the time series for all five indices, with more storms selected in the period 1985-1995 and fewer in the period 2000-2010 (Fig. 7). In summary, the index $S_{ft}$ is the most successful index at identifying severe storms. It depends on both the area and maximum wind speed intensity of the storm. The index $S_{ft}$ selects storms located over the UK and Northern Europe and samples storms over the full time period of the XWS catalogue, hence giving a good representation of the meteorologically extreme and severe Atlantic storms that occurred throughout the period. For these reasons $S_{ft}$ is the meteorological index used to select the 50 storms for the XWS catalogue. It is, however, worth mentioning that for specific regions or countries the performance of each index could be quite different; for example, $S_{f98}$ may perform better due to using local thresholds, and perhaps storm area ($N$) may be less important.
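The selection rule used in this section (the intercept of the severe-storm count with the line y = x - 27) can be read as a walk down the ranked list until 27 non-severe storms have been collected. The sketch below implements that reading; the function name `count_category_c` and the choice of stopping at the first rank where the intercept condition holds are assumptions made for illustration.

```python
def count_category_c(ranked_ids, severe_ids, n_extra=27):
    """Number of severe storms ranked among the top storms of an index.

    ranked_ids: storm identifiers sorted from most to least extreme.
    severe_ids: identifiers of the known severe (high-loss) storms.
    n_extra:    number of non-severe storms to add to the catalogue.
    Stops at the first rank n where (severe storms in top n) == n - n_extra.
    """
    severe = set(severe_ids)
    n_severe_seen = 0
    for n, storm in enumerate(ranked_ids, start=1):
        if storm in severe:
            n_severe_seen += 1
        if n - n_severe_seen == n_extra:
            return n_severe_seen
    raise ValueError("ranking too short to collect n_extra non-severe storms")


# Toy example: 5 ranked storms, 2 of them severe, 2 non-severe storms wanted.
print(count_category_c(["a", "b", "c", "d", "e"], {"b", "e"}, n_extra=2))
```

With 23 severe storms and `n_extra=27`, the catalogue then consists of the 27 non-severe storms found in this walk plus all 23 severe storms, of which the returned count fall in category C.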

Evaluation of MetUM windstorm footprints
Observational data were extracted from the MIDAS database. For each of the selected storms, all stations roughly within the WEuro domain which recorded maximum gusts during the 72 h period were used to evaluate the MetUM windstorm footprints. The gust data were a mixture of 1-, 3- and 6-hourly maximum gusts.
Example observational footprints for the storms Jeanette […] where the high gusts occur, although it is difficult to confirm the exact affected region given the irregular locations of the observations. Figure 8c and f show scatter plots of model maximum gusts against observed maximum gusts for all of the stations in the observational footprint for each storm. The MetUM maximum gusts for each specific station location were calculated using bilinear interpolation between grid points.
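Bilinear interpolation of a gridded value to a station location can be sketched as follows. The example works in grid coordinates within a single rectangular cell; it assumes the interpolation is carried out on the model's own (rotated) grid, which the text does not state, and the function name and argument order are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation inside the cell [x0, x1] x [y0, y1].

    f00, f10, f01 and f11 are the values at (x0, y0), (x1, y0), (x0, y1)
    and (x1, y1) respectively.
    """
    tx = (x - x0) / (x1 - x0)  # fractional position across the cell in x
    ty = (y - y0) / (y1 - y0)  # fractional position across the cell in y
    return ((1 - tx) * (1 - ty) * f00 + tx * (1 - ty) * f10
            + (1 - tx) * ty * f01 + tx * ty * f11)


# Hypothetical maximum gusts (m/s) at the four corners of one grid cell.
print(bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 10.0, 20.0, 30.0, 40.0))
```

At the cell centre the result is the mean of the four corner values, and at a corner it returns that corner's value, so no correction for local station exposure or altitude is implied.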
The scatter plots show that the gusts are scattered about the y = x line, meaning that in general the model gusts are in agreement with the observations. This result is especially impressive when considering that the model gusts have simply been interpolated from a ∼ 25 km grid to a specific location without applying any corrections. For the 50 storms in the catalogue, the mean root mean square (rms) error in the model gusts is 5.7 m s$^{-1}$ (for stations at altitudes less than 500 m, and removing gusts for which the observations read 0 m s$^{-1}$, which are believed to be erroneous). However, two problems with the model are apparent from these scatter plots:
- For all storms there is a more dispersed population separate from the general population, below the y = x line. It was found that these points are mostly from stations with altitudes greater than ∼ 500 m (plotted in red).
- For a number of storms the plots of model vs. observed gusts appear to deviate from the y = x line, flattening off for observed gust speeds of greater than ∼ 25 m s$^{-1}$, showing that the model is underpredicting extreme gusts. In Fig. 8 this can be seen for the storm Kyrill, although the problem is not so severe for the storm Jeanette.
The first issue has been noted previously, and is a common issue with climate and numerical weather prediction models (e.g. Donat et al., 2010;Howard and Clark, 2007). It is caused by the use of an effective roughness parameterisation, which is needed to estimate the effect of subgrid-scale orography on the synoptic scale flow; however, it causes unrealistically slow wind (and hence gust) speeds at 10 m.
In Howard and Clark (2007) a method was proposed to correct for this effect, by estimating a reference height, $h_{\mathrm{ref}}$, above which the wind speeds are unaffected by the surface, and then assuming a log-profile to interpolate wind speeds back down to 10 m, using the local vegetative roughness, $z_0$, rather than the effective roughness.
In this model only wind speeds on seven model levels were archived, which means that the estimation of wind speeds at $h_{\mathrm{ref}}$ could be subject to large errors. Nevertheless, applying the correction to the storm Kyrill gave a clear improvement to the maximum 10 m winds for high-altitude locations, although the underestimation of extreme gusts at lower altitudes remained (plot not shown). For this calculation $h_{\mathrm{ref}}$ was estimated from orographic data at the resolution of the MetUM, but it is possible to estimate $h_{\mathrm{ref}}$ from finer-resolution data (as was done in Howard and Clark, 2007), which may further improve the correction.
It would be desirable to apply this correction to all of the storms in the catalogue, although the extraction of the archived data on all model levels is a time-consuming and costly process and cannot be done at present. Instead, altitude is used as a covariate in the recalibration model (see Sect. 5), so this bias should be corrected.
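The log-profile step of the correction discussed above can be sketched under the assumption of a neutral logarithmic wind profile, $U(z) \propto \ln(z/z_0)$. This is an illustration of that single step only, not the full Howard and Clark (2007) scheme; the function name and the example values of the reference height and roughness length are hypothetical.

```python
import math


def wind_at_10m_log_profile(u_ref, h_ref, z0, z_target=10.0):
    """Scale a wind speed at height h_ref down to z_target with a neutral log law.

    u_ref: wind speed at the reference height (m/s)
    h_ref: reference height above the surface (m)
    z0:    local vegetative roughness length (m)
    """
    if not (0.0 < z0 < z_target <= h_ref):
        raise ValueError("expected 0 < z0 < z_target <= h_ref")
    return u_ref * math.log(z_target / z0) / math.log(h_ref / z0)


# Hypothetical values: 30 m/s at 100 m above ground, roughness length 0.1 m.
print(wind_at_10m_log_profile(30.0, 100.0, 0.1))
```

Using a smaller, local roughness length in place of a large effective roughness gives a larger ratio of 10 m wind to reference-height wind, which is the direction of the correction described in the text.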

Underprediction of high gusts for low-altitude stations
Possible reasons for the underprediction of high gusts for low-altitude stations described above include (i) the gust parameterisation scheme used, and (ii) whether the model can reproduce the strong pressure gradients. It is unlikely that the underprediction is dependent on the storms' locations because the storms Jeanette and Kyrill passed through similar areas and have very similar observational footprints, yet Fig. 8g shows that the underforecasting in Kyrill is much more pronounced.
Regarding point (i), the gust parameterisation scheme should take into account the subgrid-scale and sub-time-step processes that lead to gusts. The parameterisation scheme used for this work is classed as non-convective (Sheridan, 2011), yet for some of the storms strong convective activity has been identified: the storm Kyrill featured strong convection along the cold front, which led to heavy precipitation, strong convective gusts and even tornadoes (Fink et al., 2009). In order to correct for this, either convective gusts should be included in the parameterisation scheme, or a high-resolution model which explicitly resolves convection should be used.
However, for some storms it appears that the underestimation of gusts stems from an underlying problem with the 10 m winds. Figure 9a shows a scatter plot of model against observational gusts, and Fig. 9b shows model error in the maximum gusts against the model error in the maximum 10 m 10 min mean winds⁴ at low-altitude (≤ 500 m) stations which recorded both these measures, for the storm Anatol (December 1999). The stations which recorded gusts greater than 25 m s$^{-1}$ (which, as for Kyrill, is approximately when the model begins to systematically underpredict the gusts for this storm) are highlighted in green. Apart from a few outliers, in Fig. 9b all points lie approximately on the y = x line, and the behaviour of the gust errors > 25 m s$^{-1}$ is similar to that of gust errors ≤ 25 m s$^{-1}$. The correlation coefficient between gust and wind errors, r, is 0.57 for stations which recorded gusts greater than 25 m s$^{-1}$ (after removing outliers with gust and wind errors greater than 30 m s$^{-1}$). This strong relationship indicates that for this case the errors in the underlying winds make a significant contribution to the gust errors.
⁴ For the observations the maximum 10 min mean winds are the maximum of the instantaneous 10 min mean winds which are recorded every 1, 3, or 6 h depending on the station. The model maximum wind speeds are the maximum of the instantaneous 10 m wind speeds which are output every 6 h. Since the model time step is 10 min, the model wind speeds should be comparable to the 10 min mean observed wind speeds. The true maximum wind speeds of both the observations and model may be underestimated, but given the strong correlation between error in maximum gusts and error in maximum wind speeds this does not appear to be significant.
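The correlation coefficient between gust errors and wind errors is assumed here to be the standard Pearson coefficient, which the text does not state explicitly. A minimal sketch of that calculation follows; the function name `pearson_r` and the sample error values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical model-minus-observation errors (m/s) at five stations.
gust_errors = [-8.0, -5.0, -2.0, 1.0, -6.0]
wind_errors = [-6.0, -3.0, -1.0, 0.5, -2.0]
print(round(pearson_r(gust_errors, wind_errors), 2))
```

A value near 1 would indicate that stations with large gust errors also have large errors in the underlying mean wind, which is the relationship the text uses to attribute part of the gust bias to the 10 m winds.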
To investigate whether the underprediction of the 10 m winds (leading to the underprediction of gusts) is due to the underestimation of strong pressure gradients (point (ii)), the observed and modelled minimum MSLP for the storm Anatol were compared. Figure 10d shows the observed minimum MSLP (over the same 72 h period over which the maximum gusts were taken) recorded at all stations where data were available. The minimum MSLP from the model over the same period is shown in Fig. 10e, and the model MSLP error (model minimum MSLP minus observed minimum MSLP) in Fig. 10f.
These plots show that Anatol deepened earlier (further west) than the model predicted, and so the depth of the minimum MSLP over the UK is underestimated. The model and observations appear to agree on the location of an MSLP minimum over Denmark, southern Sweden, and extending into the Baltic states, although again the minimum over Denmark is underestimated.
A possible reason for failure of the model to capture the low over the UK is that the western boundary of the WEuro domain is too far east to capture the early stages of this storm well. If the storm develops outside the western boundary, when it enters the domain the 0.22° MetUM is only being driven at the boundaries, so it may not simulate a low as extreme as in the reanalysis data. When the MetUM is reinitialised (every 24 h) with the storm already within the domain it then has the initial conditions to develop into an extreme event. This is expected to be more of a problem for rapidly moving storms which can travel quite far into the domain before reinitialisation. There is also the possibility that even once a cyclone has been correctly initialised, its track and intensity could deviate from observations over the next 24 h.
The observational and model footprints of Anatol are shown in Fig. 10a and b, and the model gust error for stations with altitudes ≤ 500 m is shown in Fig. 10c. These plots show that the main regions where the model gusts are underestimated are over the UK, Denmark and northern Germany, just to the south of the regions where the model failed to reproduce the depth of the central MSLP, i.e. in regions where the model pressure gradients would be underestimated. Figure 10g shows the maximum model geostrophic winds against maximum observed geostrophic winds (see footnote 5) for the locations of the stations with altitude ≤ 500 m which recorded gusts for this storm. The geostrophic winds corresponding to the locations of the stations which recorded gusts > 25 m s⁻¹ are highlighted in green. This plot shows that the model tends to underpredict geostrophic winds above approximately 40 m s⁻¹, and that many of the locations of the underpredicted geostrophic winds correspond to locations where gusts > 25 m s⁻¹ were recorded. For comparison Fig. 10h shows the maximum model geostrophic winds against maximum observed geostrophic winds for Jeanette, where, unlike for Anatol, the model reproduces the tight pressure gradients and high geostrophic winds. We conclude that the underestimation of strong gusts (> 25 m s⁻¹) apparent in some storms can be due to several mechanisms, including the underestimation of convective effects and strong pressure gradients. It would not make sense to apply a "universal" correction to all storms, since the problem varies from storm to storm. The recalibration method described below (Sect. 5) takes into account storm-to-storm variation.
Footnote 5. For Fig. 10c, the observed geostrophic winds were estimated by reconstructing the observed 6-hourly mean sea level pressure field by bilinearly interpolating MSLP station recordings. The instantaneous geostrophic winds could then be estimated from ∂P/∂x and ∂P/∂y in the usual way. The maximum geostrophic winds for both model and observations were estimated by taking the maximum of the 6-hourly instantaneous geostrophic winds.
Nat. Hazards Earth Syst. Sci., 14, 2487-2501, 2014 www.nat-hazards-earth-syst-sci.net/14/2487/2014/
Figure 11. Recalibrated footprints for Jeanette (row 1) and Kyrill (row 2). Column 1 shows observed against raw MetUM maximum gusts for stations across Europe. As an example, the recalibrated mean (-), 95 % confidence (---) and 95 % prediction (· · ·) intervals based on a station located in London are superimposed. The y = x line is plotted in grey. Column 2 shows the mean recalibrated footprint, column 3 its ratio to the original footprint and columns 4 and 5 the 2.5 and 97.5 % prediction bounds, respectively.

Footprint recalibration
This section introduces a statistical method for "recalibrating" windstorm footprints, where recalibration describes estimating the true distribution of wind gusts, given the 0.22° MetUM output. The proposed method is based on polynomial regression between transformed gust speeds: the response variable represents station observations and the explanatory variable the MetUM output. All station data within the footprint's domain are used, ranging between storms from 154 to 1224 stations, depending on data availability. Gusts above 20 m s⁻¹ are recalibrated. Where MetUM gusts do not exceed 20 m s⁻¹, the recalibrated footprint uses the original MetUM output. By assuming that the observations are representative of the true gusts, the regression relationship gives an estimate of the distribution of true gusts given the MetUM's output. A random effects model (Pinheiro and Bates, 2000) is used to allow multiple windstorm footprints to be recalibrated simultaneously, which is achieved by associating a separate random effect with each storm. This model is based on an underlying polynomial relationship between observed and MetUM-simulated gusts, from which storm-specific relationships deviate according to some distributional assumptions and location-specific covariates. The random effects capture unmodelled differences between storms, one example being whether a storm has a sting jet (Browning, 2004). Not only does this allow a specific storm's footprint to be recalibrated, but storms without observational data can too, by integrating out the random effects, though this latter feature is not utilised here.
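The core of the regression idea can be illustrated for a single storm. The sketch below is a simplification under stated assumptions: it uses ordinary least squares rather than the random effects model, omits the location covariates, and assumes a quadratic in log gust. The function names are hypothetical and not taken from the catalogue's code.

```python
import numpy as np

THRESHOLD = 20.0  # m/s, the recalibration threshold quoted in the text


def fit_log_polynomial(model_gust, obs_gust, degree=2):
    """Least-squares polynomial fit of log(observed) on log(model) gusts.

    Only stations where the model gust exceeds THRESHOLD are used.
    Coefficients are returned from lowest to highest power.
    """
    model_gust = np.asarray(model_gust, dtype=float)
    obs_gust = np.asarray(obs_gust, dtype=float)
    use = model_gust > THRESHOLD
    x = np.log(model_gust[use])
    y = np.log(obs_gust[use])
    return np.polynomial.polynomial.polyfit(x, y, degree)


def recalibrate(model_gust, coefs):
    """Apply the fitted curve above THRESHOLD; keep raw model gusts elsewhere."""
    model_gust = np.asarray(model_gust, dtype=float)
    out = model_gust.copy()
    above = model_gust > THRESHOLD
    log_fit = np.polynomial.polynomial.polyval(np.log(model_gust[above]), coefs)
    # Exponentiating the fitted log-scale mean gives the median gust
    # on the original scale under the log-normal assumption.
    out[above] = np.exp(log_fit)
    return out
```

With this simplified version, a storm whose observed gusts are uniformly 20 % higher than the model gusts yields a fitted curve that scales gusts above the threshold by 1.2 and leaves the rest untouched.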

Statistical model specification
The notation adopted is that $Y_j(s)$ is the observed maximum gust for storm $j$, $j = 1, \ldots, J$, at location $s$, and $X_j(s)$ is the corresponding MetUM output, noting that only $X_j(s) > 20$ m s⁻¹ are modelled. Gusts are log-transformed. The random effects model then has the formulation $\log Y_j(s) \sim N\bigl(m_j(\log X_j(s), z(s)),\, \sigma^2\bigr)$, where $z(s)$ is a vector of known covariates for location $s$, and $\sigma^2$ is a variance parameter. This assumes that for storm $j$, log observed gusts are normally distributed with mean $m_j$, which is a function of MetUM gust, location and elevation, and variance $\sigma^2$. The mean, $m_j(\log X_j(s), z(s))$, has a linear form in the regression coefficients $\beta_0, \beta_1, \beta_2, \gamma_0, \gamma_1, \gamma_2$ and in the storm-specific random effects, where $(b_{j,0}, b_{j,1}, b_{j,2})^T \sim \mathrm{MVN}\bigl((0, 0, 0)^T, \Sigma_b\bigr)$ (MVN denotes the multivariate normal distribution), $(c_{j,0}, c_{j,1}, c_{j,2})^T \sim \mathrm{MVN}\bigl((0, \ldots, 0)^T, \Sigma_c\bigr)$, and $\Sigma_b$ and $\Sigma_c$ are covariance matrices. Maximum likelihood is used to estimate $\beta_0, \beta_1, \beta_2, \gamma_0, \gamma_1, \gamma_2, \sigma^2, \Sigma_b$ and $\Sigma_c$.
Let $z^T(s) = (\mathrm{elevation}(s), \mathrm{lon}(s), \mathrm{lat}(s), \mathrm{lon}(s)\,\mathrm{lat}(s))$, where $\mathrm{lon}(s)$ and $\mathrm{lat}(s)$ are longitude and latitude, each standardised to have mean zero and unit variance, so that $\gamma_k = (\gamma_{\mathrm{elev},k}, \gamma_{\mathrm{lon},k}, \gamma_{\mathrm{lat},k}, \gamma_{\mathrm{lon:lat},k})^T$ for $k = 0, 1, 2$. This formulation allows the mean relationship to vary with elevation and location in a sufficiently robust way. Various combinations of covariates were tested; those used in the presented model performed best according to the Akaike information criterion. More complex relationships could be captured with covariates related to pressure fields or coastal proximity, but due to insufficient data, and the desire for parsimony, these were not tested here.
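The displayed expression for the mean is not reproduced in the text above. One form consistent with the stated definitions is a quadratic in $\log X_j(s)$ whose coefficients combine fixed effects, storm-specific random effects and location covariates. The expression below is an assumption inferred from those definitions, not a quotation of the original equation. In particular it assumes that each $c_{j,k}$ is a vector of the same length as $\gamma_k$.

```latex
m_j\bigl(\log X_j(s), z(s)\bigr)
  = \sum_{k=0}^{2}
    \Bigl[\beta_k + b_{j,k} + z^{T}(s)\,\bigl(\gamma_k + c_{j,k}\bigr)\Bigr]
    \bigl[\log X_j(s)\bigr]^{k}
```

Under this reading, setting all $b_{j,k}$ and $c_{j,k}$ to zero recovers the underlying relationship shared by all storms, and the random effects shift the intercept, slope and curvature for storm $j$.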
Parameter estimates (excluding those of $\Sigma_b$ and $\Sigma_c$) are shown in Table 3 together with standard errors. Figure 11 shows the resulting recalibrated footprints for the storms Jeanette and Kyrill. Column 1 of Fig. 11 shows that the recalibrated gusts are more consistent with the observations than the gusts originally simulated by the MetUM, which are in general negatively biased (column 3), though predictions are accompanied by relatively large uncertainty (prediction intervals, column 1; columns 4 and 5). The example mean relationships between MetUM and observed gusts for a station located in London, plotted in column 1 of Fig. 11 (solid lines), show that for the storm Kyrill, where the MetUM gusts were significantly underestimated, the mean rises above the y = x line for MetUM gusts of ∼ 25 m s⁻¹, so recalibration results in an increase in gust speed. For Jeanette the MetUM gusts compared better to observations, so the mean lies close to the y = x line and even shows a slight decrease for high MetUM gusts. This shows the importance of including storm-to-storm variation when recalibrating footprints.
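Because the model is normal on the log scale, prediction bounds such as those in columns 4 and 5 of Fig. 11 can be obtained by back-transforming log-scale quantiles. The sketch below assumes the fitted log-scale mean and $\sigma$ are known exactly, so it ignores uncertainty in the estimated coefficients and random effects. The function name and the numbers in the usage note are illustrative, not values from Table 3.

```python
import math
from statistics import NormalDist


def prediction_bounds(log_mean, sigma, level=0.95):
    """Return (lower, median, upper) gust on the original scale.

    log_mean : fitted mean of the log observed gust at one grid point
    sigma    : residual standard deviation on the log scale
    level    : central coverage of the interval (0.95 gives 2.5 and 97.5 %)
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.exp(log_mean - z * sigma)
    median = math.exp(log_mean)
    upper = math.exp(log_mean + z * sigma)
    return lower, median, upper
```

For example, `prediction_bounds(math.log(30.0), 0.1)` gives an interval that is symmetric about 30 m s⁻¹ in ratio terms: the upper bound exceeds the median by the same factor by which the median exceeds the lower bound.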
The choice of threshold above which to recalibrate the MetUM's gusts is arbitrary; 20 m s⁻¹ was chosen here as it retained sufficient data to give a reliable statistical model, while ensuring that gusts were "extreme". To improve consistency between the raw and recalibrated footprints at the 20 m s⁻¹ threshold, non-exceedances are also used in model estimation, but downweighted exponentially according to the deficit between MetUM-simulated gusts and 20 m s⁻¹. However, little appreciable difference in predictions was found for thresholds in the range 15-25 m s⁻¹.
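The downweighting of non-exceedances can be sketched as a weighted least-squares fit. The text does not state the decay rate, so `DECAY_SCALE` below is a hypothetical value chosen only for illustration. The fit is again a single-storm simplification without random effects or covariates.

```python
import numpy as np

THRESHOLD = 20.0   # m/s, recalibration threshold from the text
DECAY_SCALE = 2.0  # m/s; hypothetical e-folding scale, not given in the text


def exceedance_weights(model_gust, threshold=THRESHOLD, scale=DECAY_SCALE):
    """Weight 1 at or above the threshold, decaying exponentially below it."""
    model_gust = np.asarray(model_gust, dtype=float)
    deficit = np.clip(threshold - model_gust, 0.0, None)
    return np.exp(-deficit / scale)


def weighted_log_fit(model_gust, obs_gust, degree=2):
    """Weighted polynomial fit of log(observed) on log(model) gusts."""
    w = exceedance_weights(model_gust)
    x = np.log(np.asarray(model_gust, dtype=float))
    y = np.log(np.asarray(obs_gust, dtype=float))
    # polyfit applies w to the unsquared residuals, so sqrt(w) is passed
    # in order to weight each squared residual by w.
    return np.polynomial.polynomial.polyfit(x, y, degree, w=np.sqrt(w))
```

A station with a model gust of 18 m s⁻¹ then receives weight exp(-1) under the assumed scale, so it still pulls the fitted curve towards the raw output near the threshold without dominating the fit.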

Conclusions
We have compiled a catalogue of 50 of the most extreme winter storms to have hit Europe over the period October-March 1979-2012, available at www.europeanwindstorms.org. The catalogue gives tracks, model-generated maximum 3 s gust footprints and recalibrated footprints for each storm.
The tracking algorithm used was that of Hodges (1995, 1999), which identified 5730 storms in the catalogue period. To select the storms for the catalogue several meteorological indices were investigated. It was found that the index S ft , which depends on both storm area and intensity, was the most successful at characterising 23 severe storms highlighted by the insurance industry. The 50 storms chosen for the catalogue are the 23 severe storms plus the top 27 other storms as ranked by S ft . Using an index with a relative threshold would result in more Mediterranean storms being selected, which are not the focus of this catalogue. The severe storms ranked highly (in the top 18 %) in all meteorological indices investigated. The choice of index is sensitive to the given list of severe storms, which may be biased or incomplete. If loss data were available for many storms, the comparison of the indices might be improved.
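The selection rule described above (all listed severe storms, then the highest-ranked remaining storms) can be sketched as follows. The function name and the toy data in the usage note are hypothetical. The sketch assumes every storm already has an index value and that all severe storms appear among the tracked storms.

```python
def select_catalogue(index_by_storm, severe_storms, size=50):
    """Pick the severe storms plus the top-ranked others, up to `size` storms.

    index_by_storm : dict mapping storm identifier to its severity index
    severe_storms  : set of identifiers that are always included
    """
    severe = [s for s in index_by_storm if s in severe_storms]
    others = sorted(
        (s for s in index_by_storm if s not in severe_storms),
        key=lambda s: index_by_storm[s],
        reverse=True,  # highest index first
    )
    n_extra = max(size - len(severe), 0)
    return severe + others[:n_extra]
```

With toy values `{"A": 5.0, "B": 1.0, "C": 9.0, "D": 7.0, "E": 3.0}`, severe set `{"B"}` and `size=3`, the low-ranked severe storm "B" is kept and the two highest-ranked others, "C" and "D", fill the remaining places.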
The model used to generate the storm footprints is the Met Office Unified Model (MetUM) at 0.22° resolution. The MetUM footprints compare reasonably well to observations, although for some storms the highest gusts are underestimated. Reasons for this include the model not representing convective gusts and underestimating strong pressure gradients. The latter is possibly an effect of the western domain boundary being close to continental Europe. In addition, increasing the reinitialisation frequency (currently every 24 h) may also reduce model biases, although this would increase computational expense.
The MetUM footprints have large errors for gusts at altitudes greater than 500 m due to the orographic drag scheme. A correction can be applied for this, but it has not been applied in this version of the catalogue.
A new recalibration method was developed to correct for the underestimation of high gusts. The method allows for storm-to-storm variation. This is necessary because not all storms suffer the same biases. The method gives an estimate of the true distribution of gusts at each MetUM grid point, therefore also quantifying the uncertainty in gusts.
We intend to update the catalogue yearly to include recent events. Possible future plans include extending the catalogue back in time by performing tracking and downscaling to the 20th century reanalysis data set (Compo et al., 2011), and including tracks and footprints derived from different tracking algorithms and atmospheric models. Further improvements to the recalibration include recognition of spatial features of the windstorms, using Gaussian process kriging methods, and using high-resolution altitude data as a way to statistically downscale the footprints.