This work is distributed under the Creative Commons Attribution 4.0 License.
Special issue: Coastal hazards and hydrometeorological extremes
Research article 14 Feb 2022
Statistical estimation of spatial wave extremes for tropical cyclones from small data samples: validation of the STME approach using long-term synthetic cyclone data for the Caribbean Sea
¹Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
²BRGM, Orleans, France
³SHOM, DOPS/HOM/REC, Toulouse, France
⁴Shell Research Limited, London SE1 7NA, United Kingdom
⁵Department of Mathematics and Statistics, Lancaster University LA1 4YF, United Kingdom
Correspondence: Ryota Wada (r_wada@k.u-tokyo.ac.jp)
Occurrences of tropical cyclones at a location are rare, and for many locations only short periods of observations or hindcasts are available. Hence, estimation of return values (corresponding to a period considerably longer than that for which data are available) for cyclone-induced significant wave height (SWH) from small samples is challenging. The STME (space-time maximum and exposure) model was developed for such situations, to provide estimates of return values with reduced bias compared to competitor approaches, together with realistic estimates of return value uncertainty. STME exploits data from a spatial neighbourhood satisfying certain conditions, rather than data from a single location, for return value estimation.
This article provides a critical assessment of the STME model for tropical cyclones in the Caribbean Sea near Guadeloupe, for which a large database of synthetic cyclones is available, corresponding to more than 3000 years of observation. Results indicate that STME yields values for the 500-year return value of SWH and its variability, estimated from 200 years of cyclone data, that are consistent with direct empirical estimates obtained by sampling 500 years of data from the full synthetic cyclone database; similar results were found for estimation of the 100-year return value from samples corresponding to approximately 50 years of data. In general, STME also provides reduced bias and more realistic uncertainty estimates for return values relative to single-location analysis.
Tropical cyclones (also named hurricanes or typhoons depending on the region of interest) are among the deadliest and most devastating natural hazards, and can significantly impact lives, economies, and the environment in coastal areas. In 2005, Hurricane Katrina, which hit New Orleans, was the costliest natural disaster of all time for the insurance sector, with losses totalling more than USD 10¹¹ (Barbier, 2015). In 2017, hurricanes Harvey, Irma, and Maria caused record losses totalling more than USD 9 × 10¹⁰ within just 4 weeks (https://www.munichre.com/en/risks/naturaldisasterslossesaretrendingupwards/hurricanestyphoonscyclones.html#1979426458, last access: 1 February 2022). Tropical cyclones present multiple hazards, including strong damaging winds, high waves, storm surges, and heavy rainfall, as exemplified by Typhoon Hagibis in Japan (see context description in Dasgupta et al., 2020) or Cyclone Idai in Mozambique in 2019 (https://data.jrc.ec.europa.eu/dataset/4f8c752b34404e61a48d4d1d9311abfa, last access: 1 February 2022).
Waves are one of the major hazards associated with tropical cyclones and are of critical importance for marine flooding, especially for volcanic islands like those in the Lesser Antilles, North Atlantic Ocean basin (Krien et al., 2015); Hawaii, northeast Pacific Ocean basin (Kennedy et al., 2012); or Réunion, southwest Indian Ocean basin (Lecacheux et al., 2021). There, the absence of a continental shelf and the steep coastal slopes limit the generation of high atmospheric storm surge but increase the potential impact of incoming waves. Moreover, wind waves propagate with little loss of energy over the deep ocean: this may increase both the spatial extent and the duration of damaging coastal impacts during a tropical cyclone event (Merrifield et al., 2014). This contrasts with tropical-cyclone-induced storm surge, which tends to be concentrated in the vicinity of the cyclone centre.
A key ingredient for supporting decision makers in diverse fields such as wastewater management, transport and infrastructure, health, coastal zone management, and insurance is the availability of data on the frequencies and magnitudes of extreme cyclone-induced significant wave heights (SWHs) at the coast, e.g. estimates of 100-year return values (as illustrated for Réunion by Lecacheux et al., 2012: Fig. 4). Yet, for many locations, only short periods of observations or hindcasts of tropical cyclones are available, which makes estimation of return values (corresponding to a period considerably longer than that for which data are available) challenging. For this purpose, a widely used approach combines synthetic cyclone track generation, wave modelling, and extreme value analysis. The approach consists of the following steps.

(1) Tropical cyclones, extracted from either historical data (Knapp et al., 2010) or climate model simulations (Lin et al., 2012), are statistically resampled and modelled to generate synthetic but realistic tropical cyclone records. Based on a Monte Carlo approach (Emanuel et al., 2006; Vickery et al., 2000; Bloemendaal et al., 2020), a tropical cyclone data set with the same statistical characteristics as the input data set, but spanning hundreds to thousands of years, can then be generated.

(2) For each synthetic cyclone, a hydrodynamic numerical model is used to compute the corresponding SWH or surge level over the whole domain of interest. An example of such a simulator is the Global Tide and Surge Model of Bloemendaal et al. (2019).

(3) SWH values at the desired coastal locations are extracted, and extreme value analysis (Coles, 2001) is used to estimate the corresponding return values. As an illustration of the whole procedure, one can refer to the probabilistic hurricane-induced storm surge hazard assessment (including wave effects) performed by Krien et al. (2015) for the Guadeloupe archipelago, Lesser Antilles.
Implementation of steps (1) and (2) can, however, be problematic. Generation of synthetic cyclones with realistic characteristics is a research topic in itself. Further, the hydrodynamic numerical model can be prohibitively costly to execute, limiting the number of feasible model runs and resulting in sparse, non-representative data for extreme value modelling. To overcome this computational burden, possible solutions can be based either on parametric analytical models (like those used by Stephens and Ramsay, 2014, in the southwest Pacific Ocean) or on statistical predictive models (sometimes called meta-models or surrogate models; Nadal-Caraballo et al., 2020). However, such approaches can only be considered “approximations”: parametric analytical models, for example, introduce simplifying assumptions regarding the physical processes involved. More generally, statistical estimation is problematic, since inferences must be made concerning extreme quantiles of the distribution of quantities such as SWH using a limited set of data.
Objective and layout
In the present work, we aim to tackle the problem of realistic return value estimation from small samples of tropical cyclones using a recently developed procedure named STME, which has already been applied successfully in regions exposed to tropical cyclones near Japan (Wada et al., 2018) and in the Gulf of Mexico (Wada et al., 2020). STME exploits all cyclone data drawn from a specific geographical region of interest, provided that certain modelling conditions are not violated by the data. In principle, this means that STME provides less uncertain estimates of return values than statistical analysis of cyclone data at a single location. To date, however, the STME methodology has not been directly validated. The objective of the present work is therefore to provide direct validation of return values (in terms of bias and variance characteristics, for return periods T of hundreds of years) from STME analysis, using sample data corresponding to a much shorter period T_{0} (<T) of observation, drawn from a full synthetic cyclone database corresponding to a very long period T_{L} (T_{L}>T) of observation.
In the following sections, we present a motivating application in the region of the Caribbean archipelago of Guadeloupe, for which synthetic cyclone data are available for a period T_{L} corresponding to more than 3000 years. We use the STME method to estimate the T=500-year return value for SWH, and its uncertainty, based on random samples of tropical cyclones corresponding to T_{0}=200 years of observation. This case assesses the performance of STME when reasonable sample sizes of extreme values are available for inference. In addition, we conduct the corresponding estimation for the T=100-year return value for SWH, and its uncertainty, based on random samples of tropical cyclones corresponding to T_{0}=50 years of observation. This second case assesses the performance of STME under practical conditions, i.e. when the sample of extreme data for analysis is relatively small. We compare estimates with empirical maxima from random samples corresponding to T years of observation from the full synthetic cyclone data (covering T_{L} years), and with standard extreme value estimates obtained using data (corresponding to T_{0} years) from the specific location of interest only. Section 2 provides an outline of the motivating application. Section 3 describes the STME methodology. Section 4 presents the results of the application of STME to the region of the main island pair (Basse-Terre and Grande-Terre) of Guadeloupe. Discussion and conclusions are provided in Sect. 5.
The study area is located in a region of the Lesser Antilles (eastern Caribbean Sea) that is particularly exposed to cyclone risks (Jevrejeva et al., 2020), with several thousand fatalities reported since 1900 (http://www.emdat.be, last access: 1 February 2022). We focus on the French overseas region of Guadeloupe, which is an archipelago located in the southern part of the Leeward Islands (see Fig. 1).
This French overseas region has been impacted by several devastating cyclones in the past, including the 1776 event (category 5 according to the Saffir–Simpson scale; Simpson and Saffir, 1974), which led to >6000 fatalities (Zahibo et al., 2007), and the “Great Hurricane” of 1928 (Desarthe, 2015), with >1200 fatalities; the latter was probably the most destructive tropical cyclone of the 20th century. More recent destructive events include Hugo (in 1989; Koussoula-Bonneton, 1994) and Maria (in 2017, which severely impacted Guadeloupe's banana plantations). The tracks of both Hugo and Maria are illustrated in Fig. 1. Analysis of the HURDAT database (Landsea and Franklin, 2013) reveals that, on average, approximately 0.6 cyclones per year passed within 400 km of the study area over the period 1970–2019. Almost all events approached from the southeast. More than 80 % of the events passed close to the northern and eastern coasts of Guadeloupe's main island pair.
To assess the cyclone-induced storm surge hazard, Krien et al. (2015) set up a modelling chain similar to that described in the introduction: they randomly generated cyclonic events using the approach of Emanuel et al. (2006) and computed SWH and total water levels for each event over a wide computational domain (9.5–18.3° N, 45–65° W) using the ADCIRC–SWAN coupled wave–current numerical model. The wind drag formula of Wu (1982) was used, with the drag coefficient capped at a prescribed maximum value of C_{d}=0.0035. The interested reader can refer to Krien et al. (2015) for further details of implementation and validation. The resulting SWH data form the basis of the current study, which assesses the performance of STME in estimating the T-year return value from data corresponding to T_{0} years of observation, for the cases (T_{0}=200, T=500) and (T_{0}=50, T=100).
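As a minimal illustration of a capped drag law, the Python sketch below assumes the linear form commonly quoted for Wu (1982), C_{d} = (0.8 + 0.065 U_{10}) × 10^{-3}, with U_{10} the 10 m wind speed in m/s, and applies the prescribed maximum C_{d}=0.0035. The function names are hypothetical, and the sketch is not the ADCIRC–SWAN implementation used by Krien et al. (2015).

```python
def drag_coefficient(u10: float, cd_max: float = 0.0035) -> float:
    """Capped wind drag coefficient (dimensionless) for 10 m wind speed u10 (m/s).

    Assumes the linear Wu (1982) form Cd = (0.8 + 0.065 * u10) * 1e-3.
    """
    if u10 < 0.0:
        raise ValueError("u10 must be non-negative")
    cd = (0.8 + 0.065 * u10) * 1e-3  # assumed linear growth with wind speed
    return min(cd, cd_max)           # prescribed maximum value


def cap_onset_speed(cd_max: float = 0.0035) -> float:
    """Wind speed (m/s) at which the assumed linear form reaches cd_max."""
    return (cd_max * 1e3 - 0.8) / 0.065
```

Under this assumed form, the cap becomes active for wind speeds above roughly 41.5 m/s.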
In the present work, we use a total of 1971 synthetic cyclones passing near Guadeloupe (representative of 3200 years, i.e. about 0.6 cyclones per annum) and the corresponding numerically calculated SWH. These results are used to derive empirically the 100- and 500-year return values of SWH around the coast of Guadeloupe's main island pair for a smaller area of interest (15.8–16.6° N, 60.8–62.0° W; see Fig. 2). Such estimates are useful for assessing flood risk at local scale, since they provide inputs to high-resolution hydrodynamic simulations (see for example the use of wave overtopping simulations at Réunion by Lecacheux et al., 2021). In the following, we analyse extreme SWH at 19 coastal locations around Guadeloupe's main island pair (on the 100 m isodepth contour; see blue stars in Fig. 2) and at 12 locations along a line transect extending to the northeast of the island into increasing water depth (see red triangles in Fig. 2).
To illustrate the SWH data, Fig. 3 depicts the spatial distributions of maximum SWH per location for the four cyclones with the largest single values of SWH in the whole synthetic cyclone database. All four cyclones propagate from the southeast to the northwest, with intense storm severity near the cyclone track that decreases rapidly away from the track.
In this section we describe the STME methodology used to estimate return values in the current work. Section 3.1 motivates the STME approach, and Sect. 3.2 outlines the modelling procedure. Section 3.3 discusses some of the diagnostic tests performed to ensure that STME modelling assumptions are satisfied, and Sect. 3.4 describes the model for STM and the estimation of return values.
3.1 Motivating the STME model
The STME procedure has been described in Wada et al. (2018, 2020). The approach is intended to provide straightforward estimation of extreme environments over a spatial region from a relatively small sample of rare events such as cyclones, none of which typically affects the whole region. For each cyclone event, the space-time characteristics of the event are summarized using two quantities: the space-time maximum (STM) of the cyclone and the spatial exposure (E) of each location in the region to the event. For any cyclone, the STM is defined as the largest value of SWH observed anywhere in the spatial region during the time period of the cyclone. The location exposure E is defined as the largest value of SWH observed at that location during the time period of the cyclone, expressed as a fraction of STM; thus values of E lie in the interval [0, 1].
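The definitions of STM and exposure can be sketched in a few lines of Python. The sketch below assumes, purely for illustration, that the SWH for one cyclone is stored as a nested list `swh[t][j]` over time steps t and locations j; the function name `stm_and_exposure` is hypothetical.

```python
from typing import List, Sequence, Tuple


def stm_and_exposure(swh: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """swh[t][j] is SWH (m) at time step t and location j for one cyclone.

    Returns the space-time maximum (STM) and the exposure at each location.
    """
    n_loc = len(swh[0])
    # Largest SWH at each location over the cyclone's duration
    loc_max = [max(row[j] for row in swh) for j in range(n_loc)]
    stm = max(loc_max)  # largest SWH anywhere in the region, at any time
    if stm <= 0.0:
        raise ValueError("STM must be positive")
    exposures = [m / stm for m in loc_max]  # each value lies in [0, 1]
    return stm, exposures
```

For example, a cyclone whose largest per-location SWH values are 3, 4, and 2 m has STM = 4 m and exposures 0.75, 1.0, and 0.5.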
The key modelling assumptions are then that (i) the future characteristics of STM and E over the region will be the same as those of STM and E during the period of observation, and (ii) in the future, at any location, it is valid to associate any simulated realization of STM (under an extreme value model based on historical STM data) with any realization of E (under a model for the distribution of E based on historical exposure data).
3.2 STME procedure
The steps of the modelling procedure are now described. The first three steps of the procedure involve isolation of data for analysis. (a) An appropriate region of ocean is selected. The characteristics of this region need to be such that the underpinning conditions of the STME approach are satisfied (as discussed further in Sect. 3.3). (b) For each tropical cyclone event occurring in the region, the largest value of SWH observed anywhere in the region for the period of the cyclone (STM) is retained. (c) Per location in the region, the largest value of SWH observed during the period of the cyclone, expressed as a fraction of STM, is retained as the location exposure E to the cyclone.
The next three steps of the analysis involve statistical modelling and simulation. (d) First, an extreme value model is estimated using the largest values from the sample of STM; typically, a generalized Pareto distribution (see for example Coles, 2001) is assumed. Then a model for the distribution of location exposure E is sought; typically we simply resample at random with replacement from the values of historical exposures for the location, although model fitting is also possible. (e) Next, realizations of random occurrences of STM from (d), each combined with a randomly sampled exposure E per location, permit estimation of the spatial distribution of SWH corresponding to return periods of arbitrary length. (f) Finally, diagnostic tools are used to confirm the consistency of simulations (e) under the model with historical cyclone characteristics.
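Steps (d) and (e) can be sketched as follows, under stated assumptions: the generalized Pareto parameters (ξ, σ, ψ) for STM are taken as already estimated, simulated STM values are drawn from the tail above ψ only, and exposures at one location are resampled with replacement from historical values. All function names are hypothetical; this illustrates the idea rather than reproducing the implementation of Wada et al. (2018, 2020).

```python
import math
import random
from typing import List, Sequence


def sample_gp(xi: float, sigma: float, psi: float, rng: random.Random) -> float:
    """Draw one STM value from a generalized Pareto tail above psi (inverse CDF)."""
    u = rng.random()  # uniform on [0, 1)
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return psi - sigma * math.log(1.0 - u)
    return psi + sigma / xi * ((1.0 - u) ** (-xi) - 1.0)


def simulate_swh(exposures: Sequence[float], xi: float, sigma: float,
                 psi: float, n_sim: int, seed: int = 0) -> List[float]:
    """Simulate SWH at one location as H = E * S, pairing each simulated STM
    with an exposure resampled (with replacement) from historical values."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_sim):
        s = sample_gp(xi, sigma, psi, rng)
        e = rng.choice(exposures)  # independent of s, by assumption
        out.append(e * s)
    return out


def empirical_quantile(values: Sequence[float], p: float) -> float:
    """Simple order-statistic estimate of the p-quantile."""
    x = sorted(values)
    return x[min(int(p * len(x)), len(x) - 1)]
```

Pairing each simulated STM with an independently resampled exposure is precisely assumption (ii) of Sect. 3.1, which is why the diagnostics of step (f) matter.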
3.3 Diagnostics for STME modelling assumptions
The success of the current approach relies critically on our ability to show that simplifying assumptions regarding the characteristics of STM and exposure are justified for the data at hand. In particular, the approach assumes that (i) the distribution of STM does not depend on cyclone track, environmental covariates, space, or time, and (ii) the distribution of exposure per location does not depend on STM, cyclone track, environmental covariates, or time. Diagnostic tests are undertaken to examine the plausibility of these conditions for the region of ocean of interest in each application. Establishing the validity of the STME conditions is vital for credible estimation of return values. Section 5 of Wada et al. (2018) provides a detailed discussion of diagnostic tests that should be considered to judge that the STME conditions are not violated in any particular application. For example, the absence of a spatial trend in STM over the region can be assessed by quantifying the size of linear trends in STM along transects with arbitrary orientation in the region. These are then compared with a “null” distribution for linear trend, estimated using random permutations of the STM values. Illustrations of some of the diagnostic tests performed for the current analysis are given in Sect. 4 below.
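A minimal sketch of the permutation test for a linear spatial trend in STM is given below. It assumes STM locations are given as longitude and latitude in degrees (treated as planar coordinates, which is adequate only for a small region), projects them onto a transect direction at a chosen angle, uses the least-squares slope as the trend statistic, and reports a two-sided permutation p-value. These choices, and the function names, are our assumptions rather than the exact diagnostic of Wada et al. (2018).

```python
import math
import random
from typing import Sequence, Tuple


def ls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def trend_permutation_test(lon: Sequence[float], lat: Sequence[float],
                           stm: Sequence[float], angle_deg: float,
                           n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Slope of STM along a transect at angle_deg (0 = eastward) and its
    two-sided permutation p-value."""
    th = math.radians(angle_deg)
    # Coordinate of each STM location along the transect direction
    x = [lo * math.cos(th) + la * math.sin(th) for lo, la in zip(lon, lat)]
    observed = ls_slope(x, stm)
    rng = random.Random(seed)
    y = list(stm)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y)  # break any link between STM value and location
        if abs(ls_slope(x, y)) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

Repeating the test over a range of angles gives one p-value per transect orientation; consistently small p-values would point to a spatial trend in STM.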
Return value estimates from STME are also potentially sensitive to the choice of region for analysis. We assume that the extremal behaviour of STM can be considered homogeneous in the region, suggesting that the region should be sufficiently small that the same physics is active throughout it. However, the region also needs to contain sufficient evidence for cyclone events and their characteristics to allow reasonable estimation of the tails of distributions for SWH per location. The absence of dependence between STM and E per location can be assessed by calculating the rank correlation between STM (S, a space-time maximum) and exposure (E_{j}, at location j, for locations j=1, 2, …, p) using Kendall's τ statistic. If the values of S and E_{j} increase together, the value of Kendall's τ will be near unity. If there is no particular relationship between S and E_{j}, the value of Kendall's τ will be near zero. For large n, if S and E_{j} are independent, Kendall's τ is approximately Gaussian-distributed with zero mean and known variance, providing a means of identifying unusual values that may indicate dependence between S and E_{j}. An illustrative spatial plot of Kendall's τ is given in Sect. 4.
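The Kendall's τ diagnostic can be sketched as below. This version computes τ without tie correction (often called τ_{a}) by counting concordant and discordant pairs, and uses the large-sample null variance 2(2n+5)/[9n(n−1)] to obtain a two-sided Gaussian p-value. The function name is hypothetical; in practice a library routine such as `scipy.stats.kendalltau`, which also handles ties, could be used instead.

```python
import math
from typing import Sequence, Tuple


def kendall_tau_test(s: Sequence[float], e: Sequence[float]) -> Tuple[float, float]:
    """Kendall's tau (no tie correction) between STM s and exposure e at one
    location, with a two-sided large-sample Gaussian p-value."""
    n = len(s)
    concordant = discordant = 0
    for i in range(n):
        for k in range(i + 1, n):
            d = (s[i] - s[k]) * (e[i] - e[k])
            if d > 0:
                concordant += 1
            elif d < 0:
                discordant += 1
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    var0 = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))  # variance under independence
    z = tau / math.sqrt(var0)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # P(|Z| > |z|)
    return tau, p_value
```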
Finally, estimates from STME are potentially sensitive to the extreme value threshold ψ_{n} (or equivalently the sample size n of the largest observations of STM) chosen to estimate the tail of the distribution of STM over the region. Results in Sect. 4 are reported for a number of choices of n for this reason.
3.4 Modelling STM and estimating return values
Suppose we have isolated a set of n_{0} values of STM using the procedure above. We use the largest n≤n_{0} values $\{s_i\}_{i=1}^{n}$, corresponding to exceedances of threshold ψ_{n}, to estimate a generalized Pareto model for STM, with probability density function

$$f(s \mid \xi, \sigma_n, \psi_n) = \frac{1}{\sigma_n}\left[1 + \frac{\xi}{\sigma_n}\left(s - \psi_n\right)\right]_{+}^{-1/\xi - 1}, \qquad s > \psi_n, \tag{1}$$
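with shape parameter ξ∈ℝ and scale parameter σ_{n}>0. The choice of n is important to ensure a reasonable model fit and an appropriate bias–variance trade-off. The estimated value of ξ should be approximately constant as a function of n for sufficiently large ψ_{n}, and hence small n. The full distribution F_{S}(s) of STM can then be estimated using

$$F_S(s) = \begin{cases} F_n^{*}(s), & s \le \psi_n, \\ \tau_n + (1 - \tau_n)\left\{1 - \left[1 + \frac{\xi}{\sigma_n}\left(s - \psi_n\right)\right]_{+}^{-1/\xi}\right\}, & s > \psi_n, \end{cases} \tag{2}$$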
where $F_n^{*}(s)$ is an empirical “counting” estimate below threshold ψ_{n}, and τ_{n} is the non-exceedance probability corresponding to ψ_{n}, again estimated empirically.
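The piecewise estimate of the full STM distribution, empirical below ψ_{n} and generalized Pareto above it, can be sketched as follows. The sketch assumes that ξ and σ_{n} have already been estimated, takes the (n+1)th largest STM value as ψ_{n} so that exactly n values exceed it (assuming no ties), and sets τ_{n}=1−n/n_{0}; these conventions and the function name are illustrative assumptions.

```python
import bisect
import math
from typing import Callable, Sequence, Tuple


def make_stm_cdf(stm: Sequence[float], n: int, xi: float, sigma: float
                 ) -> Tuple[float, float, Callable[[float], float]]:
    """Return (psi_n, tau_n, F_S): empirical below psi_n, GP tail above."""
    data = sorted(stm)
    n0 = len(data)
    if not 0 < n < n0:
        raise ValueError("need 0 < n < n0")
    psi = data[n0 - n - 1]  # (n+1)-th largest value used as threshold psi_n
    tau_n = 1.0 - n / n0    # empirical non-exceedance probability of psi_n

    def cdf(s: float) -> float:
        if s <= psi:
            # Empirical counting estimate F_n*(s)
            return bisect.bisect_right(data, s) / n0
        if abs(xi) < 1e-12:  # exponential limit
            gp = 1.0 - math.exp(-(s - psi) / sigma)
        else:
            z = 1.0 + xi * (s - psi) / sigma
            gp = 1.0 if z <= 0.0 else 1.0 - z ** (-1.0 / xi)  # z <= 0: beyond end point
        return tau_n + (1.0 - tau_n) * gp

    return psi, tau_n, cdf
```

With this choice of ψ_{n}, the empirical and generalized Pareto parts join continuously, since F_{n}^{*}(ψ_{n})=τ_{n}.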
Using this model, we can simulate future values H_{j} of SWH at any location j (j=1, 2, …, p) in the region relatively straightforwardly. Suppose that E_{j} is the location exposure at location j, and $F_{E_j}$ is its cumulative distribution function, estimated empirically. Since H_{j}=E_{j}×S, and E_{j} is assumed independent of S, the cumulative distribution function of H_{j} can be estimated using

$$F_{H_j}(h) = \int_{0}^{\infty} F_{E_j}\!\left(\frac{h}{s}\right) f_S(s)\,\mathrm{d}s, \tag{3}$$
where f_{S}(s) is the probability density function of STM, corresponding to cumulative distribution function F_{S}(s) estimated in Eq. (2).
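The integral F_{H_j}(h)=∫F_{E_j}(h/s) f_{S}(s) ds is the expectation of F_{E_j}(h/S) over S, so it can be approximated by averaging over quantiles of S. The sketch below restricts S to its generalized Pareto tail above ψ_{n} (an assumption made here for brevity, appropriate when only the upper tail of H_{j} is of interest, and requiring ψ_{n}>0), uses a midpoint rule in probability space with M nodes, and uses the empirical CDF of the historical exposures at location j. Function names are hypothetical.

```python
import bisect
import math
from typing import Callable, Sequence


def gp_quantile(u: float, xi: float, sigma: float, psi: float) -> float:
    """Inverse CDF of the generalized Pareto tail above psi."""
    if abs(xi) < 1e-12:
        return psi - sigma * math.log(1.0 - u)
    return psi + sigma / xi * ((1.0 - u) ** (-xi) - 1.0)


def make_swh_tail_cdf(exposures: Sequence[float], xi: float, sigma: float,
                      psi: float, m: int = 2000) -> Callable[[float], float]:
    """Approximate F_H(h) = E[F_E(h / S)], for S above psi, by a midpoint rule."""
    e_sorted = sorted(exposures)
    k = len(e_sorted)
    # Midpoint nodes in probability space mapped to STM values (all >= psi > 0)
    nodes = [gp_quantile((i + 0.5) / m, xi, sigma, psi) for i in range(m)]

    def cdf(h: float) -> float:
        # Empirical exposure CDF F_E(h / s), averaged over the STM nodes
        return sum(bisect.bisect_right(e_sorted, h / s) for s in nodes) / (k * m)

    return cdf
```

As a check, if all exposures equal 1, then H_{j}=S and the result reproduces the generalized Pareto CDF up to discretization error.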
The STME methodology outlined in Sect. 3 is applied to data for the neighbourhood of Guadeloupe's main island pair described in Sect. 2. The objective of the analysis is to estimate the T-year return value for SWH from T_{0} years of data, for (T_{0}, T) pairs (200, 500) and (50, 100). First, details of the setup of the STME analysis are provided in Sect. 4.1. Then, in Sect. 4.2, we describe two competitor methods included for comparison with STME. Section 4.3 describes estimates for the 500-year return value on the 100 m isodepth contour around Guadeloupe's main island pair and on the line transect introduced in Sect. 2 (illustrated in Fig. 2), using maximum likelihood estimation (see for example Hosking and Wallis, 1987; Davison, 2003). For comparison, Sect. 4.4 provides estimates obtained using probability-weighted moments (see for example Furrer and Naveau, 2007; de Zea Bermudez and Kotz, 2010a, b). Section 4.5 describes some of the diagnostic tests undertaken to confirm that the fitted model is reasonable. Section 4.6 outlines the inference for the T=100-year return value from data corresponding to T_{0}=50 years.
4.1 Details of STME application
The spatial region of interest is the neighbourhood of Guadeloupe's main island pair in the Caribbean Sea, corresponding approximately to latitudes 12–18° N and longitudes 58–65° W (see Fig. 1). An initial analysis using Kendall's τ suggests that the full computational domain (9.5–18.3° N, 45–65° W) shows dependence between STM and exposure, with stronger cyclones tending to pass through the western part of the domain. Reasonable decoupling of STM and E could be achieved by selecting a very high threshold ψ≈20 m for analysis, but relatively less intense tropical cyclones would then be neglected. Since the focus of the current work is the ocean environment of the Guadeloupe archipelago, a smaller region (see Fig. 1, right panel) was defined. For this region, Kendall's τ indicated low dependence between STM and E for thresholds ψ of 10 m and above, as illustrated in the left panel of Fig. 4.
The right panel of Fig. 4 shows the location and magnitude of STM for each of the n=60 cyclones with the largest STM observed in the region. There is no obvious spatial dependence between the size of STM and its location. In Sect. 3.3, we discuss the use of rank correlation of STM along latitude–longitude transects as a means to quantify dependence in general. In fact, the Kendall's τ analysis illustrated in the left panel would also indicate any strong spatial dependence in STM; therefore, results of the rank correlation analysis along latitude–longitude transects are not presented. We conclude that Fig. 4 provides no evidence that the modelling assumptions underlying STME are violated.
The relatively large number of STM values on the boundary of the region reflects cyclones whose true STM locations lie outside the analysis region. For these events, the value of STM used for analysis is the largest value of SWH observed within the analysis region. In this sense, we are performing the STME analysis conditional on the choice of region. For example, consider a cyclone for which the location of the STM value s^{*} falls outside the region of interest. Then the conditional STM value s for the cyclone (within the region) will of course be smaller than s^{*}; however, the cyclone's conditional exposures (assessed relative to the conditional STM s for the region, rather than the full STM s^{*}) will consequently be larger.
Specific interest lies in the variation of the extreme return value around Guadeloupe and the rate of increase in return value with increasing water depth away from the coast. SWH at a location is known to depend on water depth, bathymetry, and coastlines, since, for example, ocean waves in shallow water are influenced by bottom effects, and both wind and wave propagation can be weakened in the vicinity of coastlines. For this reason, two sets of locations were adopted for the detailed analysis reported here. The first set corresponds to 19 locations on an approximate isodepth contour at 100 m depth around the main island pair of Guadeloupe. This depth is typically chosen to define the boundaries of local-scale high-resolution flooding simulations. The second set corresponds to 12 locations on a line transect extending approximately normal to the coast from the northeast of the main island pair of Guadeloupe. We focus on the northeast neighbourhood because it has the highest exposure to cyclones. The contour and transect are illustrated in Fig. 2, where location numbers are also given.
The focus of the analysis is estimation of the T=500-year return value for SWH on the isodepth contour and line transect, based on T_{0}=200 years of data. To quantify the uncertainty in the 500-year return value using STME analysis, the following procedure was adopted. (a) Randomly select the appropriate number of cyclones (corresponding to T_{0} years of observation) from the T_{L} years of synthetic cyclones. (b) Identify the largest n values of STM in the sample, for n=20, 30, 40, 50, and 60 (corresponding to progressively lower extreme value thresholds). (c) Estimate a model for the distribution of STM using maximum likelihood estimation or the method of probability-weighted moments. (d) Estimate the empirical distribution of exposure E per location on the isodepth contour and line transect. (e) Estimate the 500-year return value as the quantile of the distribution $F_{H_j}$ of significant wave height at location j with non-exceedance probability $1-(T_0/n)/T$. Finally, the whole procedure (a)–(e) is repeated 100 times to quantify the uncertainty in the T-year return value.
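The bookkeeping in steps (a) and (e) can be sketched as below. The sketch assumes that the number of cyclones drawn for T_{0} years is the nearest integer to 1971×T_{0}/3200 (using the 1971 cyclones representing 3200 years described in Sect. 2), and computes the target non-exceedance probability 1−(T_{0}/n)/T. The quantile of F_{H_j} is then found by bisection on any non-decreasing CDF supplied by the user. All function names are hypothetical.

```python
import random
from typing import Callable, List


def cyclones_for_period(t0_years: float, n_total: int = 1971,
                        t_total_years: float = 3200.0) -> int:
    """Number of synthetic cyclones representing t0_years of observation."""
    return round(n_total * t0_years / t_total_years)


def draw_cyclone_sample(n_total: int, t0_years: float, t_total_years: float,
                        rng: random.Random) -> List[int]:
    """Step (a): indices of a random sample of cyclones, without replacement."""
    k = cyclones_for_period(t0_years, n_total, t_total_years)
    return rng.sample(range(n_total), k)


def return_value_probability(t0_years: float, n: int, t_years: float) -> float:
    """Step (e): non-exceedance probability 1 - (T0 / n) / T per exceedance event."""
    return 1.0 - (t0_years / n) / t_years


def quantile_by_bisection(cdf: Callable[[float], float], p: float,
                          lo: float, hi: float, tol: float = 1e-6) -> float:
    """Smallest h (to within tol) in [lo, hi] with cdf(h) >= p."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, T_{0}=200 gives 123 cyclones per sample (31 for T_{0}=50), and with n=30 and T=500 the target non-exceedance probability is 1−(200/30)/500 = 1−1/75 ≈ 0.9867.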
Figure 5 illustrates the tails of the distribution of STM estimated from the largest 30 values of STM in each of 100 random samples corresponding to 200 years, and from the full sample of synthetic cyclones. It can be seen that the 500-year return value for STM lies in the interval (20, 30) m.
4.2 Benchmarking against the full cyclone database and single-location analysis
One obvious feature of the current synthetic cyclone database is that it corresponds to a long time period of T_{L}=3200 years, much longer than the return period of T=500 years being estimated in the current analysis. Thus, we are able to estimate the 500-year return value at any location directly from the full synthetic cyclone data, by simply interpolating between the sixth and seventh largest values, since the 500-year level is expected to be exceeded 3200/500=6.4 times in 3200 years. This provides a direct empirical estimate.
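The direct empirical estimate can be sketched as below: with T_{L}/T = 3200/500 = 6.4, the return value is placed 40 % of the way from the sixth to the seventh largest value. Linear interpolation in rank is our assumption, since the interpolation scheme is not specified here; the function name is hypothetical.

```python
from typing import Sequence


def empirical_return_value(values: Sequence[float], t_total: float,
                           t_return: float) -> float:
    """Interpolate between order statistics at rank t_total / t_return."""
    x = sorted(values, reverse=True)  # x[0] is the largest value
    r = t_total / t_return            # e.g. 3200 / 500 = 6.4
    k = int(r)                        # integer part: 6 -> sixth largest
    frac = r - k                      # fractional part: 0.4
    if k < 1 or k >= len(x):
        raise ValueError("record too short for this return period")
    # x[k - 1] is the k-th largest value, x[k] the (k + 1)-th largest
    return x[k - 1] + frac * (x[k] - x[k - 1])
```

For instance, for the values 1, 2, …, 100, the sixth and seventh largest are 95 and 94, giving 94.6.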
From previous work, a key advantage of the STME approach is that it provides less uncertain estimates at a location than conventional single-location analysis. We wish to demonstrate that this is also the case in the current work. For this reason, we also calculate estimates for comparison with those from STME, based on independent analysis of cyclone data from each location of interest.
4.3 Maximum likelihood estimation
Figure 6 illustrates the 500-year return value for SWH using maximum likelihood estimation for locations on the 100 m isodepth contour around Guadeloupe's main island pair, with location numbers given in Fig. 2. The figure caption gives relevant details of the figure layout. Across the 19 locations considered, the 500-year return value is estimated using STME (blue), single-location analysis (red), and the full synthetic cyclone data (black); in general, there is good agreement between estimates per location. Bias characteristics of single-location and STME estimates are generally similar. However, it can be seen (from the longer red whiskers) that the uncertainty in single-location estimates is in general greater than that of the corresponding STME estimates.
As the number of points used for STME estimation per location increases, there is evidence of a reduction in the uncertainty with which the return value is estimated, as might be expected. However, there is also some evidence for a small increase in the mean estimated return value; this is explored further in Sect. 4.5. There is very little corresponding evidence for reduced uncertainty in the single-location analysis, which also produces more outlying estimates of return value than STME.
The corresponding results for the line transect analysis using maximum likelihood estimation are given in Fig. 7. The general characteristics of this figure are similar to those of Fig. 6. As would be expected, the return value increases with increasing water depth. Single-location estimates are more variable than those from STME. Biases appear to be relatively small and similar for STME and single-location estimates. There is little evidence that the STME median estimate increases with increasing sample size. We infer from the analysis that water depth has little effect on the performance of the STME approach.
4.4 Results estimated using probability-weighted moments
Estimates for the 500-year return value on the isodepth contour, obtained using the method of probability-weighted moments, are shown in Fig. 8. The behaviour of STME and single-location estimates is very similar to that illustrated for maximum likelihood estimation in Fig. 6.
Results for the line transect using probability-weighted moments are given in Fig. 9; again, the figure shows similar trends to Fig. 7. There is some evidence that the STME median estimate increases with increasing sample size and that this reduces bias.
4.5 Assessment of model performance
Comparing box–whisker plots from centre to left for each location in the figures in Sects. 4.3 and 4.4 suggests that there is sometimes a small increasing trend in return value estimates from STME as a function of increasing sample size for inference. We investigate this trend further here. Figure 10 gives estimates for the 500-year return value of the space-time maximum STM (as opposed to the full STME estimate for SWH) as a function of the sample size used for estimation, using maximum likelihood estimation (blue) and probability-weighted moments (red). Also shown in black is the empirical estimate of the 500-year STM return value obtained directly from the synthetic cyclone data. The figure shows a number of interesting effects. Firstly, STM estimates from both maximum likelihood and probability-weighted moments increase with increasing sample size n, and this effect is more pronounced for probability-weighted moments. As a result, the bias of estimates using probability-weighted moments is considerably larger than that from maximum likelihood estimation for a sample size of 60. The uncertainty of estimates from probability-weighted moments is also somewhat larger than that from maximum likelihood estimation.
A number of studies in the literature compare the performance of different methods for estimating extreme value models. The method of probability-weighted moments is known to perform well relative to maximum likelihood estimation for small samples (see, for example, Jonathan et al., 2021, Sect. 7, for a discussion). For small samples, maximum likelihood estimation is known to underestimate the generalized Pareto shape parameter and overestimate the corresponding scale parameter, leading to bias in return value estimates. The results in Fig. 10 indicate that, if anything, maximum likelihood estimation performs somewhat better than the method of probability-weighted moments for the current application. Regardless, the trends in Fig. 10 illustrate the importance of performing the STM extreme value analysis with great care, particularly for small sample sizes.
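The two estimation routes compared in Fig. 10 can be sketched in Python as below. This is a minimal illustration, not the authors' MATLAB implementation (Wada et al., 2021): the probability-weighted moment estimator follows Hosking and Wallis (1987), maximum likelihood uses `scipy.stats.genpareto.fit` with the threshold fixed, and the threshold choice (the (n+1)-th largest STM), the Poisson event rate n/T_{0} and all function names are our assumptions. The synthetic STM sample in the usage section is illustrative only.

```python
import numpy as np
from scipy import stats


def gp_fit_pwm(excess):
    """Generalized Pareto (GP) fit by probability-weighted moments.

    Hosking and Wallis (1987) write F(x) = 1 - (1 - k x / sigma)^(1 / k), for
    which alpha_0 = sigma / (1 + k) and alpha_1 = sigma / (2 (2 + k)), with
    alpha_s = E[X (1 - F(X))^s]. Returns (xi, sigma), where xi = -k is the
    shape convention used by scipy.stats.genpareto.
    """
    x = np.sort(np.asarray(excess, dtype=float))   # ascending order statistics
    n = x.size
    j = np.arange(1, n + 1)
    a0 = x.mean()                                    # unbiased estimate of alpha_0
    a1 = np.sum((n - j) / (n - 1) * x) / n           # unbiased estimate of alpha_1
    k = a0 / (a0 - 2.0 * a1) - 2.0
    sigma = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return -k, sigma


def gp_fit_mle(excess):
    """GP fit by maximum likelihood with the location (threshold) fixed at 0."""
    xi, _, sigma = stats.genpareto.fit(excess, floc=0.0)
    return xi, sigma


def stm_return_value(stm, n, T0, T, method="mle"):
    """T-year STM return value from the n largest STMs in T0 years of data.

    Assumed convention: the threshold u is the (n+1)-th largest STM, and
    exceedances of u form a Poisson process with rate lam = n / T0 per year,
    so the return value x_T solves P(STM > x_T | STM > u) = -log(1 - 1/T) / lam.
    """
    s = np.sort(np.asarray(stm, dtype=float))[::-1]  # descending order
    u = s[n]                                          # (n+1)-th largest STM
    fit = gp_fit_mle if method == "mle" else gp_fit_pwm
    xi, sigma = fit(s[:n] - u)                        # fit GP to the n excesses
    lam = n / T0
    p = -np.log1p(-1.0 / T) / lam                     # per-event exceedance probability
    return u + stats.genpareto.isf(p, xi, loc=0.0, scale=sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Illustrative synthetic STM sample only: 124 cyclones in 200 years
    stm = 4.0 + stats.genpareto.rvs(-0.2, scale=2.0, size=124, random_state=rng)
    for n in (20, 30, 40, 50, 60):
        rv_mle = stm_return_value(stm, n, T0=200, T=500, method="mle")
        rv_pwm = stm_return_value(stm, n, T0=200, T=500, method="pwm")
        print(f"n={n:2d}  MLE: {rv_mle:6.2f}  PWM: {rv_pwm:6.2f}")
```

Repeating such a loop over many resampled data sets is one way to reproduce, qualitatively, the kind of sample-size dependence shown in Fig. 10.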
One of the assumptions underpinning the STME approach is that the exposure distribution at a location does not depend on the magnitude of STM. We investigate this assumption here. Our aim is to show that the empirical cumulative distribution functions for exposure (henceforth ECDFs for brevity) corresponding to the cyclones with the largest and smallest STMs are typical of ECDFs in general and in no way special relative to the ECDFs of other cyclones. We quantify the difference between two ECDFs using the Kullback–Leibler divergence (KL) (Liese and Vajda, 2006). We estimate the “null” distribution of KL using 1000 sets of randomly selected pairs of ECDFs. In addition, we calculate the Kullback–Leibler divergence KL^{*} for the pair of ECDFs corresponding to the cyclones with the largest and smallest STMs. If the ECDF does not depend on STM, then KL^{*} should behave like a random draw from the null distribution of KL. The left-hand panel of Fig. 11 illustrates the null distribution of KL at Location 21, for a sample size of 20, together with the corresponding value of KL^{*}. The value of KL^{*} is not extreme in the null distribution. In the right-hand panel of Fig. 11, the empirical cumulative distribution function of the non-exceedance probability of KL^{*} (in the corresponding null distribution) is estimated over all locations and sample sizes. The approximately uniform density suggests that KL^{*} indeed corresponds to a random draw from the null distribution; a Kolmogorov–Smirnov test indicated that these non-exceedance probabilities were not significantly different from a random sample from a uniform distribution on [0, 1].
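A sketch of the check described above follows. The data layout is hypothetical: each cyclone is assumed to carry a 1-D array of exposure values in [0, 1] from which its ECDF is formed, and the KL divergence between two ECDFs is approximated by binning both samples on a common grid (the bin count and the small pseudo-count are arbitrary choices, not taken from the article). The usage section draws exposures independently of STM, i.e. under the null hypothesis, and applies `scipy.stats.kstest` to the resulting non-exceedance probabilities.

```python
import numpy as np
from scipy import stats


def kl_divergence(a, b, bins=10, eps=1e-6):
    """Approximate KL divergence D(P_a || P_b) between two exposure samples.

    Assumption: exposure lies in [0, 1]; both samples are binned on a common
    grid and a small pseudo-count eps keeps empty bins from giving log(0).
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # scipy.stats.entropy normalizes both vectors and returns sum p log(p / q)
    return float(stats.entropy(p + eps, q + eps))


def kl_nonexceedance(exposures, stm, n_null=1000, rng=None):
    """Non-exceedance probability of KL* within a null distribution of KL.

    exposures: list of 1-D arrays, one per cyclone (hypothetical layout).
    stm:       STM of each cyclone.
    KL* compares the cyclones with the largest and smallest STM; the null
    distribution is built from n_null randomly selected pairs of cyclones.
    """
    rng = np.random.default_rng(rng)
    kl_star = kl_divergence(exposures[int(np.argmax(stm))],
                            exposures[int(np.argmin(stm))])
    null = np.empty(n_null)
    for m in range(n_null):
        i, j = rng.choice(len(exposures), size=2, replace=False)
        null[m] = kl_divergence(exposures[i], exposures[j])
    return float(np.mean(null <= kl_star)), kl_star, null


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    probs = []
    for _ in range(30):                      # stands in for locations x sample sizes
        stm = rng.gumbel(5.0, 1.0, size=20)
        # exposure drawn independently of STM, i.e. under the null hypothesis
        exposures = [rng.beta(rng.uniform(2.0, 6.0), 2.0, size=30) for _ in range(20)]
        prob, _, _ = kl_nonexceedance(exposures, stm, rng=rng)
        probs.append(prob)
    res = stats.kstest(probs, "uniform")     # compare with U(0, 1)
    print(f"KS statistic {res.statistic:.3f}, p-value {res.pvalue:.3f}")
```

Because KL is asymmetric, the order within each pair matters; drawing pairs in random order keeps the null comparable with KL^{*} computed as (largest, smallest).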
Complementary analyses (not shown) evaluated KL^{*} for the ECDFs corresponding to the largest two STMs in the data and (separately) for the ECDFs corresponding to the smallest two STMs. Results again indicated that both of these choices for KL^{*} could be viewed as random draws from their null distributions. Since the exposure distribution at a location does not appear to depend on the magnitude of STM, we assume that the overall performance of STME is mainly governed by the estimation of STM. In shallow water, where waves are subject to breaking, the exposure distribution would be expected to depend on STM; the validity of the method should then be checked more carefully using the approaches described in Sect. 3.3.
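Because exposure appears independent of STM, a return value at a location can be obtained by combining a fitted STM tail with the empirical exposure distribution at that location. The Monte Carlo sketch below illustrates this idea under stated assumptions: exposure is taken to be the ratio of the local SWH peak to the STM for each cyclone, STM above a threshold follows a generalized Pareto model fitted by maximum likelihood, and events occur as a Poisson process with rate n/T_{0}. It is not necessarily the exact scheme used in the article or in the published code, and the function name and arguments are hypothetical.

```python
import numpy as np
from scipy import stats


def stme_return_value(stm, exposure, n, T0, T, n_sim=200_000, rng=None):
    """Monte Carlo sketch of a T-year SWH return value at one location.

    Illustrative assumptions (not necessarily the article's exact scheme):
    - exposure[j] is the SWH peak at the location divided by STM_j, in [0, 1];
    - exposure is independent of STM, so SWH = STM * E with E resampled
      from the empirical exposures at the location;
    - STM above the (n+1)-th largest value u follows a generalized Pareto
      model fitted by maximum likelihood, with Poisson rate n / T0 per year.
    Events with STM below u are ignored; since exposure does not exceed 1,
    this loses nothing whenever the return value itself exceeds u.
    """
    rng = np.random.default_rng(rng)
    s = np.sort(np.asarray(stm, dtype=float))[::-1]
    u = s[n]
    xi, _, sigma = stats.genpareto.fit(s[:n] - u, floc=0.0)
    # simulate STM above threshold, then pair each draw with a random exposure
    stm_sim = u + stats.genpareto.rvs(xi, scale=sigma, size=n_sim, random_state=rng)
    e_sim = rng.choice(np.asarray(exposure, dtype=float), size=n_sim, replace=True)
    swh_sim = stm_sim * e_sim
    p = -np.log1p(-1.0 / T) / (n / T0)        # per-event exceedance probability
    return float(np.quantile(swh_sim, 1.0 - p))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Illustrative synthetic inputs only
    stm = 4.0 + stats.genpareto.rvs(-0.2, scale=2.0, size=124, random_state=rng)
    exposure = rng.beta(5.0, 2.0, size=124)
    print(stme_return_value(stm, exposure, n=20, T0=200, T=500, rng=4))
```

With all exposures equal to 1 the sketch reduces to the STM return value, which makes the remark above concrete: under independence, the quality of the SWH estimate is inherited largely from the STM tail fit.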
4.6 Model performance for smaller sample sizes
Here we repeat the analysis of Sects. 4.1–4.5 above for the T=100-year return value for SWH on the iso-depth contour and line transect, based on T_{0}=50 years of data. The typical number of tropical cyclone events occurring in 50 years is approximately 30, which is already a very small sample size for extreme value analysis. We retain the largest n values of STM in the sample, for n=10, 15, and 20. The overall performance of STME estimates, relative to those from single-location analysis and an empirical estimate from the full synthetic cyclone data, is summarized in Fig. 12 for the line transect, using the method of probability-weighted moments (see also Table 2 in the next section for a summary including estimates using maximum likelihood). The features of the figure are similar to those of the figures discussed earlier. Estimates from STME show lower bias and reduced uncertainty relative to those from single-location analysis. There is slight underestimation of the return value, but the empirical estimate sits comfortably within the 25 %–75 % uncertainty band (corresponding to the “box” interval). The corresponding plots (not shown) for the iso-depth contour, and for estimation using maximum likelihood, are similar.
This work considers the estimation of T-year return values for SWH over a geographic region, from small sets of T_{0} years of synthetic tropical cyclone data, using the STME (space-time maxima and exposure) methodology. We assess the methodology by comparing STME estimates of the T-year return value (T>T_{0}) for locations in the region with those estimated directly from a large database corresponding to T_{L} (>T) years of synthetic cyclones. We find that STME provides estimates of the T=500-year return value from T_{0}=200 years of data in the region of the Guadeloupe archipelago with low bias. We also compare STME estimates of T-year return values for locations in the region with those obtained by extreme value analysis of data (for T_{0} years) at individual locations. We find that the uncertainty of STME estimates is lower than that of single-location estimates. A comparison of inferences for the T=100-year return value from T_{0}=50 years of data also suggests that STME outperforms single-location analysis.
For reasonable application of the STME approach, it is important that the characteristics of tropical cyclones over the region under consideration satisfy a number of conditions. These conditions are shown not to be violated for a region around the Guadeloupe archipelago, whereas use of the STME method over a larger spatial domain would not be valid (see, for example, Wada et al., 2019). This demonstrates that selection of an appropriate geographical region for STME analysis is critical to its success. Once such a region is specified, we find that STME provides a simple but principled approach to return value estimation within the region from small samples of tropical cyclone data.
Return value estimates for STM (see, for example, Fig. 10) show a small bias that increases with the sample size used for extreme value estimation. However, the resulting bias in full STME return values is small. Corresponding estimates based on single-location analysis also show a relatively small but increasing negative bias with increasing sample size. In the present work, the tail of the distribution of STM was estimated by fitting a generalized Pareto model, using either maximum likelihood estimation or the method of probability-weighted moments. Estimates for extreme quantiles of STM using either approach are in good agreement.
Table 1 summarizes the bias and uncertainty of STME and single-location estimates of the 500-year return value, relative to empirical estimates from the full synthetic cyclone data, for analysis sample sizes of n=20, 30, 40, 50, and 60. Bias $B(n;\mathtt{Mth})$ and uncertainty $U(n;\mathtt{Mth})$ are estimated as average characteristics over all $|\mathcal{L}|=31$ locations $\ell\in\mathcal{L}=\{1, 2, \dots, 31\}$ on the iso-depth contour and line transect corresponding to the relevant sample size, using the expressions
Here, $\tilde{h}(\ell;n,\mathtt{Mth})$ and $\tilde{h}_{0}(n)$ correspond to the mean 500-year return value estimated using sample size n from the inference method $\mathtt{Mth}$ (either maximum likelihood or probability-weighted moments) and directly from the full synthetic cyclone database, respectively; $r(\ell;n,\mathtt{Mth})$ and $r_{0}(n)$ are the corresponding 50 % uncertainty bands. The table summarizes the findings presented pictorially in Figs. 6–8. In terms of bias, both STME and single-location estimates underestimate the return value on average. STME is less biased than single-location estimates except for sample sizes of 20 and 30 using probability-weighted moments. STME also provides estimates of the 500-year return value with higher precision than the single-location analysis.
Table 2 provides the corresponding summary for estimation of the T=100-year return value from a sample corresponding to approximately T_{0}=50 years of data. Its features are similar to those of Table 1.
An appropriate choice of sample size n for STME analysis is likely to be related to the size n_{0} of the full sample available and the period T_{0} to which the sample corresponds. For example, in the current work for T=500 years, n=20 out of n_{0}=124 corresponds approximately to the largest 15 % of cyclones for the sample period T_{0}=200 years. That is, the smallest cyclone considered in the n=20 STME model has a return period of the order of 10 years. With n=60, we use approximately half the sample for STME analysis, and the smallest cyclone in the STME analysis has a return period of the order of 3 years. In the case T=100 years, T_{0}=50 years, and n_{0}≈30, we found that STME performance was still reasonable using n=12, 15, and 20.
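The arithmetic in the paragraph above can be reproduced in a few lines, assuming the empirical approximation that the n-th largest of the events observed in T_{0} years has a return period of roughly T_{0}/n years (the helper name is hypothetical):

```python
def retained_share_and_return_period(n, n0, T0):
    """Share of the n0 cyclones retained for STME and the approximate return
    period (years) of the smallest retained cyclone, taken as T0 / n
    (assumption: empirical approximation for the n-th largest event)."""
    return n / n0, T0 / n


if __name__ == "__main__":
    T0, n0 = 200, 124        # sample period (years) and number of cyclones in it
    for n in (20, 30, 40, 50, 60):
        share, period = retained_share_and_return_period(n, n0, T0)
        print(f"n={n}: {100 * share:.0f} % of cyclones, smallest ~{period:.1f}-year event")
```

For n=20 this gives about 16 % of the sample and a return period of 10 years, and for n=60 about 48 % and 3.3 years, consistent with the figures quoted above.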
Inferences from the current work confirm the findings of previous studies (Wada et al., 2018, 2020) that STME provides improved estimates of return values compared to statistical analysis at a single location. From an operational perspective, STME is potentially useful for regions such as the southwest Pacific Ocean (McInnes et al., 2014) or the Indian Ocean basin (Lecacheux et al., 2012), where cyclone-induced storm wave data are limited. For such locations, provided that the conditions on the region discussed above are satisfied, STME can be expected to achieve lower bias and higher precision and should be preferred to the single-location approach.
MATLAB code for the analysis is provided on GitHub at https://github.com/ygraigarw/STME (Wada et al., 2021).
Sample wave data for the analysis are available on Zenodo at https://doi.org/10.5281/zenodo.4627903 (Krien et al., 2021).
RW, JR, and PJ originally conceived the study. YK and JR conducted cyclone and wave modelling. RW and PJ worked on statistical methodology and modelling. All authors contributed to writing the paper.
The contact author has declared that neither they nor their coauthors have any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the special issue “Coastal hazards and hydrometeorological extremes”. It is not associated with a conference.
Numerical simulations were conducted using the computational resources of the C3I (Centre Commun de Calcul Intensif) in Guadeloupe. Jeremy Rohmer acknowledges the funding of the CaribCoast Interreg project (https://www.interreg-caraibes.fr/caribcoast, last access: 1 February 2022). MATLAB code for the analysis is provided on GitHub at Wada et al. (2021). Sample wave data for the analysis are available on Zenodo at Krien et al. (2021).
This research has been supported by Interreg (grant no. 2014TC16RFTN008). Jeremy Rohmer received funding from the CaribCoast Interreg project (https://www.interreg-caraibes.fr/caribcoast, last access: 1 February 2022, grant no. 2014TC16RFTN008).
This paper was edited by Joanna Staneva and reviewed by two anonymous referees.
Barbier, E. B.: Policy: Hurricane Katrina's lessons for the world, Nat. News, 524, 285, https://doi.org/10.1038/524285a, 2015.
Bloemendaal, N., Muis, S., Haarsma, R. J., Verlaan, M., Apecechea, M. I., de Moel, H., Ward, P. J., and Aerts, J. C.: Global modeling of tropical cyclone storm surges using high-resolution forecasts, Clim. Dynam., 52, 5031–5044, 2019.
Bloemendaal, N., Haigh, I. D., de Moel, H., Muis, S., Haarsma, R. J., and Aerts, J. C.: Generation of a global synthetic tropical cyclone hazard dataset using STORM, Scient. Data, 7, 1–12, 2020.
Coles, S.: An introduction to statistical modeling of extreme values, Springer, ISBN 9781849968744, 2001.
Dasgupta, R., Basu, M., Kumar, P., Johnson, B. A., Mitra, B. K., Avtar, R., and Shaw, R.: A rapid indicator-based assessment of foreign resident preparedness in Japan during Typhoon Hagibis, Int. J. Disast. Risk Reduct., 51, 101849, https://doi.org/10.1016/j.ijdrr.2020.101849, 2020.
Davison, A. C.: Statistical models, Cambridge University Press, Cambridge, UK, ISBN 9780521734493, 2003.
de Zea Bermudez, P. and Kotz, S.: Parameter estimation of the generalized Pareto distribution – Part I, J. Stat. Plan. Inference, 140, 1353–1373, 2010a.
de Zea Bermudez, P. and Kotz, S.: Parameter estimation of the generalized Pareto distribution – Part II, J. Stat. Plan. Inference, 140, 1374–1388, 2010b.
Desarthe, J.: Ouragans et submersions dans les Antilles françaises (XVIIe–XXe siècle) – Hurricanes and Storm-surge in French Antilles (17th–20th century), Études caribéennes, https://doi.org/10.4000/etudescaribeennes.7176, 2015.
Emanuel, K., Ravela, S., Vivant, E., and Risi, C.: A statistical deterministic approach to hurricane risk assessment, B. Am. Meteorol. Soc., 87, 299–314, 2006.
Furrer, R. and Naveau, P.: Probability weighted moments properties for small samples, Stat. Probabil. Lett., 70, 190–195, 2007.
Hosking, J. R. M. and Wallis, J. R.: Parameter and Quantile Estimation for the Generalized Pareto Distribution, Technometrics, 29, 339–349, 1987.
Jevrejeva, S., Bricheno, L., Brown, J., Byrne, D., De Dominicis, M., Matthews, A., Rynders, S., Palanisamy, H., and Wolf, J.: Quantifying processes contributing to marine hazards to inform coastal climate resilience assessments, demonstrated for the Caribbean Sea, Nat. Hazards Earth Syst. Sci., 20, 2609–2626, https://doi.org/10.5194/nhess-20-2609-2020, 2020.
Jonathan, P., Randell, D., Wadsworth, J., and Tawn, J.: Uncertainties in return values from extreme value analysis of peaks over threshold using the generalised Pareto distribution, Ocean Eng., 220, 107725, https://doi.org/10.1016/j.oceaneng.2020.107725, 2021.
Kennedy, A. B., Westerink, J. J., Smith, J. M., Hope, M. E., Hartman, M., Taflanidis, A. A., Tanaka, S., Westerink, H., Cheung, K. F., Smith, T., Hamann, M., Minamide, M., Ota, A., and Dawson, C.: Tropical cyclone inundation potential on the Hawaiian Islands of Oahu and Kauai, Ocean Model., 52, 54–68, 2012.
Knapp, K. R., Kruk, M. C., Levinson, D. H., Diamond, H. J., and Neumann, C. J.: The international best track archive for climate stewardship (IBTrACS) unifying tropical cyclone data, B. Am. Meteorol. Soc., 91, 363–376, 2010.
Koussoula-Bonneton, A.: Le passage dévastateur d'un ouragan: conséquences socio-économiques. Le cas du cyclone Hugo en Guadeloupe, La Météorologie, https://doi.org/10.4267/2042/53441, 1994.
Krien, Y., Dudon, B., Roger, J., and Zahibo, N.: Probabilistic hurricane-induced storm surge hazard assessment in Guadeloupe, Lesser Antilles, Nat. Hazards Earth Syst. Sci., 15, 1711–1720, https://doi.org/10.5194/nhess-15-1711-2015, 2015.
Krien, Y., Wada, R., Rohmer, J., and Jonathan, P.: Synthetic tropical cyclone data for the Caribbean Sea, Zenodo [data set], https://doi.org/10.5281/zenodo.4627903, 2021.
Landsea, C. W. and Franklin, J. L.: Atlantic hurricane database uncertainty and presentation of a new database format, Mon. Weather Rev., 141, 3576–3592, 2013.
Lecacheux, S., Pedreros, R., Le Cozannet, G., Thiébot, J., De La Torre, Y., and Bulteau, T.: A method to characterize the different extreme waves for islands exposed to various wave regimes: a case study devoted to Reunion Island, Nat. Hazards Earth Syst. Sci., 12, 2425–2437, https://doi.org/10.5194/nhess-12-2425-2012, 2012.
Lecacheux, S., Rohmer, J., Paris, F., Pedreros, R., Quetelard, H., and Bonnardot, F.: Toward the probabilistic forecasting of cyclone-induced marine flooding by overtopping at Reunion Island aided by a time-varying random-forest classification approach, Nat. Hazards, 105, 227–251, 2021.
Liese, F. and Vajda, I.: On divergences and informations in statistics and information theory, IEEE T. Inform. Theor., 52, 4394–4412, 2006.
Lin, N., Emanuel, K., Oppenheimer, M., and Vanmarcke, E.: Physically based assessment of hurricane surge threat under climate change, Nat. Clim. Change, 2, 462–467, 2012.
McInnes, K. L., Walsh, K. J., Hoeke, R. K., O'Grady, J. G., Colberg, F., and Hubbert, G. D.: Quantifying storm tide risk in Fiji due to climate variability and change, Global Planet. Change, 116, 115–129, 2014.
Merrifield, M., Becker, J., Ford, M., and Yao, Y.: Observations and estimates of wave-driven water level extremes at the Marshall Islands, Geophys. Res. Lett., 41, 7245–7253, 2014.
Nadal-Caraballo, N. C., Campbell, M. O., Gonzalez, V. M., Torres, M. J., Melby, J. A., and Taflanidis, A. A.: Coastal Hazards System: A Probabilistic Coastal Hazard Analysis Framework, J. Coast. Res., 95, 1211–1216, 2020.
Simpson, R. H. and Saffir, H.: The hurricane disaster potential scale, Weatherwise, 27, 169–186, https://doi.org/10.1080/00431672.1974.9931702, 1974.
Stephens, S. A. and Ramsay, D.: Extreme cyclone wave climate in the Southwest Pacific Ocean: Influence of the El Niño Southern Oscillation and projected climate change, Global Planet. Change, 123, 13–26, 2014.
Vickery, P., Skerlj, P., and Twisdale, L.: Simulation of hurricane risk in the US using empirical track model, J. Struct. Eng., 126, 1222–1237, 2000.
Wada, R., Waseda, T., and Jonathan, P.: A simple spatial model for extreme tropical cyclone seas, Ocean Eng., 169, 315–325, 2018.
Wada, R., Jonathan, P., Waseda, T., and Fan, S.: Estimating Extreme Waves in the Gulf of Mexico Using a Simple Spatial Extremes Model, in: Proceedings of the ASME 2019 38th International Conference on Ocean, Offshore and Arctic Engineering, Volume 9: Rodney Eatock Taylor Honoring Symposium on Marine and Offshore Hydrodynamics, Takeshi Kinoshita Honoring Symposium on Offshore Technology, 9–14 June 2019, Glasgow, Scotland, UK, V009T13A007, https://doi.org/10.1115/OMAE2019-95442, 2019.
Wada, R., Jonathan, P., and Waseda, T.: Spatial Features of Extreme Waves in Gulf of Mexico, in: Proceedings of the ASME 2020 39th International Conference on Ocean, Offshore and Arctic Engineering, Volume 6B: Ocean Engineering, 3–7 August 2020, Virtual, Online, V06BT06A007, https://doi.org/10.1115/OMAE2020-19190, 2020.
Wada, R., Rohmer, J., Krien, Y., and Jonathan, P.: STME (space-time maxima and exposure) spatial extremes model for tropical cyclones, GitHub [code], https://github.com/ygraigarw/STME, 2021.
Wu, J.: Wind-stress coefficients over sea surface from breeze to hurricane, J. Geophys. Res.-Oceans, 87, 9704–9706, 1982.
Zahibo, N., Pelinovsky, E., Talipova, T., Rabinovich, A., Kurkin, A., and Nikolkina, I.: Statistical analysis of cyclone hazard for Guadeloupe, Lesser Antilles, Atmos. Res., 84, 13–29, 2007.