Study of the threshold for the POT method based on hindcasted significant wave heights of tropical cyclone waves in the South China Sea

An assessment of extreme significant wave heights is performed in the South China Sea (SCS), which is crucial for coastal and offshore engineering in this area. Two significant factors influencing the assessment are the initial database and the assessment method. The initial database is the basis for the assessment, and the assessment method is used to extrapolate appropriate return significant wave heights based on this database over a given period. In this study, a 40-year (1975-2014) hindcast of significant wave heights of tropical cyclone waves is adopted as the initial database. Based on this database, the peak significant wave height of every tropical cyclone wave is directly extracted as the initial sample; the independent and identically distributed assumption is satisfied; and interference in the selection of the sample is avoided. The peak over threshold (POT) method with the generalized Pareto distribution (GPD) model is employed to extract a sufficiently large and high sample for model estimation. The peak excesses over a sufficiently high value (i.e., threshold) are fitted; thus, the return

Nat. Hazards Earth Syst. Sci. Discuss., https://doi.org/10.5194/nhess-2018-349. Manuscript under review for journal Nat. Hazards Earth Syst. Sci. Discussion started: 9 January 2019. © Author(s) 2019. CC BY 4.0 License.


Introduction
Reasonable assessment of extreme significant wave heights is highly important for the security and cost of coastal defences and offshore structures (Ojeda and Guillén, 2006, 2008; Ojeda et al., 2010, 2011; Mortlock and Goodwin, 2015, 2016; Mortlock et al., 2017). To obtain this assessment, an appropriate probability distribution model is fitted based on a stable sample, which is extracted from an accurate initial database by a reliable sampling method.
The initial database is the basis for assessing extreme significant wave heights (Godoi et al., 2017; Lucas et al., 2017; Li et al., 2018). In previous studies, a long-term continuous database is usually employed as the initial database, such as a 32-year measured significant wave height record in the Gulf of Maine (Viselli et al., 2015), a 44-year hindcasted significant wave height record in the North Atlantic Ocean (Muraleedharan et al., 2016) and a 22-year hindcasted significant wave height record in the Yellow Sea (Gao et al., 2018). Because the extreme significant wave height is extrapolated from an independent and identically distributed sample, as required by extreme value theory (EVT) (Coles, 2001; Sobradelo et al., 2011), these time series of buoy measurements and numerical hindcasts are processed before sampling. The homogenous methodology is used to extract homogenous significant wave heights via separation into carefully chosen directional sectors and seasonal analyses as well as separation of the sea state into independent wave systems (Lerma et al., 2015; Solari and Alonso, 2017). The declustering methodology, such as the double-threshold approach (Mazas and Hamm, 2011) or the minimum separation time method (Kapelonis et al., 2015), is used to differentiate individual wave events. After implementing these methodologies, the initial sample is independently extracted under the same type of meteorological event, and the sample can be further extracted from the initial sample. However, these methodologies introduce uncertainty in the initial sample (such as the subjectivity of practitioners in selecting the initial threshold and time window), which influences the selection of the sample.
Among these methods, a graphical diagnostic referred to as the sensitivity of the return significant wave height to the threshold (Scarrott and MacDonald, 2012) is commonly accepted (Petrov et al., 2013; Northrop and Coleman, 2014; Vanem, 2015b; Northrop et al., 2017; Sulis et al., 2017). This method fits the GPD over a range of candidate thresholds and selects the suitable threshold by identifying the stability of return significant wave heights. If return significant wave heights are insensitive to the threshold, the corresponding threshold can be selected as the suitable threshold. The benefit of this method is that it requires practitioners to graphically inspect and comprehend the features of the data and assess the uncertainty of the candidate thresholds (Scarrott and MacDonald, 2012). The drawback of this method is that the threshold is not uniquely selected, and another criterion is needed to identify the optimal one (Lerma et al., 2015).
In the South China Sea (SCS), time series of wave parameters have been simulated (Zheng et al., 2012; Mirzaei et al., 2015; Yaakob et al., 2016), and extreme waves have been investigated based on long-term continuous data (Zheng et al., 2015; Chen et al., 2017; Wang et al., 2018). The annual maxima (AM) method (Tawn, 1988) is usually employed to extract the annual maximal significant wave height as the sample for extrapolation.
Considering that the sampling method is a basis for the assessment of extreme significant wave heights, Shao et al. (2018a) compared the AM method with the POT method. They found that the distribution and representativeness of samples in the SCS limit the application of the AM method. Even when the return period is close to the size of the dataset, the sample of the AM method may be unreasonable for extrapolation. Because the POT method is a natural sampling method without additional limitations and guarantees the number and representativeness of samples when the threshold is suitable, Shao et al. (2018a) suggested that the POT method is more suitable for sampling in the SCS. In this study, the POT method is further studied in the SCS based on this conclusion. Before assessing the extreme significant wave height in the SCS, a meteorological analysis is implemented. Tropical cyclones are the major type of extreme weather in the SCS and always drive storm waves (Anoop et al., 2015; Hithin et al., 2015; Sanil Kumar and Anoop, 2015; Ojeda et al., 2017; Wang et al., 2017; Mortlock et al., 2018; Sanil Kumar et al., 2018). In addition, the number of tropical cyclones in the SCS is counted, which shows that tropical cyclone waves are feasible for studying extreme significant wave heights. Thus, extreme significant wave heights are extrapolated based on tropical cyclone waves, and 40-year (1975-2014) hindcasted significant wave heights obtained during tropical cyclones are employed as the initial database. Because this initial database is independently simulated for each tropical cyclone, the peak significant wave height of every tropical cyclone wave is directly extracted as the initial sample, and this initial sample satisfies the independent and identically distributed assumption. Moreover, the assessment process is simplified, and the sample can be extracted without the influence of the homogenous and declustering methodologies. After determining the initial sample, the POT method is used to extract the peak significant wave heights over the threshold as the sample. Shao et al. (2018a) and Liang et al. (2019) analysed the sensitivity of the return significant wave height to the threshold for threshold selection. They found that the suitable threshold should be determined within the stable threshold range (i.e., a threshold range corresponding to a range of stable return significant wave heights).
However, a unique threshold cannot be identified by this method. To select a unique threshold within the stable threshold range, Shao et al. (2018a) defined the highest threshold within the common stable threshold range as the suitable threshold. They preliminarily studied threshold selection for the POT method to analyse the sampling methods (i.e., the POT method and the AM method) in the SCS. Liang et al. (2019) focused on threshold selection for the POT method and proposed an automated threshold selection method based on the characteristic of extrapolated significant wave heights (ATSME). The ATSME employs the differences in extrapolated significant wave heights for neighbouring thresholds as the diagnostic parameters to identify a unique stable threshold range via an automated technique and selects the highest threshold within the stable threshold range as the suitable threshold for different return periods. This method can select a suitable threshold by a pragmatic, automated and computationally inexpensive technique. Because the threshold selection criteria of Shao et al. (2018a) and Liang et al. (2019) are based on the sensitivity of the return significant wave height to the threshold, a suitable threshold cannot be selected directly without a subjective definition, which may differ between practitioners. Liang et al. (2019) diagnosed the return significant wave height within the stable threshold range. If some return significant wave heights within the stable threshold range are relatively different from the others, the corresponding candidate thresholds are rejected. Thus, the influence of the subjective definition on the threshold and return significant wave height is weakened. This diagnostic process is crucial for the assessment of extreme wave heights, especially for the regional assessment of extreme significant wave heights. However, the subjective definition still exists in the ATSME. In the present study, threshold selection for the POT method is further studied based on the characteristics of tropical cyclone waves in the SCS. The study results reveal that the separation value shown in the distribution of the initial sample is suitable for sampling. To validate the thresholds obtained from the distribution of the initial sample, the asymptotic tail approximation and estimation uncertainty are analysed, which demonstrates the capability of this method for threshold selection. After determining the sample, the GPD model is used to extrapolate extreme significant wave heights in the SCS.
The article is structured as follows. In the next section, the EVT including the sampling method and probability distribution model is introduced. Initial data and study sites are described in Section 3. In Section 4, initial samples are extracted, and the sensitivity of the return significant wave height to the threshold is discussed. Section 5 studies characteristics of tropical cyclone waves and the distribution of the initial sample. Finally, the conclusions are presented in Section 6.

Background
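The POT method extracts the peak significant wave heights above a selected value (i.e., threshold), $u$, as the sample.
For a threshold, $u$, that is sufficiently high, the distribution function of peak excesses over the threshold can be approximated by a member of the GPD (Pickands, 1975; Embrechts et al., 1997):

$$F(Hs^{*}) = 1 - \left(1 + k\,\frac{Hs^{*}}{\sigma}\right)^{-1/k}, \qquad Hs^{*} = Hs - u, \tag{1}$$

where $Hs^{*}$ represents the peak excess over the threshold; $\sigma$ represents the scale parameter; and $k$ represents the shape parameter. These GPD parameters ($\sigma$ and $k$) are estimated using the maximum likelihood estimation method, which is recommended by Mazas and Hamm (2011):

$$\ell(\sigma, k) = -N \ln \sigma - \left(1 + \frac{1}{k}\right) \sum_{j=1}^{N} \ln\left(1 + k\,\frac{Hs_{(j)} - u}{\sigma}\right), \tag{2}$$

where $N$ represents the number of events exceeding the threshold (i.e., the number of samples), and $Hs_{(j)}$ represents the $j$-th peak significant wave height above the threshold.
The return significant wave height for the $i$-year return period, $Hs_{i}$, is defined as follows:

$$1 - F(Hs_{i} - u) = \frac{N_{T}}{i\,N}. \tag{3}$$

Thus, it can be calculated with the following equation:

$$Hs_{i} = u + \frac{\sigma}{k}\left[\left(\frac{i\,N}{N_{T}}\right)^{k} - 1\right], \tag{4}$$

where $N_{T}$ represents the size of the dataset in years.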
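The GPD relations above can be illustrated with a short Python sketch. It assumes the common parametrisation in which the survival function of an excess x is (1 + k x / sigma)^(-1/k), so that the i-year return height is u + (sigma / k) [(i N / N_T)^k - 1]; the sign convention for the shape parameter is an assumption, and the function names are hypothetical. The threshold, N and N_T in the usage lines are the values quoted for location #1, whereas sigma and k are illustrative placeholders rather than fitted values from this study.

```python
import math

def gpd_cdf(excess, sigma, k):
    """GPD distribution function of a peak excess (assumed parametrisation)."""
    if excess <= 0.0:
        return 0.0
    if abs(k) < 1e-12:                      # exponential limit for k -> 0
        return 1.0 - math.exp(-excess / sigma)
    base = 1.0 + k * excess / sigma
    if base <= 0.0:                         # beyond the upper end point (k < 0)
        return 1.0
    return 1.0 - base ** (-1.0 / k)

def gpd_log_likelihood(excesses, sigma, k):
    """Log-likelihood maximised by the maximum likelihood estimator."""
    if sigma <= 0.0:
        return -math.inf
    n = len(excesses)
    if abs(k) < 1e-12:
        return -n * math.log(sigma) - sum(excesses) / sigma
    total = 0.0
    for x in excesses:
        base = 1.0 + k * x / sigma
        if base <= 0.0:                     # observation outside the support
            return -math.inf
        total += math.log(base)
    return -n * math.log(sigma) - (1.0 + 1.0 / k) * total

def return_level(u, sigma, k, n_exceed, n_years, period):
    """Return significant wave height for a return period given in years."""
    ratio = period * n_exceed / n_years     # expected exceedances per period
    if abs(k) < 1e-12:
        return u + sigma * math.log(ratio)
    return u + sigma / k * (ratio ** k - 1.0)

# Illustrative call: u, N and N_T follow location #1; sigma and k are made up.
for years in (50, 100, 150, 200):
    print(years, round(return_level(4.15, 2.3, -0.17, 137, 40, years), 2))
```

The return height is obtained by inverting the distribution function, so evaluating the survival function at the returned height should give back N_T / (i N).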

Initial data
As required by the EVT, the extreme significant wave height is extrapolated based on independent waves under the same type of meteorological event (Lerma et al., 2015; Solari and Alonso, 2017). Before assessing the extreme significant wave height, a meteorological analysis is needed to identify the extreme weather. In the SCS, tropical cyclones occur frequently, and relatively high waves usually appear during tropical cyclones (Shao et al., 2017). This means that tropical cyclone waves represent the extreme waves in the SCS well and that the extreme significant wave height can be assessed based on tropical cyclone waves. Therefore, significant wave heights from a 40-year (1975-2014) hindcast of tropical cyclone waves (Shao et al., 2018a) are adopted as the initial database, which is simulated using the third-generation spectral wind-wave model SWAN (an acronym for Simulating WAves Nearshore) (Booij et al., 1999; Mortlock et al., 2014; Amrutha et al., 2016).
This model is forced by a blended wind, which is obtained by combining the European Centre for Medium-Range Weather Forecasts reanalysis wind and the Holland model wind (Shao et al., 2018b). Nine hundred and seventy-four tropical cyclone waves are simulated independently, one for each tropical cyclone. The spatial resolution is 0.0625° for both longitude and latitude, and the temporal resolution is 1 h.

Study sites
In this study, 22 locations are selected as the study sites. Detailed information on the latitude and longitude of each study site and the number of tropical cyclones recorded there is shown in Table 1. A tropical cyclone is recorded at a study site when the distance between the centre of the tropical cyclone and the study site is within 300 km. Hourly significant wave heights simulated during the recorded tropical cyclones are adopted as the initial database at the study site.
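The 300 km recording rule can be sketched in Python as follows. The text does not state how the distance is computed, so the great-circle (haversine) distance, the Earth radius value, the function names and the track coordinates below are all assumptions made for illustration; only the 300 km radius and the coordinates of location #1 come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def is_recorded(track, site, radius_km=300.0):
    """True if any cyclone-centre position lies within radius_km of the site."""
    site_lat, site_lon = site
    return any(
        great_circle_km(lat, lon, site_lat, site_lon) <= radius_km
        for lat, lon in track
    )

location_1 = (22.00, 118.75)                   # location #1 from the text
track_near = [(18.0, 125.0), (21.0, 120.0)]    # hypothetical centre positions
track_far = [(10.0, 130.0), (12.0, 128.0)]     # hypothetical centre positions
print(is_recorded(track_near, location_1), is_recorded(track_far, location_1))
```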
To study the feasibility of using tropical cyclone waves to extrapolate extreme significant wave heights, the number of recorded tropical cyclones (which determines the number of initial samples) is analysed at the 22 study locations. The number of recorded tropical cyclones ranges from 247 to 403 over the 40 years, and the annual mean number of recorded tropical cyclones ranges from 6.175 to 10.075. The corresponding tropical cyclone waves are sufficient for the assessment of extreme significant wave heights (Mazas and Hamm, 2011). To present this assessment and study the characteristics of tropical cyclone waves in detail, location #1 (22.00°N, 118.75°E) is selected as a representative.
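The annual mean numbers quoted above follow from dividing the cyclone counts by the length of the hindcast. A minimal Python check, with a hypothetical function name and assuming the 1975-2014 period is counted inclusively:

```python
FIRST_YEAR, LAST_YEAR = 1975, 2014
N_YEARS = LAST_YEAR - FIRST_YEAR + 1   # inclusive count of hindcast years

def annual_mean_count(n_cyclones, n_years=N_YEARS):
    """Mean number of recorded tropical cyclones per year."""
    return n_cyclones / n_years

# Smallest and largest counts over the 22 locations, then location #1.
for count in (247, 403, 328):
    print(count, annual_mean_count(count))
```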

Initial samples
The initial sample needs to satisfy the independent and identically distributed assumption. Because the initial database is simulated only during tropical cyclones and each record comes from a different tropical cyclone wave, the peak significant wave height of each tropical cyclone wave is directly extracted as the initial sample. For example, 328 tropical cyclones are recorded at location #1; thus, 328 peak significant wave heights during these tropical cyclones are extracted as the initial sample.
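The extraction of the initial sample amounts to taking one maximum per tropical cyclone. A minimal Python sketch, in which the data layout (a mapping from a cyclone identifier to its hourly significant wave heights), the function name and the numbers are hypothetical:

```python
def extract_initial_sample(hourly_hs_by_cyclone):
    """Map each cyclone identifier to the peak of its hourly Hs series (m)."""
    return {
        cyclone_id: max(series)
        for cyclone_id, series in hourly_hs_by_cyclone.items()
        if series  # skip cyclones without simulated values
    }

# Hypothetical hourly series at one site, one entry per recorded cyclone.
hourly = {
    "storm_A": [1.2, 2.8, 5.2, 4.1, 2.0],
    "storm_B": [0.9, 1.6, 1.4],
}
initial_sample = extract_initial_sample(hourly)
print(initial_sample)
```

Because every cyclone contributes exactly one value, the size of the initial sample equals the number of recorded tropical cyclones at the site.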

Sensitivity of return values to thresholds
The threshold plays a crucial role in the POT method, which is used to extract the high peak significant wave heights as the sample from the initial sample. When the threshold is suitable, the number of samples is sufficiently large, and the values of the sample are sufficiently high. Based on this sample, the extreme significant wave height can be extrapolated, and the return significant wave height is reliable. Shao et al. (2018a) and Liang et al. (2019) analysed the sensitivity of the return significant wave height to the threshold. As shown by the theory of the GPD model (Eqs. (2) and (4)), the return significant wave height for a specific return period depends on the threshold and the sample (the number and values of the samples). When the return significant wave height is stable against the threshold, the sample is stable for extrapolation. Based on this theory and the influence of the excluded samples on the return significant wave height, Shao et al. (2018a) and Liang et al. (2019) suggested that the suitable threshold should be determined within the stable threshold range.
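One way to picture a stable threshold range is to look for the longest run of neighbouring candidate thresholds whose return heights barely change. The Python sketch below is only an illustration of that idea: the tolerance, the function name and the numbers are hypothetical, and the actual criteria of Shao et al. (2018a) and Liang et al. (2019) are defined in those papers and are not reproduced here.

```python
def stable_threshold_range(thresholds, return_levels, tol):
    """Longest run of consecutive thresholds with neighbouring differences <= tol.

    Returns (lowest, highest) threshold of the run, or None if no two
    neighbouring thresholds agree within tol. tol is a hypothetical choice.
    """
    best = None
    start = 0
    n = len(thresholds)
    for j in range(1, n + 1):
        run_ends = (j == n) or (abs(return_levels[j] - return_levels[j - 1]) > tol)
        if run_ends:
            last = j - 1
            if last > start and (best is None or last - start > best[1] - best[0]):
                best = (start, last)
            start = j
    if best is None:
        return None
    return thresholds[best[0]], thresholds[best[1]]

# Hypothetical return heights (m) for six candidate thresholds (m).
u_values = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5]
hs_values = [11.0, 11.6, 12.05, 12.07, 12.06, 12.9]
print(stable_threshold_range(u_values, hs_values, tol=0.05))
```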
In the present work, the candidate thresholds within the stable threshold range are further studied to select a unique threshold without a subjective definition. The candidate thresholds are equally spaced in increasing order with a threshold interval of 0.05 m, which is recommended by Shao et al. (2018a) and Liang et al. (2019). For each candidate threshold, the GPD is fitted using the maximum likelihood estimation method, and the 50-year, 100-year, 150-year and 200-year return significant wave heights are extrapolated. By analysing the return significant wave height, the stable threshold range is uniquely obtained for a specific return period. For example, the stable threshold ranges for the 50-year, 100-year, 150-year and 200-year return periods at location #1 are (3.3 m, 5.75 m), (3.3 m, 5.25 m), (3.3 m, 4.6 m) and (3.3 m, 4.5 m), respectively. Details on the stable threshold range can be found in Shao et al. (2018a) and Liang et al. (2019).
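The scan over candidate thresholds can be sketched in Python as below. To keep the example dependency-free, the GPD is fitted with a simple method-of-moments estimator, which is a stand-in for the maximum likelihood estimation used in the study and will generally give different parameter values. The sample is synthetic, the minimum number of exceedances is a hypothetical safeguard, the function names are invented for illustration, and the shape-parameter sign convention is an assumption.

```python
import math
import statistics

def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (stand-in for maximum likelihood)."""
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    k = 0.5 * (1.0 - ratio)              # shape
    sigma = 0.5 * mean * (1.0 + ratio)   # scale
    return sigma, k

def return_level(u, sigma, k, n_exceed, n_years, period):
    """Return height for a period in years (assumed GPD parametrisation)."""
    ratio = period * n_exceed / n_years
    if abs(k) < 1e-12:
        return u + sigma * math.log(ratio)
    return u + sigma / k * (ratio ** k - 1.0)

def scan_thresholds(peaks, u_min, u_max, n_years,
                    periods=(50, 100, 150, 200), step=0.05, min_exceed=10):
    """Fit the GPD at each candidate threshold and extrapolate return heights."""
    rows = []
    n_steps = int(round((u_max - u_min) / step))
    for j in range(n_steps + 1):
        u = round(u_min + j * step, 2)   # avoid accumulating float error
        excesses = [h - u for h in peaks if h > u]
        if len(excesses) < min_exceed:   # hypothetical minimum sample size
            break
        sigma, k = fit_gpd_moments(excesses)
        levels = {p: return_level(u, sigma, k, len(excesses), n_years, p)
                  for p in periods}
        rows.append((u, len(excesses), levels))
    return rows

# Synthetic initial sample of 328 peaks (exponential quantiles, not real data).
peaks = [-2.5 * math.log(1.0 - (j - 0.5) / 328) for j in range(1, 329)]
for u, n, levels in scan_thresholds(peaks, 3.3, 4.5, n_years=40)[:3]:
    print(u, n, {p: round(h, 2) for p, h in levels.items()})
```

Plotting the return heights in each row against the threshold gives the sensitivity diagnostic discussed above.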

Characteristics of tropical cyclone waves
In this study, extreme significant wave heights are extrapolated based on tropical cyclone waves. To select a unique threshold without a subjective definition, the characteristics of tropical cyclone waves are investigated. The track and intensity of a tropical cyclone affect the tropical cyclone wave at the study site. When the track of the tropical cyclone is close to the study site and the intensity of the tropical cyclone is high, the corresponding tropical cyclone wave is sufficiently high to represent the extreme wave at the study site. In this case, the peak significant wave height of this tropical cyclone wave should be extracted as the sample. For example, the peak significant wave height during tropical cyclone Pabuk in 2007 recorded at location #1 is 5.27 m; the peak significant wave height during tropical cyclone Linfa in 2009 recorded at location #1 is 8.17 m; the peak significant wave height during tropical cyclone Molave in 2009 recorded at location #1 is 9.48 m; and the peak significant wave height during tropical cyclone Meranti in 2010 recorded at location #1 is 4.51 m. The tracks of these tropical cyclones are close to location #1, and their intensities are high when they influence waves at location #1 (shown in Fig. 1). In contrast, when the track of the tropical cyclone is far from the study site or the intensity of the tropical cyclone is low, the corresponding tropical cyclone wave is insufficiently high to represent the extreme wave at the study site. In this case, the peak significant wave height of this tropical cyclone wave should not be extracted as the sample. For example, the peak significant wave height during tropical cyclone Maria in 2000 recorded at location #1 is 2.59 m, and the peak significant wave height during tropical cyclone Toraji in 2001 recorded at location #1 is 1.57 m. Although the intensities of these tropical cyclones are high when they influence waves at location #1, their tracks are too far from location #1 (shown in Fig. 2). The peak significant wave height during tropical cyclone Trami in 2001 recorded at location #1 is 2.47 m, and the peak significant wave height during tropical cyclone Wutip in 2007 recorded at location #1 is 2.20 m. Although the tracks of these tropical cyclones are close to location #1, their intensities are low when they influence waves at location #1 (shown in Fig. 3). The peak significant wave height during tropical cyclone Kai-tak in 2005 recorded at location #1 is 1.11 m, and the peak significant wave height during tropical cyclone Kammuri in 2008 recorded at location #1 is 2.36 m. The tracks of these tropical cyclones are far from location #1, and their intensities are low when they influence waves at location #1 (shown in Fig. 4).

The above analyses show that the track and intensity of tropical cyclones influence the tropical cyclone wave at the targeted location. This influence can be reflected in the distribution of the initial sample (i.e., the distribution of the peak significant wave height). In Fig. 5, the distribution of the initial sample at location #1 is presented.
The peak significant wave height is counted from 0 m to 15 m with an interval of 0.05 m (this interval is equal to the threshold interval). It can easily be observed that peak significant wave heights are concentrated in two ranges: range 1 (0-4.15 m) and range 2 (4.15-15 m), with a separation value of 4.15 m. To clearly show ranges 1 and 2, the curve of the distribution of peak significant wave heights is plotted. In range 1, 191 peak significant wave heights are found. These peak significant wave heights come from 191 independent tropical cyclone waves, and the corresponding tropical cyclones have a weak influence on the wave at location #1. The tracks and intensities of these tropical cyclones are analysed and are similar to those shown in Figs. 2, 3 and 4. In range 2, 137 peak significant wave heights are found. These peak significant wave heights come from 137 independent tropical cyclone waves, and the corresponding tropical cyclones have a strong influence on the wave at location #1. The tracks and intensities of these tropical cyclones are analysed and are similar to those shown in Fig. 1. It can be concluded that the distribution of the initial sample has a natural separation distinguishing the high peak significant wave heights from the low peak significant wave heights. Moreover, this separation value (the corresponding annual mean number of samples is 3.425) is within the stable threshold range shown in subsection 4.2. Based on the conclusions of Shao et al. (2018a) and Liang et al. (2019), the separation value can be used to extract a stable sample. To further validate the separation value for sampling, the asymptotic tail approximation and estimation uncertainty are analysed. In Fig. 6, the quantile plot for the threshold of 4.15 m is presented, which shows that there are generally few differences between the empirical values and the values fitted via the GPD model, indicating a good fit for the selected threshold. In Table 2, the return significant wave height with the confidence interval at location #1 under the threshold of 4.15 m is shown. The return significant wave heights for the return periods of 50, 100, 150 and 200 years are 12.07 m, 12.70 m, 13.00 m and 13.20 m, respectively.
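The counting, the split at the separation value and the quantile comparison described above can be sketched in Python. The six peak heights in the usage lines are the values quoted in the text for individual tropical cyclones at location #1, not the full 328-value sample. The function names, the plotting positions j / (n + 1), the treatment of a value exactly equal to the separation value, the GPD sign convention and the parameter values sigma and k are all assumptions made for illustration.

```python
import math

def histogram_counts(peaks, bin_width=0.05, upper=15.0):
    """Count peak heights in bins of bin_width from 0 m to upper."""
    n_bins = int(round(upper / bin_width))
    counts = [0] * n_bins
    for h in peaks:
        idx = min(int(h / bin_width + 1e-9), n_bins - 1)  # guard float rounding
        counts[idx] += 1
    return counts

def split_at(peaks, separation):
    """Split the initial sample into range 1 (low) and range 2 (high)."""
    low = [h for h in peaks if h <= separation]
    high = [h for h in peaks if h > separation]
    return low, high

def gpd_quantile(p, sigma, k):
    """Excess with non-exceedance probability p (assumed parametrisation)."""
    if abs(k) < 1e-12:
        return -sigma * math.log(1.0 - p)
    return sigma / k * ((1.0 - p) ** (-k) - 1.0)

def quantile_pairs(excesses, sigma, k):
    """(fitted, empirical) pairs for a quantile plot, positions j / (n + 1)."""
    ordered = sorted(excesses)
    n = len(ordered)
    return [(gpd_quantile(j / (n + 1), sigma, k), x)
            for j, x in enumerate(ordered, start=1)]

quoted_peaks = [1.57, 2.59, 4.51, 5.27, 8.17, 9.48]   # m, from the text
low, high = split_at(quoted_peaks, 4.15)
print(len(low), len(high))
print(137 / 40)                                        # annual mean number of samples
# sigma and k below are placeholders, not the fitted values of this study.
for fitted, empirical in quantile_pairs([h - 4.15 for h in high], 2.3, -0.17):
    print(round(fitted, 2), round(empirical, 2))
```

Points close to the diagonal in a plot of these pairs would indicate agreement between the empirical and fitted quantiles.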
The likelihood method (Schendel and Thongwichian, 2017) reparametrizes the likelihood in terms of the unknown quantile and uses profile likelihood arguments to construct an approximate 95% confidence interval.
The confidence intervals for the return periods of 50, 100, 150 and 200 years are (11.39 m, 13.08 m), (12.02 m, 13.92 m), (12.31 m, 14.36 m) and (12.50 m, 14.66 m), respectively. These intervals indicate that the variance in the extrapolated significant wave heights is acceptable.

... difference of the return significant wave height within the stable threshold range may be relatively large, especially for a short return period. In addition, the return significant wave heights shown in Tables 2 and 3 are compared with the return significant wave heights presented by Liang et al. (2019). The study locations in the present paper are the same as those of Liang et al. (2019). These two groups of return significant wave heights are similar owing to the diagnosis of the return significant wave height within the stable threshold range. If some return significant wave heights within the stable threshold range are relatively different from the others, the corresponding candidate thresholds are rejected.
For example, the thresholds of 5.28 m, 4.64 m, 4.2 m and 4.24 m are obtained at location #12 for the return periods of 50, 100, 150 and 200 years, respectively. The return significant wave heights for the return periods of 50, 100, 150 and 200 years are 9.69 m, 9.89 m, 9.96 m and 10.05 m, respectively.
These four return significant wave heights are similar to the return significant wave heights shown in Table 3.
In summary, the threshold selection criteria of Shao et al. (2018a) and Liang et al. (2019) are based on the sensitivity of the return significant wave height to the threshold. In theory, these two criteria can be used to assess the extreme significant wave height in any area. The criterion based on the distribution of the initial sample relies on the characteristics of tropical cyclone waves and may only be applicable in a tropical cyclone wave-dominated area. However, the distribution of the initial sample can be used to visually distinguish the high peak significant wave heights from the low peak significant wave heights, and the suitable threshold can be selected uniquely without a subjective definition.

Conclusions
In this study, extreme significant wave heights are assessed in the SCS. Before implementing this assessment, the meteorological phenomena are analysed to identify the extreme weather. In the SCS, tropical cyclones occur frequently and always drive storm waves. Thus, the extreme wave is studied based on tropical cyclone waves, and a 40-year hindcast of significant wave heights of tropical cyclone waves is employed as the initial database. Because this initial database is simulated only during tropical cyclones and comes from independent tropical cyclone waves, the peak significant wave height of every tropical cyclone wave is directly extracted as the initial sample. The independent and identically distributed assumption is satisfied by the initial sample, and the interference of the homogenous and declustering methodologies in the selection of the sample is avoided.
Based on the initial sample, the POT method is used to extract the peak significant wave heights over the threshold as the sample. To avoid a subjective definition in the threshold selection criterion, the characteristics of tropical cyclone waves are analysed. The results show that the track and intensity of tropical cyclones affect the sample at the targeted location. When the track of the tropical cyclone is close to the targeted location and the intensity of the tropical cyclone is high, the peak significant wave height of this tropical cyclone wave should be extracted as the sample at the targeted location. In contrast, when the track of the tropical cyclone is far from the targeted location or the intensity of the tropical cyclone is low, the peak significant wave height of this tropical cyclone wave should not be extracted as the sample at the targeted location. These characteristics are reflected in the distribution of the initial sample. The separation value is easily observed in the distribution of the initial sample, and it divides the initial sample into the low peak significant wave heights (the corresponding track is far or the corresponding intensity is low) and the high peak significant wave heights (the corresponding track is close and the corresponding intensity is high). Because this separation value is within the stable threshold range, it can be used to extract a stable sample for extrapolation.
Therefore, the separation value shown in the distribution of the initial sample is selected as a suitable threshold for sampling. Based on the extracted sample, the GPD model is used to extrapolate the 50-year, 100-year, 150-year and 200-year return significant wave heights at the 22 study locations in the SCS. To validate the reliability of the selected threshold and the corresponding return significant wave height, the asymptotic tail approximation and estimation uncertainty are analysed, which show that the selected threshold is suitable and the return significant wave height is reasonable. Because the separation value shown in the distribution of the initial sample reflects the characteristics of tropical cyclone waves, this separation value is suggested for sampling when an assessment of extreme significant wave heights is needed in a tropical cyclone wave-dominated area (such as the SCS).
cyclone Wutip in 2007 recorded at location #1 is 2.20 m. Although tracks of these tropical cyclones are close to location #1, intensities of these tropical cyclones are low when these tropical cyclones influence waves at location #1 (shown in Fig. 3). The peak significant wave height during tropical cyclone Kai-tak in 2005 recorded at location #1 is 1.11 m, and the peak significant wave height during tropical cyclone Kammuri in 2008 recorded at location #1 is 2.36 m. Tracks of these tropical cyclones are far from location #1, and intensities of these tropical cyclones are low when these tropical cyclones influence waves at location #1 (shown in Fig. 4).

Fig. 1. Tracks of centres of tropical cyclones Pabuk, Linfa, Molave and Meranti (triangle stands for location #1, curves stand for tracks of centres and circles stand for locations of centres).

Fig. 3. Tracks of centres of tropical cyclones Trami and Wutip (triangle stands for location #1, curves stand for tracks of centres and circles stand for locations of centres).
Fig. 4. Tracks of centres of tropical cyclones Kai-tak and Kammuri (triangle stands for location #1, curves stand for tracks of centres and circles stand for locations of centres).

Fig. 5. Histogram of the peak significant wave height from 0 m to 15 m with intervals of 0.05 m at location #1.

Fig. 6. The quantile plot for GPD-fitted peak significant wave heights at location #1 for the threshold of 4.15 m.
Fig. 7. Histograms of the peak significant wave height at locations #7 and #10.

Fig. 8. Quantile plots for GPD-fitted peak significant wave heights ((a) for the threshold of 3.35 m at location #7 and (b) for the threshold of 4.1 m at location #10).

Table 1
Study locations and numbers of tropical cyclones.

Table 3
Statistics for thresholds, samples and return significant wave heights with 95% confidence intervals.