Variations in return value estimate of ocean surface waves – a study based on measured buoy data and ERA-Interim reanalysis data

An assessment of extreme wave characteristics during the design of marine facilities not only helps to ensure their safety but also informs the assessment of their economic aspects. In this study, return levels of significant wave height (Hs) for different periods are estimated using the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD) based on Waverider buoy data spanning 8 years and ERA-Interim reanalysis data spanning 38 years. The analysis is carried out for wind-sea, swell and total Hs separately for buoy data. Seasonality of the prevailing wave climate is also considered in the analysis to provide return levels for short-term activities in the location. The study shows that the initial distribution method (IDM) underestimates return levels compared to GPD. The maximum return levels estimated by the GPD corresponding to 100 years are 5.10 m for the monsoon season (JJAS), 2.66 m for the pre-monsoon season (FMAM) and 4.28 m for the post-monsoon season (ONDJ). The intercomparison of return levels by block maxima (annual, seasonal and monthly maxima) and the r-largest method for GEV theory shows that the maximum return level for 100 years is 7.20 m in the r-largest series, followed by the monthly maxima (6.02 m) and annual maxima (AM) (5.66 m) series. The analysis is also carried out to understand the sensitivity of the GEV annual maxima estimates to the number of observations. It indicates that the variations in the standard deviation of the series caused by changes in the number of observations are positively correlated with the return level estimates. The 100-year return level results of Hs using the GEV method are comparable for short-term (2008 to 2016) buoy data (4.18 m) and long-term (1979 to 2016) ERA-Interim shallow data (4.39 m). The 6 h interval data tend to miss high values of Hs, and hence there is a significant difference in the 100-year return level Hs obtained using 6 h interval data compared to data at 0.5 h intervals. The study shows that a single storm can cause a large difference in the 100-year Hs value.


Introduction
Coastal zones are relatively dynamic compared to the rest of the regions due to numerous natural as well as anthropogenic activities. Events such as extreme waves, storm surges and coastal flooding cause large catastrophes in the coastal region. The long-term (climate) behavior of sea state variables can be studied using non-stationary multivariate models that represent the time dependence of the variables (Solari and Losada, 2011). Various marine activities such as the design of coastal and offshore facilities, planning of harbor operations and ship design require detailed assessment of wave characteristics with certain return periods (Caires and Sterl, 2005; Menéndez et al., 2009; Goda et al., 2010). Generally, extreme value theory (EVT) is used for the determination of return levels by adopting a statistical analysis of historic time series of wave heights obtained from various sources such as in situ buoy measurements (e.g., Soares and Scotto, 2004; Méndez et al., 2008; Viselli et al., 2015), satellite data (e.g., Alves et al., 2003; Izaguirre et al., 2010), and hindcasted or reanalysis data by numerical models (e.g., Goda et al., 1993; Caires and Sterl, 2005; Teena et al., 2012; Jonathan et al., 2014). EVT consists of two types of distributions, viz. the generalized extreme value (GEV) distribution family, which includes the Gumbel, Fréchet and Weibull distributions (Gumbel, 1958; Katz et al., 2002), and the generalized Pareto distribution (GPD), which incorporates the peak over threshold (POT) approach (Pickands, 1975; Coles et al., 2001).
T. Muhammed Naseef and V. Sanil Kumar: Variations in return value estimate of ocean surface waves
GEV distribution by annual maxima (AM) observations (Goda, 1992) is one of the widely used methods in the EVT analysis. The main difficulty with using this method is the unavailability of reliable observations at a location of interest. To overcome the data scarcity, two different alternatives have been used by various authors: (i) the initial distribution method (IDM), in which all the data are used (Alves and Young, 2003); and (ii) the r-largest approach (Smith, 1986), where a number of the largest observations from a block period are considered rather than one observation as used in the AM method. The POT method (Abild et al., 1992) provides a good number of observations available for the analysis. Although there have been various proposals to automate threshold selection, threshold estimation for the application of the POT method to a single sample is still not resolved (Solari and Losada, 2012; Solari et al., 2017). GPD is another class of distribution introduced by Pickands (1975) and has been used by several authors such as Caires and Sterl (2005) and Thevasiyani et al. (2014). Teena et al. (2012) and Samayam et al. (2017) have carried out the EVT analysis of ocean surface waves in the northern Indian Ocean based on wave hindcast data and ERA-Interim reanalysis data.
The most reliable source of ocean wave data is buoy measurements, and these can be used for EVT analysis (Panchang et al., 1999). In this paper, data from a directional Waverider buoy located in the central western shelf of India are used. Seasonality is one of the important aspects of climate data, and, therefore, it should be incorporated in the EVT analysis of waves, especially in a region such as the Arabian Sea. Seasonal analysis of the extremes helps in the planning of short-term marine activities such as offshore explorations and maintenance of coastal facilities. In the present paper, the EVT analysis is carried out by following both the GEV and GPD methods considering wind-sea, swell and total significant wave height (Hs) separately. The IDM and POT methods are used for total wave height analysis, and block maxima (annual and monthly maxima) and the r-largest method are used in wind-sea and swell height analysis. Since the measured buoy data are for a short period of 8 years, the ERA-Interim reanalysis data from 1979 to 2016 are also used for comparing the Hs value with the 100-year return period.
The paper is organized as follows. Section 2 deals with the data and methodology used in the analysis. It also presents the threshold selection adopted in the study, and Sect. 3 explains the results obtained in the analysis, categorized into seasons using total Hs data, and comparison of return level estimation by different GEV approaches using wind-sea and swell height data. A case study is also included in the section for realizing the uncertainty related to observations in the AM approach when a limited number of observations are available. The influence of length of wave data on the estimated Hs return value is also covered under this section. Section 4 provides the concluding remarks.

Data
Data used in the analysis are from a Datawell directional Waverider buoy deployed off Honnavar (14.304° N, 74.391° E) at a water depth of 9 m. The half-hourly sampled data cover the period from March 2008 to February 2016. The waves at the location show strong intra-annual variations due to the prevailing wind system during monsoon and non-monsoon seasons (Sanil Kumar et al., 2014). To understand the local and remote influences on the design wave characteristics, we analyzed Hs of wind-sea, swell and total waves separately. A season-wise study is also carried out since it will provide insight into the design wave heights for short-term coastal activities.
The Hs data from ERA-Interim (Dee et al., 2011), the global atmospheric reanalysis product of the European Centre for Medium-Range Weather Forecasts (ECMWF), from 1979 to 2016 (38 years) are also used to evaluate the 100- and 50-year return period wave height in the shallow (water depth ∼ 20 m) and deep water. The shallow region is close to the buoy location, and the deep water location is at a water depth of ∼ 4000 m (Table 1). The ERA-Interim reanalysis used in the study has a spatial resolution of 0.125° × 0.125° and a temporal resolution of 6 h.

Methodology
EVT analysis is carried out by following the GEV distribution model and the POT method, in which exceedances over a reliable threshold wave height are fitted to the GPD. In the POT method, the distribution of the excess of x over a threshold u is defined as

$$F_u(y) = \Pr\{X - u \le y \mid X > u\} = \frac{F(u + y) - F(u)}{1 - F(u)},$$

where y = x − u. Pickands (1975) shows that the distribution function of excess, F_u(y), for a sufficiently high threshold u converges to the GPD, having a cumulative distribution function (CDF) as follows:

$$G(y; k, \alpha) = \begin{cases} 1 - \left(1 - \dfrac{k y}{\alpha}\right)^{1/k}, & k \neq 0, \\ 1 - \exp\left(-\dfrac{y}{\alpha}\right), & k = 0. \end{cases}$$

GEV has a CDF as follows:

$$F(x; k, \alpha, \beta) = \begin{cases} \exp\left\{-\left[1 - \dfrac{k (x - \beta)}{\alpha}\right]^{1/k}\right\}, & k \neq 0, \\ \exp\left\{-\exp\left[-\dfrac{x - \beta}{\alpha}\right]\right\}, & k = 0, \end{cases}$$

where α is the scale parameter in the range of α > 0, β is the location parameter with possible values of −∞ < β < ∞ and k is the shape parameter in the range of −∞ < k < ∞. GPD can be further categorized into three distributions based on its tail features. When k = 0, GPD corresponds to an exponential distribution (medium-tailed or Pareto type I) with mean α; when k > 0, GPD is short-tailed, also known as Pareto type II; when k < 0, the distribution takes the form of ordinary Pareto distribution, having a long-tailed distribution (also known as Pareto type III). Parameter estimation and statistical distribution fitting are carried out by using the WAFO toolbox (Brodtkorb et al., 2000), developed by Lund University, Sweden.
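As an illustration of the two distribution functions described above, the following minimal Python sketch evaluates the GPD and GEV CDFs. It assumes the sign convention implied by the text (k > 0 short-tailed with an upper end point, k = 0 exponential or Gumbel, k < 0 long-tailed). The function names `gpd_cdf` and `gev_cdf` are hypothetical, and this is not the WAFO implementation used in the study.

```python
import math


def gpd_cdf(y, alpha, k):
    """CDF of an excess y over the threshold under the GPD.

    Assumed convention: k > 0 gives a short tail bounded at alpha / k.
    """
    if y < 0.0:
        return 0.0
    if abs(k) < 1e-12:
        # Exponential limit with mean alpha
        return 1.0 - math.exp(-y / alpha)
    base = 1.0 - k * y / alpha
    if base <= 0.0:
        # Only reachable for k > 0: y lies beyond the upper end point
        return 1.0
    return 1.0 - base ** (1.0 / k)


def gev_cdf(x, alpha, beta, k):
    """CDF of the GEV with scale alpha, location beta and shape k."""
    if abs(k) < 1e-12:
        # Gumbel limit
        return math.exp(-math.exp(-(x - beta) / alpha))
    base = 1.0 - k * (x - beta) / alpha
    if base <= 0.0:
        # Outside the support: above the upper bound (k > 0)
        # or below the lower bound (k < 0)
        return 1.0 if k > 0 else 0.0
    return math.exp(-base ** (1.0 / k))
```

Under this convention `gev_cdf(beta, alpha, beta, k)` equals exp(−1) for any k, which gives a quick sanity check on a fitted location parameter.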
The analysis is carried out by using the wind-sea, swell and total Hs data covering ∼ 8 years (2008-2016). From the measured data, to separate the wind seas and swells, the method proposed by Portilla et al. (2009) is used. The separation algorithm is based on the assumption that the energy at the peak frequency of a swell cannot be higher than the value of a Pierson-Moskowitz (PM) spectrum with the same frequency. If the ratio between the peak energy of a wave system and the energy of a PM spectrum at the same frequency is above a threshold value of 1, the system is considered to represent wind sea; otherwise it is taken to be a swell. A separation frequency fc is estimated following Portilla et al. (2009), and the swell and wind-sea parameters are obtained for frequencies ranging from 0.025 Hz to fc and from fc to 0.58 Hz respectively. The GPD method is used for seasonal analysis of different period data series. The GEV method is used for intercomparison of return level estimation among wind-sea, swell and resultant data sets by extracting different block maxima series: (i) seasonal maxima, which contain the highest observations from each season; (ii) monthly maxima, which contain one highest observation from each month; and (iii) annual maxima. The parameters are estimated using the probability-weighted moment (PWM) method since the data set duration is very limited, and the PWM method performs well compared to other methods such as the maximum likelihood method (Hosking et al., 1985).
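The PWM estimation mentioned above can be sketched in a few lines. The example below is a minimal illustration of the GEV estimators of Hosking et al. (1985), assuming unbiased sample PWMs and the convention in which k > 0 gives a bounded upper tail. The function names are hypothetical and the WAFO routines used in the study may differ in detail.

```python
import math


def sample_pwms(values):
    """Unbiased sample probability-weighted moments b0, b1, b2 (needs n >= 3)."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j - 1) / (n - 1) * x[j - 1] for j in range(1, n + 1)) / n
    b2 = sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x[j - 1]
             for j in range(1, n + 1)) / n
    return b0, b1, b2


def fit_gev_pwm(values):
    """Return (alpha, beta, k) from the PWM estimators; assumes k != 0."""
    b0, b1, b2 = sample_pwms(values)
    c = (2 * b1 - b0) / (3 * b2 - b0) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c          # approximation for the shape
    gamma_term = math.gamma(1 + k)
    alpha = (2 * b1 - b0) * k / (gamma_term * (1 - 2 ** (-k)))
    beta = b0 + alpha * (gamma_term - 1) / k
    return alpha, beta, k


def gev_quantile(p, alpha, beta, k):
    """Value with non-exceedance probability p (k != 0)."""
    return beta + alpha / k * (1 - (-math.log(p)) ** k)


def return_level(T, alpha, beta, k):
    """T-block return level, e.g. T years for an annual maxima series."""
    return gev_quantile(1 - 1 / T, alpha, beta, k)
```

With only eight annual maxima, as in the buoy record, these estimates rest on very few points, so the resulting return levels should be read with corresponding caution.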
To study the uncertainties related to the length of the observation, we extracted 3, 6, 12 and 24 h data series from the half-hourly original data and carried out EVT analysis. Since the wave climate in the study location is strongly characterized by the prevailing seasonal behavior of wind system, we took further consideration of uncertainties related to a seasonal aspect of wave climate by extracting three seasonal data sets, viz. pre-monsoon (FMAM), monsoon (JJAS) and post-monsoon (ONDJ) seasons.
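The extraction of coarser series and seasonal subsets described above can be sketched as follows. This is a minimal illustration that assumes the record is a list of (timestamp, Hs) pairs on a regular half-hourly grid without gaps. The helper names and the synthetic values are hypothetical and are not buoy data.

```python
from datetime import datetime, timedelta

SEASON_MONTHS = {
    "FMAM": (2, 3, 4, 5),     # pre-monsoon
    "JJAS": (6, 7, 8, 9),     # monsoon
    "ONDJ": (10, 11, 12, 1),  # post-monsoon
}


def subsample(records, interval_hours, base_hours=0.5):
    """Keep every n-th record so that the spacing becomes interval_hours."""
    step = round(interval_hours / base_hours)
    return records[::step]


def season_of(timestamp):
    """Return the season label for a timestamp based on its month."""
    for name, months in SEASON_MONTHS.items():
        if timestamp.month in months:
            return name


def split_by_season(records):
    """Group (timestamp, hs) records by season label."""
    groups = {name: [] for name in SEASON_MONTHS}
    for timestamp, hs in records:
        groups[season_of(timestamp)].append((timestamp, hs))
    return groups


# Synthetic two-day half-hourly record (hypothetical values)
start = datetime(2008, 3, 1)
records = [(start + timedelta(minutes=30 * i), 1.0 + 0.01 * (i % 7))
           for i in range(96)]
six_hourly = subsample(records, 6)
```

Because each coarser series is a subset of the half-hourly record, its maximum can never exceed the half-hourly maximum, which is why peaks can be missed at longer sampling intervals.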
The major drawback of EVT analysis using the block maxima method, especially the annual maxima, is that it does not consider the significant amount of observations which are closely related to storm features of the data set. Those omissions of observations would cause significant variations in the final results of EVT analysis, especially in the cases where EVT analysis is performed for a very limited data set. EVT is based on the assumption that the observations under consideration are independent and identically distributed (Coles et al., 2001). Ocean wave observations can be expected to be identically distributed to a large extent. Since the POT approach resamples the data over a threshold value, obtaining identically distributed and independent observations is a tedious task. A suitable combination of threshold and minimum separation time between the resampled observations must be taken into account to establish independence among the observations. The average duration of tropical storms in the Arabian Sea is 2-3 days (Shaji et al., 2014). Therefore, in the present analysis, we fixed a minimum of 48 h of separation time in [...] (AD) test and Cramér-von Mises (CM) test (Stephens, 1974; Choulakian and Stephens, 2001). The distributions used in the analysis are validated using graphical tools such as quantile-quantile (Q-Q) plots and CDF plots. In addition to the above graphical tools, we checked the reliability of the chosen thresholds for the POT method by using different GOF tests such as KS, AD and CM tests (Table 2). A p value > 0.05 indicates that the selected distribution does not show a significant difference from the original data at the 5 % significance level.
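The 48 h minimum separation described above can be illustrated with a simple declustering sketch. It assumes one common approach, in which exceedances are visited from the largest downward and a peak is kept only if it lies at least 48 h from every peak already kept. The function name and the sample values are hypothetical, and the exact procedure used in the study is not stated in the text.

```python
def pot_peaks(times_h, hs, threshold, min_sep_h=48.0):
    """Select independent peaks over a threshold.

    times_h: observation times in hours; hs: wave heights in metres.
    Returns a time-ordered list of (time, height) pairs.
    """
    # Exceedances sorted from the highest wave height downward
    candidates = sorted(
        ((h, t) for t, h in zip(times_h, hs) if h > threshold),
        reverse=True,
    )
    kept = []
    for h, t in candidates:
        # Keep the peak only if it is far enough from all kept peaks
        if all(abs(t - t_kept) >= min_sep_h for t_kept, _ in kept):
            kept.append((t, h))
    return sorted(kept)


# Hypothetical record: two storms about 60 h apart
times = [0.0, 0.5, 1.0, 60.0, 61.0, 200.0]
heights = [3.0, 3.5, 3.2, 2.9, 4.0, 1.0]
peaks = pot_peaks(times, heights, threshold=2.5)
```

In this toy record the five exceedances reduce to two peaks, one per storm, which is the behavior the separation time is meant to enforce.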

Long-term statistical analysis of total H s
The mean wave climate at the study location is characterized by an annual mean Hs of 1.04 m. The maximum Hs of the data during 2008-2016 is 4.70 m, and the next highest Hs is 4.34 m (Fig. 1), whereas the highest wind-sea Hs and swell Hs are 4.29 and 4.28 m respectively. A statistical analysis of Hs was carried out by considering the seasonal characteristics of the wave climate. To study the seasonal aspects of the return level estimation, the data are grouped into three different seasonal series, viz. FMAM, JJAS and ONDJ seasons, in addition to full-year data. Since the study location is off the central west coast of India, the wave climate shows distinct variability throughout the year. Previous studies such as that of Anoop et al. (2015) reported that average Hs attains its peak at around 3 m during JJAS and that the FMAM season is relatively calm (0.5-1.5 m) compared to ONDJ (1.5-2 m). The seasonal analysis is carried out using Hs data following both the GEV and GPD methods. Here, the initial distribution method is considered in the GEV method rather than block maxima (Mathiesen et al., 1994). One of the challenging tasks for GPD modeling is the selection of a suitable threshold value. The threshold should be high enough for observations to be independent, and data after POT analysis must have the necessary number of observations in order to converge the POT analysis into GPD. SME plots and PS plots are used to select a range of initial thresholds. Upon analyzing the resultant GPD fit for those thresholds, the final thresholds are chosen with the help of GOF tests, which are presented in Table 2.
Figure 2 and Table 3 show the estimated parameters using the PWM method for both GEV and GPD. It is clear that shape parameters in both cases are negative, indicating that the models are a type III distribution for GPD and a Weibull distribution for GEV. Table 3 also shows the RMSE in the chosen model for each data series with estimated CDF. It is evident that the JJAS season has a lower RMSE (∼ 0.07 m on average) when considering the GPD model, while, in the case of the GEV model, the full-year data series has a lower RMSE (∼ 0.02 m on average). The ONDJ season shows a higher discrepancy in both cases, resulting in an average RMSE of 0.31 and 0.54 m for GPD and GEV respectively. Figure 3 shows the typical SME and PS plots used for choosing a range of thresholds before fixing the final threshold for POT analysis on each series. In this particular case (6 h data series of FMAM season), a range of thresholds from 1.10 to 1.32 m was selected, and the final threshold of 1.19 m was fixed by analyzing the GOF test results (Table 2).
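The idea behind a threshold-selection plot of this kind can be sketched briefly. The example below assumes that SME denotes the sample mean excess, which the text does not spell out, and computes the mean excess over a set of candidate thresholds. For data that follow a GPD above some threshold, the mean excess is approximately linear in the threshold, so a roughly straight segment of the curve suggests a candidate range. The function names are hypothetical.

```python
def mean_excess(values, threshold):
    """Average of (value - threshold) over values above the threshold.

    Returns None when no observation exceeds the threshold.
    """
    excesses = [v - threshold for v in values if v > threshold]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


def mean_excess_curve(values, thresholds):
    """Pairs (threshold, mean excess) to be plotted against each other."""
    return [(u, mean_excess(values, u)) for u in thresholds]


# Hypothetical wave heights in metres, for illustration only
curve = mean_excess_curve([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0])
```

A parameter stability (PS) check would follow the same pattern, refitting the GPD at each candidate threshold and looking for a range over which the fitted shape parameter stays roughly constant.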

Full year
Here, we considered full-year data series without dealing with seasonality, and both the GEV and GPD are used in the analysis. Initially, a range of thresholds from 2.5 to 3.4 m was selected, and further adjustment of the threshold is carried out by analyzing the GOF test results. Table 2 shows the selected thresholds and the corresponding GOF test results for each series in the full-year data analysis. It is clear that the selected thresholds are in good agreement with the GOF test results. Both the KS test and CM test give a p value > 0.32. Moreover, both CDF plots and Q-Q plots (see Fig. 4, first and second rows respectively) show that the selected GPD models exhibited a good performance for the particular POT series. After acquiring the best fit model, return levels (Table 4) were estimated for 10, 50 and 100 years. The GPD model estimates a 10-year return level smaller than that of the maximum measured total Hs value by 5 to 15 %. An underestimation of 10 to 25 % from the maximum measured value was reported by Samayam et al. (2017) compared to the 36- and 30-year return levels based on ERA-Interim reanalysis data for deep waters around the Indian mainland.
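How a fitted GPD turns into a return level can be illustrated with a short sketch. It assumes the convention in which k > 0 is short-tailed and that the mean number of selected peaks per year is known. Setting the exceedance probability of the excess equal to 1/(λT), with λ peaks per year and a return period of T years, gives x_T = u + (α/k)[1 − (λT)^(−k)], with the limit u + α ln(λT) for k = 0. The function name and all numerical values are hypothetical and are not taken from Tables 3 or 4.

```python
import math


def gpd_return_level(T_years, u, alpha, k, peaks_per_year):
    """Return level for a POT/GPD model.

    u: threshold (m); alpha: scale (m); k: shape (k > 0 short-tailed);
    peaks_per_year: mean number of independent peaks above u per year.
    """
    m = peaks_per_year * T_years  # expected number of peaks in T years
    if abs(k) < 1e-12:
        return u + alpha * math.log(m)
    return u + alpha / k * (1.0 - m ** (-k))


# Hypothetical parameters, for illustration only
level_100 = gpd_return_level(100, u=3.0, alpha=0.5, k=0.2, peaks_per_year=2.0)
```

For k > 0 the return level can never exceed the upper end point u + α/k, whereas for k < 0 it grows without bound as T increases, which is one reason estimates from short records are sensitive to the fitted shape parameter.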
The initial distribution approach underestimates the return levels in such a way that even the 100-year return level does not cross the highest observation (4.70 m) in the data and the largest 100-year return level is reported as 4.26 m when dealing with half-hourly data series (Table 4). The large number of observations having a very low Hs in the data series used in the analysis leads to the underestimation in the initial distribution method, whereas the GPD model estimated 4.73 and 4.96 m as the 50- and 100-year return levels respectively. When considering different time interval data, both 12 and 24 h data series estimate lower return levels compared to other series by the GEV model. It is evident that there are uncertainties related to the sampling interval adopted for the return value estimation. The standard deviation for GPD estimation when considering different time intervals is 0.57 m, which is the highest among all the seasonal data sets. GEV estimation reports an even lower spread of return levels with 0.16 m standard deviation.

Pre-monsoon season
The data from February to May constitute the pre-monsoon data set. Pre-monsoon is the calmest season in the study location, with a maximum and an average Hs of around 1.94 and 0.73 m respectively. Using SME and PS plots, a range of thresholds from 1.19 to 1.32 m is selected for each time series and fitted to the corresponding GPD by using the resultant POT values. The final threshold selected with the help of GOF tests is presented in Table 2. KS and CM tests give a p value of more than 0.43 and 0.45 respectively on average (Table 2). Since the p values are more than 0.05, the chosen POT is not significantly different from the time series data. CDF plots and Q-Q plots (Fig. 5) for the different data series of the season illustrate the reliability of the chosen model. Return levels for different return periods using a particular GPD are presented in Table 4. GEV estimation exhibits the same characteristics of underestimation as shown in the full-year analysis. The average 100-year return level estimated from different time interval data using the GEV model attained a value of only 1.77 m, which is less than the highest observed data point in the season, whereas GPD reports an average 100-year return level of 2.49 m. Time interval analysis for the season exhibits the least discrepancies among the return level estimations compared to other seasons. Standard deviations of 0.11 and 0.08 m for GPD and GEV estimations respectively were observed for 100-year return levels considering different time series data.

Monsoon season
The [...] are recorded for the maximum and average respectively during the season. A range of thresholds (2.78 to 3.49 m) is selected for preliminary GPD fitting as a result of interpreting SME and PS plots of each data series, and the corresponding final thresholds were selected after checking against the GOF test results (Table 2). Both KS and CM tests report a p value > 0.56, indicating that the resulting POT series for the selected threshold converges into GPD. CDF and Q-Q plots in Fig. 6 show the reliability of the adopted threshold value. Return levels for the distinct return periods were estimated using the resultant POT series. Table 4 provides 10-, 50- and 100-year return period values estimated using GPD and GEV models. For half-hourly data, GPD projects a value of 4.80 m for the 100-year return level, whereas GEV underestimates it, with a value of 4.29 m. The GPD model shows a 0.36 m standard deviation among the return levels for different time interval data. Both the 12 and 24 h series gave lower return levels compared to other series.

Post-monsoon season
The post-monsoon season constitutes data from the October to January months of the year, and the observed maximum Hs in this season is 2.41 m. The majority of observations during this season lie below the average value of Hs. Only 32 % of the observations lie above 1.13 m, and 8 % of the data are above 1.5 m. Hence, selecting the best threshold for the season was more difficult. GPD was fitted for a range of thresholds (0.7 to 1.3 m) selected from SME and PS plots corresponding to each series. Most suitable thresholds were selected after checking the goodness of fit of GPD (Table 2).
The GOF test results show that the ONDJ series holds maximum uncertainties on threshold selection owing to lower p values, which range from 0.13 to 0.48 for the KS test and from 0.19 to 0.45 for the CM test. Figure 7 shows the CDF and Q-Q plots. The GEV and GPD estimations for the post-monsoon season show a very large difference among return levels (Table 4). The average percentage difference between the 100-year return values obtained from GEV and GPD estimations is ∼ 60 %. This shows that the GEV model clearly underperforms during the ONDJ season when the initial distribution method is adopted. The highest return level reported by the GPD model is 4.28 m, whereas GEV estimated about 2.3 m for the season. The ONDJ season has a standard deviation of 0.30 and 0.13 m for the GPD and GEV estimation respectively while using different sampling intervals.

Long-term statistical analysis of wind seas and swells
In this section, we relied on the GEV method based on block maxima. For that purpose, we extracted total, wind-sea and swell Hs data into different block maxima, viz. monthly, seasonal and annual maxima series. Two seasonal maxima series are considered in such a way that one includes the highest two observations in a season and the other consists of the highest observation from each season. The monthly maxima series includes 96 data points. The two seasonal maxima series (seasonal maxima 1 and 2) consist of 24 and 48 data points respectively. The annual maxima series covers eight data points. Table 5 shows the estimated return levels corresponding to various return periods. It is clear that both seasonal maxima series provide the highest return levels for total Hs (6.56 and 7.20 m) and swell Hs (5.95 and 6.35 m), whereas for wind-sea Hs the highest return level (6.16 m) is obtained when the annual maxima series is considered. The GEV-AM model shows an underestimation of the 10-year return level compared to the maximum measured data. The annual maxima series resulted in a value of 5.66 m as the 100-year return level for the total Hs (Fig. 8), which is comparable to the estimate of Teena et al. (2012) for the location off the central west coast of India. We performed a separate analysis of the annual maxima series to get insight into the abnormal results observed for the wind-sea data series. Here, we considered four unique series of different lengths by taking annual maxima observations from 2008 to 2016; that is, the first series (S1) consists of five data points (2008-2012), the second series (S2) consists of six data points (2008-2013) and so on. The density plots showing the probability for different wave height classes are presented in Fig. 9 along with the corresponding GPD fit. We calculated the standard deviation for each series and the percentage difference between each series and the parent series (S0). The result shows that return levels are positively correlated with standard deviation (Table 6). In the case of the total Hs, the correlation between the changes in standard deviation and the corresponding changes in 100-year return levels is 0.997, whereas for wind sea and swell they are 0.964 and 0.647 respectively. The annual maximum of wind sea (4.29 m) for the year 2015 caused an abrupt change in the standard deviation of the series by about 0.46 m, which is more than 17 % of the average of the series excluding 2015. Therefore, the 100-year return level for wind sea overshoots to about 6.16 m, resulting in a 66 % difference from the return value obtained for the S3 series. In this case study, the length of the special series under consideration does not influence the estimated return levels; that is, in the case of the total Hs series, the 100-year return level for the S1 series is greater than for both the S2 and S3 series. The same characteristics can also be seen in the case of swell Hs. Therefore, return levels for annual maxima estimated by the GEV model are influenced more by how a single data point, i.e., an annual maximum, alters the standard deviation of the series than by changes in the length of the series.
Table 6. Results of the case study. The standard deviations (SDs) of each data series considered are provided, and percentage differences among the SDs of each series with parent series (S0) are given in the brackets. The percentage difference in the corresponding return level estimation is also shown in the brackets of the respective return periods.
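The sensitivity argument above rests on two simple quantities, the percentage change in standard deviation when an observation is added and the correlation between such changes and the changes in return level. A minimal sketch of both is given below. The helper names are hypothetical, and the annual maxima in `base_series` are invented for illustration; only the 4.29 m value is taken from the text.

```python
import math
import statistics


def percent_difference(value, reference):
    """Percentage difference of value relative to reference."""
    return 100.0 * (value - reference) / reference


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical annual maxima (m); only 4.29 m comes from the text
base_series = [2.4, 2.6, 2.5, 2.7, 2.3, 2.6, 2.5]
with_storm = base_series + [4.29]

sd_change = percent_difference(
    statistics.stdev(with_storm), statistics.stdev(base_series)
)
```

In this toy series a single large value multiplies the standard deviation several times over, which illustrates why one annual maximum can dominate a fit based on eight points.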

Influence of length of wave data on the estimated significant wave height return value
An analysis is carried out to check uncertainties in return level estimation related to the length of the wave record. From the 0.5 h buoy-measured data, data at 6 h intervals are extracted and used for the analysis, and the return levels obtained by using 6 h measured buoy data are compared with the return level obtained from the 6 h ERA-Interim data at shallow and deep locations (Fig. 8). The 6-hourly ERA-Interim reanalysis 38-year data (1979-2016) [...] 2015) observed that ERA-Interim overestimates the Hs for shallow water locations along the west coast of India due to swell height overestimation, and the difference between the ERA-Interim Hs and the buoy Hs is up to 15 %. For the study location, the storm-induced wave heights during the non-monsoon period are less than the monsoon-induced waves. The 1st week of June is the onset of the Indian summer monsoon, and the maximum Hs in the study area is due to monsoon influence; in all years, this occurs during June to September. The 100-year return levels using the GEV method are comparable for buoy data (4.18 m) and ERA-Interim shallow data (4.39 m), while that for ERA-Interim deep is 5.67 m (Fig. 8). It is clear that the 100-year Hs return level using GEV for ERA-Interim data is lower than the maximum Hs in the data, while, in the case of buoy data, the 100-year return level is slightly higher than the highest Hs value. The return levels obtained by the GPD method show significant discrepancy among 100-year estimates. The 100-year return level obtained for buoy data is 4.46 m, but that using ERA-Interim shallow data is 6.18 m and that for ERA-Interim deep is 7.28 m. The 100-year Hs return level for deep water has closer values following GEV and GPD, while, in the shallow water, a significant difference is obtained. The 6 h interval data tend to miss 18 values of Hs between 4.11 and 4.70 m, and hence there is a significant difference in the 100-year return level of Hs based on GEV-AM obtained using these data compared to that based on the data at 0.5 h intervals.
We have examined the difference in the return level of Hs by considering data in different blocks, i.e., 10, 20, 30 and 38 years, using the ERA-Interim shallow water data. The study indicates a large underestimation (∼ 18 %) in the return level estimate if we consider only the first 10 years (1979-1988) of data in place of the 38 years (Fig. 10). The large difference in the values of Hs return level is due to the occurrence of a tropical storm in the Arabian Sea during 9-12 June 1996, which resulted in high wave heights, with Hs values of up to 5.46 m, whereas the maximum Hs excluding this storm is 4.63 m. During the 1996 storm, an Hs of 5.69 m was measured by a Datawell directional Waverider buoy moored at 23 m water depth off Goa (Sanil Kumar et al., 2006), which is ∼ 150 km north of the present study area. The data blocks containing this storm data, i.e., the 20-year (1979-1998) and 30-year (1979-2008) data, did not show much difference in [...]
The long-term and decadal trend of wave climate in the different parts of major oceans has been studied (Young et al., 2011). We have examined the trend in Hs at the shallow location based on the ERA-Interim data from 1979 to 2016. The study shows that the annual maximum Hs shows a weak increasing trend (1.1 cm yr⁻¹), whereas there is no significant trend in the annual mean value (Fig. 11). Sanil Kumar and Anoop (2015) observed that during 1979 to 2012 the average trend of annual mean Hs for all the locations in the western shelf seas was 0.06 cm yr⁻¹.
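A trend such as the one quoted above can be obtained from a series of annual values with a straight-line fit. The sketch below assumes an ordinary least-squares fit, which the text does not state explicitly, and uses a synthetic series constructed to have a slope of 1.1 cm yr⁻¹. The function name and the synthetic values are hypothetical and are not ERA-Interim data.

```python
def linear_trend(years, values):
    """Least-squares slope and intercept of values against years."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    sxx = sum((t - mean_t) ** 2 for t in years)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


# Synthetic annual maxima (m) with a built-in trend of 0.011 m per year
years = list(range(1979, 2017))
values = [4.0 + 0.011 * (year - 1979) for year in years]

slope, intercept = linear_trend(years, values)
trend_cm_per_year = 100.0 * slope
```

With real annual maxima the fitted slope would also need a significance test, since a single storm year such as 1996 can noticeably shift a 38-point regression.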

Influence of water depth on the measured buoy data
The relative water depth based on the spectral peak period (d/Lp) indicates that most of the time (97.8 to [...] %) the wave regime is in intermediate water (Table 7). Only during 0.1 to 0.8 % of the time do the waves satisfy the deep water condition. Hence, the waves measured by the buoys are influenced by the bathymetry, and the wave characteristics are different in the deep water. The wave rose plots from March 2008 to February 2016 based on the measured buoy data and the ERA-Interim reanalysis data at shallow and deep water locations are presented in Fig. 12. As the waves move from deep to shallow waters, the direction of high waves shifts from southwest to west. The limiting value of wave height based on breaker criteria is 0.6 to 0.78 times the water depth (Massel, 1966). The maximum Hs in the measured [...]

Conclusions
[...] appropriate thresholds for the POT method is justified using different GOF test results. Analysis of the total Hs shows that the IDM approach underestimates return levels for different seasons compared to the corresponding GPD. The 100-year return levels estimated by IDM are almost comparable with the corresponding GPD estimation for the 10-year period, but there is a significant difference in the return level estimates when considering different sampling intervals. IDM largely underestimates return levels for the post-monsoon season since the majority of the observations in this season lie away from the tail of the distribution.
Long-term statistics of wind-sea and swell data are calculated by the GEV model following block maxima and the r-largest methods. Annual maxima and monthly maxima are considered for block maxima series, and two seasonal maxima series are considered for the r-largest method. It is shown that these methods give higher return levels than the GPD models. The r-largest method provides 7.20 m as the 100-year return level, compared to 5.27 m for the GPD model. The sensitivity analysis of the GEV-AM model shows that a change in the standard deviation of the data series under consideration causes discrepancies in the return level estimates rather than a change in the length of the series. Both GEV and GPD models underestimate 10-year return levels compared to maximum measured data. The 100-year return levels acquired by using the GEV method are comparable for short-term (2008 to 2016) buoy data (4.18 m) and the long-term (1979 to 2016) ERA-Interim shallow data (4.39 m). The 6 h interval data tend to miss high values of Hs, and hence there is a significant difference in the 100-year return level Hs obtained using these data compared to data at 0.5 h intervals. The ERA-Interim data show that from 1979 to 2016 the annual maximum Hs shows a weak increasing trend (1.1 cm yr⁻¹). The study shows that a single storm can create a large difference in the 100-year Hs value, compared to the differences in values obtained from a different length of the data block.

Figure 1. Time series plot of the significant wave height measured by buoy and from ERA-Interim data at shallow and deep water.

Figure 2. Estimated shape parameters for different seasonal data with different sampling intervals used in the (a) GEV and (b) GPD model.

Figure 3. Typical (a) SME and (b) PS plots used for selecting a range of thresholds required for POT analysis. In this particular case, a range of 1.19 to 1.32 m was selected.

Figure 4. Full-year analysis. Panels (a) to (e) show the CDF plots for half-hourly, 3-, 6-, 12- and 24-hourly data, respectively; (f) to (j) show the corresponding Q-Q plots; (k) to (o) show the return levels estimated using the GPD model.

Figure 5. Same as in Fig. 4 but corresponding to the pre-monsoon season.

Figure 6. Same as in Fig. 4 but corresponding to the monsoon season.

Figure 7. Same as in Fig. 4 but corresponding to the post-monsoon season.

Figure 8. Return levels of significant wave heights for different return periods based on buoy data (2008-2016), ERA-Interim shallow water data and ERA-Interim deep water data (1979-2016) at 6 h intervals by the GEV model using annual maxima series.

Figure 9. Density plots showing the probability for different wave height classes. Total, wind-sea and swell H s are presented row-wise. Columns correspond to the selected number of data points (5 to 8 years). The solid curve is the corresponding GPD fit.

Figure 10. Return levels of significant wave heights for different return periods based on ERA-Interim shallow water data in different block years by the GEV model using annual maxima series.

Figure 11. Variation of the (a) annual maximum and (b) annual mean H s at the shallow locations based on ERA-Interim data. The solid line indicates the trend in H s during 1979 to 2016.

Figure 12. Wave rose plots from March 2008 to February 2016 based on the measured buoy data and the ERA-Interim reanalysis data at shallow and deep water locations.

Table 1. Comparison of 50- and 100-year significant wave height return levels based on buoy, ERA-Interim shallow and ERA-Interim deep data at 6 h intervals, along with statistical parameters of the data.

Table 2. Goodness-of-fit tests used for selecting threshold values for the POT analysis. H = 0 indicates that the test does not reject the hypothesis at the 5 % significance level (i.e., the p value is > 0.05 or the test statistic is less than the critical value), and H = 1 indicates that the hypothesis is rejected. KS test represents the Kolmogorov-Smirnov test and CM test represents the Cramér-von Mises test.
Columns: Seasons; Time interval; H s-max (m); Threshold (m); KS test (p value, test statistic, critical value, H); CM test (p value, test statistic, critical value, H).
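The threshold checks described in the Table 2 caption can be sketched as follows: a GPD is fitted to the exceedances over a candidate threshold, and the KS and CM tests are applied to the fitted distribution. This is only an illustrative sketch under stated assumptions: it uses SciPy's `genpareto`, `kstest` and `cramervonmises`, the function name `gpd_threshold_gof` is hypothetical, and the data are synthetic. The candidate thresholds are taken from the range quoted in the Fig. 3 caption. Since the GPD parameters are estimated from the same sample that is tested, the p values are approximate and tend to be optimistic.

```python
import numpy as np
from scipy.stats import cramervonmises, genpareto, kstest


def gpd_threshold_gof(hs, threshold, alpha=0.05):
    """Fit a GPD to exceedances over `threshold` and run KS and CM tests.

    H = 0 means the fit is not rejected at significance level `alpha`;
    H = 1 means it is rejected (same convention as the Table 2 caption).
    """
    excess = hs[hs > threshold] - threshold
    # Fix the location at zero because the threshold is already subtracted.
    shape, _, scale = genpareto.fit(excess, floc=0.0)
    args = (shape, 0.0, scale)
    ks = kstest(excess, genpareto.cdf, args=args)
    cm = cramervonmises(excess, genpareto.cdf, args=args)
    return {
        "threshold": threshold,
        "n": int(excess.size),
        "ks_p": float(ks.pvalue),
        "cm_p": float(cm.pvalue),
        "H_ks": int(ks.pvalue < alpha),
        "H_cm": int(cm.pvalue < alpha),
    }


# Synthetic significant wave heights in metres (illustrative only).
rng = np.random.default_rng(1)
synthetic_hs = 1.0 + genpareto.rvs(c=-0.1, scale=0.5, size=3000,
                                   random_state=rng)

results = [gpd_threshold_gof(synthetic_hs, u) for u in (1.19, 1.25, 1.32)]
for row in results:
    print(row)
```

In practice the exceedances would first be declustered so that they come from independent storm events; that step is omitted here for brevity.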

Table 3. Parameters, and the corresponding RMSEs between the data and the estimated CDF, used in the analysis of each data series.

Table 4. Estimated return values corresponding to different seasons using total wave height (H s ) following the GEV and GPD methods. Here the GEV method follows the initial distribution approach.

Table 5. Return levels estimated by the GEV model using total, wind-sea and swell data for different block maxima series.
Columns: Data; Total H s (m) for 10, 50 and 100 years; Wind-sea H s (m) for 10, 50 and 100 years; Swell H s (m) for 10, 50 and 100 years.

are used in this analysis. Buoy data consist of 11 479 data points, and ERA-Interim data consist of 55 520 data points (Table 1). The highest observed H s in the 6-hourly buoy data is 4.11 m followed by 4.03 m, while the maximum H s is 5.45 m in ERA-Interim shallow water and 7.13 m in ERA-Interim deep water. The H s values at the deep location are ∼ 1.4 times the values at the shallow location, which results in a higher return level of H s at the deep location. Sanil Kumar and Muhammed Naseef (

Long-term statistical analysis of extreme waves is carried out based on GEV and GPD models using measured buoy data from March 2008 to February 2016 and the ERA-Interim data from 1979 to 2016. Return levels are calculated for resultant, wind-sea and swell H s separately. The analysis is also conducted for data under three different seasons. The parent data are resampled into 3-, 6-, 12- and 24-hourly series and are used to estimate the discrepancy in return level estimation between the different series.

www.nat-hazards-earth-syst-sci.net/17/1763/2017/ Nat. Hazards Earth Syst. Sci., 17, 1763-1778, 2017

Table 7. The percentage of time of the waves in the shallow, intermediate and deep water regime in different years along with the mean wave period and mean peak wave period.
Columns: Year; Mean wave period (s); Criteria based on ratio of water depth and wave length corresponding to mean wave period; Mean peak wave period (s); Criteria based on ratio of water depth and wave length corresponding to peak wave period.
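The water regime classification reported in Table 7 depends on the ratio of water depth to wave length. A minimal sketch of such a classification is given below. It assumes the linear dispersion relation and the commonly used limits d/L < 1/20 for shallow water and d/L > 1/2 for deep water, which are not stated in this section and may differ from the criteria used in the study. The function names, the water depth and the wave periods in the example are hypothetical. The breaker-limited wave height range uses the 0.6 to 0.78 factors quoted in the text.

```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def wavelength(period_s, depth_m):
    """Wave length from the linear dispersion relation, solved by Newton's method."""
    omega = 2.0 * math.pi / period_s
    x0 = omega**2 * depth_m / G  # k*d in the deep water limit
    if x0 > 20.0:
        # tanh(k*d) is 1 to machine precision: deep water wave length.
        return 2.0 * math.pi * depth_m / x0
    x = x0 if x0 > 1.0 else math.sqrt(x0)  # initial guess for k*d
    for _ in range(50):
        f = x * math.tanh(x) - x0
        df = math.tanh(x) + x / math.cosh(x) ** 2
        step = f / df
        x -= step
        if abs(step) < 1e-12:
            break
    return 2.0 * math.pi * depth_m / x


def water_regime(period_s, depth_m):
    """Classify the regime from d/L (assumed limits: 1/20 and 1/2)."""
    ratio = depth_m / wavelength(period_s, depth_m)
    if ratio < 1.0 / 20.0:
        return "shallow"
    if ratio > 1.0 / 2.0:
        return "deep"
    return "intermediate"


def breaker_limited_height(depth_m):
    """Range of limiting wave height, 0.6 to 0.78 times the water depth."""
    return 0.6 * depth_m, 0.78 * depth_m


# Hypothetical example: 10 m water depth and three wave periods.
for period in (3.0, 10.0):
    print(period, "s ->", water_regime(period, 10.0))
print("Breaker-limited height range (m):", breaker_limited_height(10.0))
```

Applying `water_regime` to each record of a wave period series and counting the outcomes gives percentages of time in each regime of the kind listed in Table 7.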