Increased coastal flooding caused by extreme sea levels (ESLs) is one of the major hazards related to sea level rise. Estimates of return levels obtained under the framework provided by extreme-event theory might be biased under climatic non-stationarity. Additional uncertainty is related to the choice of the model. In this work, we fit several extreme-value models to two long-term sea level records from Venice (96 years) and Marseille (65 years): a generalized extreme-value (GEV) distribution, a generalized Pareto distribution (GPD), a point process (PP), the joint probability method (JPM), and the revised joint probability method (RJPM) under different detrending strategies. We model non-stationarity with a linear dependence of the model's parameters on the mean sea level. Our results show that non-stationary GEV and PP models fit the data better than stationary models. The non-stationary PP model is also able to reproduce the rate of extremes occurrence fairly well. Estimates of the return levels for non-stationary and detrended models are consistently more conservative than estimates from stationary, non-detrended models. Different models were selected as being more conservative or having lower uncertainties for the two datasets. Even though the best model is case-specific, we show that non-stationary extremes analyses can provide more robust estimates of return levels to be used in coastal protection planning.

Coastal zones are extremely vulnerable to extreme sea levels (ESLs; Kron, 2013). Exposure to coastal flooding damage is projected to increase in the future (Jongman et al., 2012) due to the higher frequency, magnitude, and duration of extreme sea levels (Tebaldi et al., 2021; Devlin et al., 2021). The mean sea level rise is among the causes of this increase (Menéndez and Woodworth, 2010; Marcos et al., 2009). The design of structures to protect coasts from flooding (minimizing, for example, damage to infrastructures and coastal erosion) relies on the knowledge of ESLs that are likely to occur with a given probability (Boettle et al., 2016). Extreme-event theory provides a theoretical background to fit historical extremes with specific probability distribution functions (Coles et al., 2001) and is widely used for estimating the probability of occurrence of ESLs. However, non-stationarity poses some challenges to the development of solid estimates of such return levels.

The results of extreme-value theory are valid under the assumptions of independence and stationarity of extremes (Khaliq et al., 2006). Here, stationarity means that all the realizations of the extremes in the data record are generated from the same distribution (Coles et al., 2001). While independence is satisfied with a proper selection of extremes from the dataset, stationarity is often assumed but not verified (Khaliq et al., 2006). However, several sources of non-stationarity can affect sea level data: changes in coastal morphology, low-frequency climatic variability, and climate change (Salas and Obeysekera, 2014). The estimation of return levels from stationary models might not be appropriate (e.g., less conservative) because of the implicit assumptions that the characteristics of the extremes remain the same in the future (Caruso and Marani, 2022; Razmi et al., 2017; Dixon and Tawn, 1999; Salas and Obeysekera, 2014; Haigh et al., 2010; Ragno et al., 2019). Two approaches are commonly used to cope with non-stationarity. Detrending the sea level data with annual or long-term mean sea levels is a common practice to remove long-term signals in the mean of the dataset (Bernier et al., 2007; Tebaldi et al., 2012; Mentaschi et al., 2016). Alternatively, the parameters of the probability distribution function that generates the extremes can be explicitly modeled as dependent on some covariates (Méndez et al., 2007; Grinsted et al., 2013; Cid et al., 2016; Sweet and Park, 2014; Razmi et al., 2017). However, clear indications of which approach better suits non-stationary conditions are still missing.

The choice of the proper method to conduct the extreme-sea-level analysis is also a challenge. Several methods exist, both direct, based on fitting theoretical probability distribution functions (PDFs) to the data, and indirect, relying on mixtures of empirical and parametric PDFs. Extreme values in the data can be defined as either maxima over uniform blocks of data or values that exceed a defined threshold (Coles et al., 2001). Several theoretical PDFs were derived accordingly: the generalized extreme-value distributions (Mudersbach and Jensen, 2010), the generalized Pareto distributions (Wahl et al., 2017), and the point process (Boettle et al., 2016). Indirect methods such as the joint probability method or the revised joint probability method (Pugh and Vassie, 1978) also exist. These methods decompose the sea level into tide and surge components. Different methods might be more or less suited (in terms of explained and residual variance; see Sect. 2.3.6) to accommodating non-stationary data and might lead to different estimates of extreme-sea-level probabilities (Wahl et al., 2017; Razmi et al., 2017). However, a comparison of the suitability of different direct and indirect methods for modeling non-stationarity is currently missing.

Using two long-term sea level time series from Venice (NE Italy, 96 years) and Marseille (southern France, 65 years) with different extents of non-stationarity, this paper aims at (i) compiling information on the existing direct and indirect methods for extreme-sea-level estimation, (ii) assessing which parametric method and detrending approach best accommodate non-stationary conditions, and (iii) comparing return level and return period estimates from different parametric and non-parametric methods. We perform all the analyses under three data configurations: two detrending approaches and non-detrended data.

The city of Venice and its lagoon are exposed to the risk of flooding due to extreme sea levels (Ferrarin et al., 2022). The tide regime is semi-diurnal, with a mean tidal range from 50

On the contrary, the area where the Marseille tide gauge is located has a lower tidal range (around 10

We used sea level data recorded by the tide gauge station located in Venice (gauge name: Punta della Salute) covering the period
1924–2019. Data from 2020 onwards are affected by the activation of a storm surge barrier system that prevents ESLs from propagating inside the Venice
Lagoon (MOSE) and therefore were not included in the analysis. The float-operated tide gauge is located inside a still well; measurements were
recorded mechanically until 1988 and electronically from 1989 onwards. Until 1988, only semi-diurnal maxima and minima are available (four measurements
per day); data were then recorded hourly in 1989–1994, every half hour in 1995–2006, and every 10 min in 2007–2019. We resampled all
data recorded after 1989 to an hourly resolution with Pugh filters. We used a filter with 27 coefficients for 10

Hourly sea level data recorded at Marseille are available for the time period 1849–2017. Measurements were performed with a float-operated tide gauge until 1988, with an acoustic sensor for 1989–2008, and with a radar sensor from 2009 onwards. The measurements were recorded mechanically until 1988 and electronically from 1989 onwards (Wöppelmann et al., 2014). A total record length of 65 years (spanning 1903–2017) was used to fit the models (incomplete years were discarded).

We prepared the sea level data in three ways before fitting the models: (a) we removed from each sea level observation the yearly mean sea level (hereafter MSL detrending); (b) we removed from each sea level observation the sea level average calculated over the previous 19 years (hereafter MSL_L detrending) to remove long-term fluctuations due to interferences between lunar precession and solar activity (Valle-Levinson et al., 2021); and (c) we used non-detrended data to fit the models (hereafter NDT).
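The two detrending strategies can be illustrated with a minimal sketch. It assumes the record is a list of (year, sea level) pairs and that the 19-year average is taken as the mean of the yearly means of the 19 preceding years; the function names and the choice to drop years without a full 19-year history are assumptions made for illustration, not details taken from the analysis.

```python
from collections import defaultdict


def yearly_means(records):
    """Mean sea level per year from (year, sea_level) pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for year, level in records:
        sums[year][0] += level
        sums[year][1] += 1
    return {year: total / count for year, (total, count) in sums.items()}


def detrend_msl(records):
    """MSL detrending: subtract the mean sea level of the same year."""
    means = yearly_means(records)
    return [(year, level - means[year]) for year, level in records]


def detrend_msl_long(records, window=19):
    """MSL_L detrending: subtract the average of the yearly means of the
    previous `window` years. Observations in years without a complete
    window are dropped (an assumption of this sketch)."""
    means = yearly_means(records)
    detrended = []
    for year, level in records:
        previous = [means[y] for y in range(year - window, year) if y in means]
        if len(previous) == window:
            detrended.append((year, level - sum(previous) / window))
    return detrended
```

With a synthetic record that rises by one unit per year, MSL detrending removes the trend entirely, whereas MSL_L detrending leaves a constant offset because the reference level lags behind the current year.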

Extreme events are defined as events with a low probability of occurrence (Coles et al., 2001). Given a set of independent and identically distributed
random variables

In this work, we used a block length of 1 year to extract BM to fit the generalized extreme-value (GEV) models. We selected the threshold for POT models (generalized Pareto distribution, GPD, and point process, PP) with a
two-step approach. First, data above the 99th percentile were selected, and events separated by more than a fixed time window were considered
independent and retained. We used a time window of 78

The BM distribution depends on

The POT distribution depends on

The occurrence of POT can also be modeled as a point process. Under stationary conditions, the process follows a Poisson distribution (Coles et al., 2001; Menéndez and Woodworth, 2010):

When the location and scale are not constant (e.g., a dependence on a covariate is introduced), the process rate is not constant over time and the point process is non-homogeneous (Cebrián et al., 2015). The probability of occurrence of extremes in a non-homogeneous point process is not constant over time; hence this model is appropriate for modeling extremes whose occurrence frequency is not constant in time (Coles et al., 2001).
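As a hedged illustration of why the process rate varies with a covariate, the sketch below uses the standard point-process parameterization in terms of GEV-type parameters (Coles et al., 2001), in which the expected number of exceedances of a threshold u per year is [1 + ξ(u − μ)/σ]^(−1/ξ). This form is an assumption here and may differ in notation from the rate equation used in this work; the linear link between the location and the mean sea level (`mu0`, `mu1`) and all numerical values are hypothetical.

```python
import math


def pp_rate(u, mu, sigma, xi):
    """Expected number of exceedances of threshold u per year."""
    if abs(xi) < 1e-9:
        # Gumbel-type limit of the rate as xi tends to zero
        return math.exp(-(u - mu) / sigma)
    z = 1.0 + xi * (u - mu) / sigma
    if z <= 0.0:
        # u lies outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0)
        return math.inf if xi > 0 else 0.0
    return z ** (-1.0 / xi)


def pp_rate_nonstationary(u, msl, mu0, mu1, sigma, xi):
    """Rate when the location depends linearly on the mean sea level."""
    return pp_rate(u, mu0 + mu1 * msl, sigma, xi)
```

For a fixed threshold and a positive slope `mu1`, the rate returned by `pp_rate_nonstationary` grows with the mean sea level, which is the non-homogeneous behavior described above.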

Unlike the methods mentioned above, the joint probability method (JPM) is non-parametric. The JPM is based on the decomposition of the sea level

Two limitations of the JPM are that consecutive sea levels are assumed to be independent and that the upper tail of the empirical surge distribution
is biased by the lack of observations of extremes. As a result, the JPM cannot produce ESL estimates for sea levels higher than the combination of
the highest tide and surge (Tawn et al., 1989; Batstone et al., 2013). The revised joint probability method (RJPM) aims at improving both
issues. First, an extremal index that accounts for dependencies in the sea level data is introduced (

The tidal component of the sea level used in the JPM was calculated with the “oce” package (Kelley, 2018) in the R computing environment v4.1.2 (R Core Team, 2021), using the yearly detrended sea level data (MSL), seven harmonic constants (M2, S2, N2, K2, K1, O1, P1) for Venice, and 24 harmonic constants for Marseille (MSM, MM, MSF, MF, Q1, O1, NO1, PI1, P1, S1, K1, J1, 2N2, MU2, N2, NU2, M2, L2, T2, S2, K2, MN4, M4, MS4; Wöppelmann et al., 2014). The surge was calculated as the difference between the sea level observation and the corresponding tide. We used 1990–2019 hourly data for Venice and 1968–2016 for Marseille (record length of 30 years for both stations). The same time series were used to calculate the tidal coefficients for tide estimation.
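Once tide and surge are separated, the core of the JPM can be sketched as a convolution of their empirical frequency distributions. The example below is a simplified illustration, assuming that tide and surge are independent (the classical JPM assumption) and that both are binned into classes of equal width; the function names and the binning convention are hypothetical.

```python
import math


def empirical_pmf(values, class_width):
    """Relative frequency per class; class k covers [k*w, (k+1)*w)."""
    n = len(values)
    pmf = {}
    for value in values:
        k = math.floor(value / class_width)
        pmf[k] = pmf.get(k, 0.0) + 1.0 / n
    return pmf


def joint_pmf(tide_pmf, surge_pmf):
    """Distribution of tide + surge class indices under independence."""
    total = {}
    for k_tide, p_tide in tide_pmf.items():
        for k_surge, p_surge in surge_pmf.items():
            k = k_tide + k_surge
            total[k] = total.get(k, 0.0) + p_tide * p_surge
    return total


def exceedance_probability(total_pmf, class_index):
    """Probability that the total sea level falls above the given class."""
    return sum(p for k, p in total_pmf.items() if k > class_index)
```

The highest class of the resulting distribution is the sum of the highest tide and surge classes, so the exceedance probability is zero above it. This reproduces the limitation noted earlier: the JPM cannot estimate levels higher than the combination of the highest observed tide and surge.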

For the JPM, we used all the tide and surge data from the sea level decomposition to generate the empirical frequency distribution over classes with a 10

We used the package “extRemes” (Gilleland and Katz, 2016) to fit the parametric models (GEV, GPD, PP) based on the maximum likelihood criterion (Castillo et al., 2005; Coles et al., 2001).

Example of the effects of curves parameters on the return period estimation for a sea level

Both BM and POT approaches require the modeled random variables to follow the same parent distribution

The likelihood ratio test is employed to assess whether the inclusion of a covariate in the model formulation significantly improved the fit. Two
nested competing models

The return period is defined as

Before fitting the models, we employed a Mann–Kendall test to check whether the BM and POT resulting from the different detrending strategies follow a temporal trend. Additionally, we used linear models and quantile regressions (75th quantile) to relate BM and POT to the mean sea level and used the significance of the regressions as an indication of non-stationarity.
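For readers unfamiliar with the Mann–Kendall test, the following minimal sketch computes the S statistic, its variance with the usual tie correction, and a two-sided p value from the normal approximation. It assumes serially independent observations and a sample large enough for the normal approximation; it is an illustration, not the implementation used in this work.

```python
import math
from collections import Counter


def mann_kendall(x):
    """Return (S, Z, two-sided p value) of the Mann-Kendall trend test."""
    n = len(x)
    # S counts concordant minus discordant pairs
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S, corrected for groups of tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected standard normal score
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value
```

Applied to a series of annual maxima ordered in time, a small p value with positive S points to an increasing trend.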

Trend in the data used to fit the models. lm: linear model; qr: quantile regression (75th data quantile). VE denotes Venice; MS denotes Marseille.

n.s.: non-significant; .:

To check if the inclusion of non-stationary covariates can improve the models (objective ii), we fitted different configurations of GEV, GPD, and PP models to the full datasets (96 years for Venice and 65 years for Marseille). We fitted (a) models without covariates, (b) models with the location linearly depending on the yearly mean sea level, and (c) models with the location and logarithm of the scale linearly depending on the yearly mean sea level. We used the likelihood ratio test (Eq. 8) to assess whether the inclusion of mean-sea-level-dependent parameters significantly improved the fit.
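The comparison of nested configurations can be sketched as follows. The deviance, twice the difference in maximized log-likelihood, is compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters (one slope per covariate-dependent parameter in the configurations above). The closed forms used for one and two degrees of freedom are standard; the function name and the log-likelihood values in the usage note are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params=1):
    """Return (deviance, p value) for two nested models."""
    deviance = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if extra_params == 1:
        # Survival function of chi-squared with 1 degree of freedom
        p_value = math.erfc(math.sqrt(deviance / 2.0))
    elif extra_params == 2:
        # Survival function of chi-squared with 2 degrees of freedom
        p_value = math.exp(-deviance / 2.0)
    else:
        raise ValueError("this sketch supports only 1 or 2 extra parameters")
    return deviance, p_value
```

For example, a reduced model with log-likelihood −100 and a model with one extra slope and log-likelihood −98 give a deviance of 4 and a p value of about 0.046, so the covariate would be retained at the 5 % level.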

Venice data used to fit the models. See Fig. S2 for Marseille data. Plots are grouped vertically according to the detrending method (MSL: mean sea level; MSL_L: long-term mean sea level; NDT: non-detrended) and horizontally according to the maxima typology (BM: block maxima; POT: peaks over threshold). The text in the label in the top left corner of each plot shows the significance level of the Mann–Kendall trend test (n.s.: non-significant; .:

To visually check the dependence of the parameters on the mean sea level, we fitted stationary GEV, GPD, and PP models (i.e., without covariates on the scale and location parameters) to BM and POT subsets using a 30-year moving time window. We assume that data sampled within a 30-year window are approximately stationary. We tested for the presence of a trend in the fitted parameters with a Mann–Kendall test. We plotted the sequence of stationary parameters together with the non-stationary ones as a means to visually check the uncertainty related to parameter estimation.

The PP models were further validated by comparing the process rate (Eq. 4) and the empirical rate of POT exceedances (number of excesses per year) with Pearson's correlation test.
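The comparison between fitted and empirical rates can be illustrated with a short sketch that counts the exceedances per year and computes Pearson's correlation coefficient against a sequence of fitted yearly rates. The helper names are hypothetical, and the significance test of the coefficient (which requires the Student t distribution) is left out.

```python
import math
from collections import Counter


def empirical_rate(peak_years, years):
    """Number of POT exceedances in each year of `years`."""
    counts = Counter(peak_years)
    return [counts.get(year, 0) for year in years]


def pearson_r(a, b):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Here `peak_years` would hold the year of each declustered exceedance, and the second series passed to `pearson_r` would hold the process rate evaluated for the same years.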

After fitting the models, we compared the estimates of the return level for different return periods (objective iii). For the non-stationary models,
we first calculated the location and scale parameters with a yearly mean sea level of

Finally, we derived the curves from non-detrended, non-stationary models under different covariates values. For Venice, we used

Likelihood ratio test results. The column test type describes which model configurations were compared; nc-l: no covariates compared with covariates on location; l-sl: covariates on location compared with covariates on both location and scale; nc-s: no covariates compared with covariates on scale. VE denotes Venice; MS denotes Marseille.

n.s.: non-significant; .:

Regarding the data used to fit the models, the Mann–Kendall tests detected a significant trend for the non-detrended BM in Venice, a marginally significant trend for the detrended BM, and no trend for POT (Fig. 2). No trends were recorded for BM or POT in Marseille (Fig. S2). We found evidence for a dependence of the median BM on the mean sea level for both detrended and non-detrended data in Venice, while little support for a trend was recorded in Marseille. In Venice, the median POT and the upper POT quantile were significantly dependent on the mean sea level only for the MSL_L detrending method (Table 1).

After fitting the models, the likelihood ratio test for Venice data shows that the inclusion of the covariate (mean sea level) improves the fit
significantly for the location (

Model validation for Venice showed that the covariate-dependent location parameter reproduces well the temporal trends of the corresponding
stationary parameters obtained from the time-window analysis in the GEV and PP models (Fig.

Comparison between the parameters estimated in the time window analysis (dashed line; the grey envelope represents the uncertainty in the parameters from the time window analysis) and the parameters estimated by different model configurations over the full data record length. Here only results for Venice are reported; see Fig. S3 for Marseille. Parameters from all the configurations of the GEV, GPD, and PP models that do not include covariates are shown. Parameters from models with covariates are shown only if the models significantly improve the fit (see Table 2 for the likelihood ratio test). The shape

Comparisons between the rates fitted by the point process (PP) for Venice and the empirical process rate of models with covariates on the location (model type l) and models with covariates on the location and scale (model type s).

n.s.: non-significant; .:

Additionally, the occurrence rates of threshold exceedances estimated by the PP models for Venice were in good agreement with those calculated from the POT data (Table 3).

Return level plot actualized to 2019 for Venice. See Fig. S4 for Marseille. Plots are grouped vertically according to the detrending method (MSL: mean sea level; MSL_L: long-term mean sea level; NDT: non-detrended) and horizontally according to the distribution function (GEV: generalized extreme value; GPD: generalized Pareto distribution; PP: point process). The dashed line is the empirical return level for the joint probability method (JPM). Curves are color-coded based on the model configuration. Note the horizontal axis is logarithmic. Return level curves for direct models with covariates are reported only if the addition of the covariate improves the fit significantly (

The return levels estimated by non-stationary models for Venice were in the range 133–146

Difference in return levels between each fitted model and a non-detrended GEV fit for different return periods. Return levels of models with covariates are shown only if the model significantly improves the fit compared to models without covariates (

Finally, we compared how the return levels for return periods of 2, 20, 100, and 200 years differ among models (Fig. 5, Table S1). Among stationary
models, the GPD yields conservative estimates for 2 years and the GEV model is more conservative for 20 and 100 years for all detrending
configurations. Among models with covariates on the location, the GEV model yields higher return level estimates. Among non-stationary models fitted to
non-detrended data, GPD models with covariates on the scale yield conservative estimates for all return periods. Estimates from GEV models with
covariates on the location and scale fitted to detrended data are more conservative for 20, 100, and 200 years. The JPM and RJPM yield estimates that are in agreement with those of the parametric models. Return levels from models without covariates fitted to non-detrended data were consistently less conservative for all return periods in both Venice and Marseille. The largest differences between detrended, non-detrended, and stationary models were for short return periods. Among all the analyzed methods, in Venice the GEV model with covariates on the location, the JPM, and the RJPM yield the most conservative estimates of return levels for longer return times (

The direct methods show varying uncertainty in the prediction of return levels (Fig. 5). In both Venice and Marseille, the GEV model with covariates on the
location has the highest uncertainty (12

Return level plots for different values of mean sea level. Mean sea level is expressed with respect to the local reference. The current mean sea level is

Extrapolations of non-detrended, non-stationary models for the future showed that estimates of future ESLs are strongly influenced by the future mean
sea level (Fig. 6). Events that currently have a return period above 200 years will already have return levels

Our results show that most of the fitted ESL models benefit from the inclusion of covariates on either the location or the scale parameters when using non-detrended data. The largest improvements in the fit occurred for the Venice data, which have a higher non-stationarity than the Marseille data. We used only the yearly averaged mean sea level as a covariate to build simple models, but other predictors can be used, depending on the objective of the study. For instance, the North Atlantic Oscillation index, the Arctic Oscillation, and the East Atlantic–Western Russia Oscillation index can be used to include a dependence on climate (Menéndez and Woodworth, 2010). Where climatic predictors are missing, seasonality effects can be included, e.g., with harmonic dependencies on the yearly Julian day (Méndez et al., 2006). Other predictors could include global and regional meteorological parameters, which could influence storm surge intensities and frequencies (Grinsted et al., 2013). A dependence on time can also be included (Mudersbach and Jensen, 2010). However, particular care should be taken in the choice of the predictors. Complex models can be useful for explaining historical patterns but might be of little utility for future projections. For instance, bias could arise due to uncertainties in the predictors' future trajectories or to future predictor values falling outside the ranges used to calibrate the models. In this regard, simpler models can be helpful for future projections when clear links between the occurrence of extremes and specific predictor classes are established (Schuwirth et al., 2019).

In this work, we used the mean sea level as a covariate because of the strong link with storm surge occurrences (Lionello et al., 2021). Our results show that the mean-sea-level-dependent location of both GEV and PP models improves the ESL fit for both Venice and Marseille. The location parameter relates to the first-order moment of the distribution of extremes. The inclusion of a linear dependence on the mean sea level rigidly shifts the distribution function towards higher (positive slope) or lower (negative slope) values without affecting the shape of the distribution. GEV and PP models also marginally improved with a dependence on the scale. The scale parameter relates to the second-order moment of the distribution (the “spreading”). A dependence of the scale parameter on the mean sea level could suggest an influence on the variability in the magnitude of storm surges. In shallow areas a higher sea level corresponds to lower dissipation of the tidal energy, yielding higher ESLs (Arns et al., 2017). In the Venice Lagoon, this factor might also be influenced by the morphological transformations that the lagoon underwent during the 20th century and that might have affected the dynamics of the tide propagation (Caruso and Marani, 2022). Different explanations for this pattern are possible. For instance, the North Atlantic Oscillation index (NAO), not included in this analysis, might act as a latent variable: negative NAO phases in the Mediterranean basin can lead to increases in monthly mean sea levels and in the number of storms (Cid et al., 2016).
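The rigid shift produced by a covariate on the location can be checked numerically with a short sketch. It uses the standard GEV return level for annual maxima (Coles et al., 2001), z_T = μ − (σ/ξ)[1 − (−ln(1 − 1/T))^(−ξ)], with a location that depends linearly on the mean sea level. The parameter names (`mu0`, `mu1`) and all numerical values are hypothetical and are not the fitted values of this study.

```python
import math


def gev_return_level(return_period, mu, sigma, xi):
    """Level exceeded on average once every `return_period` years."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-9:
        # Gumbel limit of the GEV quantile
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))


def nonstationary_return_level(return_period, msl, mu0, mu1, sigma, xi):
    """Return level when only the location depends on the mean sea level."""
    return gev_return_level(return_period, mu0 + mu1 * msl, sigma, xi)
```

Raising the mean sea level by a given amount shifts every return level by `mu1` times that amount, regardless of the return period, while the spacing between return levels stays unchanged. A covariate on the scale would instead change the slope of the return level curve.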

Overall, this work shows how including non-stationarity in extreme-event analysis can support an improved understanding of extreme events. Including dependencies on the mean sea level also allows for flexible forecasts of ESLs under sea level rise scenarios.

The significant covariate dependencies could also be influenced by the type of data. The BM data in Venice show clear increasing trends, which were
captured by the GEV model. BM could be extracted with different methods, such as monthly blocks, or for

While all the parametric methods improved with the inclusion of non-stationarity, the JPM and RJPM are the methods that should be least influenced by
non-stationarity, since the methodology requires detrending the data before the calculation of tide and surge histograms. However, as the residual
trend on detrended BM for Venice shows, the removal of the mean sea level might not be sufficient to make the series stationary. Thus, estimates
of the return level with the JPM might also be biased. Estimations of return levels for long return periods are not possible due to the lack of surge and
tide events that are needed to populate the extremal classes of the distribution. In our analysis, the JPM allows for estimating return periods
corresponding to levels of

Some parametric models were improved by the inclusion of covariates on the location, with a stronger influence on models fitted to non-detrended data. Particular care should be taken when detrending data prior to the model fit, as this action implicitly assumes that the mean sea level is mainly responsible for data non-stationarity and that higher-order interactions can be neglected. In shallow areas this might not be the case (Arns et al., 2017). Thus, including covariates on the model's parameters could perform better than detrending in such cases. With our data, the use of the mean sea level as a covariate results in a rigid translation towards higher return levels for GEV and PP plots due to the significance of the location dependence. The effect on the GPD is also a change in slope due to the significance of the scale parameter. However, data from different gauging stations might show different behaviors. For instance, sites where the sea level variability increases with mean sea level might also show a significant dependence on the scale parameter in the GEV and PP models.

Our results show that the models that have lower uncertainty and the models that yield the most conservative estimates of return levels are different between Venice and Marseille. Even though we highlighted that the difference between the two datasets is the extent of non-stationarity, other factors can affect the selection of a good model: the relative importance of tide and surge (Dixon and Tawn, 1999), the tidal regime, the location of the tide gauge, the record length, and the presence of outliers (Haigh et al., 2010). A similar analysis on a dataset covering a wider range of sites would allow us to consistently link the best-performing methods to the characteristics of the sea level data.

In this paper, we fitted different extreme-value models to long-term sea level data for Venice and Marseille. We show that including non-stationarity in the analysis of extreme events improves the fit of most of the models. Among direct methods, for return periods longer than 20 years, the point process with a dependence of the location on the mean sea level is the most conservative in Venice. The generalized extreme-value distribution with a dependence of the location on the mean sea level is the most conservative in Marseille. Among indirect methods, the revised joint probability method yields results that are comparable with the most conservative methods for return periods longer than 100 years for both Venice and Marseille. Among direct methods, the generalized extreme-value distribution fitted to detrended data has the lowest uncertainty for return level estimation in Venice. The point process with a location dependence has the lowest uncertainty for return level estimation in Marseille for return periods longer than 20 years. Overall, we show that non-stationary extremes analyses can provide more robust estimates of return levels to be used in coastal protection planning.

Sea level data for Venice were retrieved from the Italian Institute for Environmental Protection and Research (

The supplement related to this article is available online at:

SM, DB, and AB designed the study. DB and SM implemented the models. FC and EC collected the data. DB led the writing of the manuscript with inputs from all co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the EU Interreg project AdriaClim (grant number IT-HR 10252001).

This paper was edited by Philip Ward and reviewed by two anonymous referees.