Importance of non-stationary analysis for assessing extreme sea levels under sea level rise

. Increased coastal flooding caused by extreme sea levels (ESLs) is one of the major hazards related to sea level rise. Estimates of return levels obtained under the framework provided by extreme events theory might be biased under climatic non-stationarity. Additional uncertainty is related to the choice of the model. In this work, we fit several extreme values models to two long-term sea level record from Venice (96 years) and Marseille (65 years): a Generalized Extreme Value distribution (GEV), a Generalized Pareto Distribution (GPD), a Point Process (PP), the Joint Probability Method 10 (JPM), and the Revised Joint Probability Method (RJPM) under different detrending strategies. We model non-stationarity with a linear dependence of the model’s parameters from the mean sea level. Our results show that non-stationary GEV and PP models fit the data better than stationary models. The non-stationary PP model is also able to reproduce the rate of extremes occurrence fairly well. Estimates of the return levels for non-stationary and detrended models are consistently more conservative than estimates from stationary, non-detrended models. Different models were selected as being more 15 conservative or having lower uncertainties for the two datasets. Even though the best model is case-specific, we show that non-stationary extremes analyses can provide more robust estimates of return levels to be used in coastal protection planning.


Introduction
Coastal zones are extremely vulnerable to extreme sea levels (ESLs; Kron, 2013).Exposure to coastal flooding damage is projected to increase in the future (Jongman et al., 2012) due to the higher frequency, magnitude, and duration of extreme sea levels (Tebaldi et al., 2021;Devlin et al., 2021).The mean sea level rise is among the causes of this increase (Menéndez and Woodworth, 2010;Marcos et al., 2009).The design of structures to protect coasts from flooding (minimizing, for example, damage to infrastructures and coastal erosion) relies on the knowledge of ESLs that are likely to occur with a given probability (Boettle et al., 2016).Extreme-event theory provides a theoretical background to fit historical extremes with specific probability distribution functions (Coles et al., 2001) and is widely used for estimating the probability of occurrence of ESLs.However, non-stationarity poses some challenges to the development of solid estimates of such return levels.
The results of extreme-value theory are valid under the assumptions of independence and stationarity of extremes (Khaliq et al., 2006).Here, stationarity means that all the realizations of the extremes in the data record are generated from the same distribution (Coles et al., 2001).While independence is satisfied with a proper selection of extremes from the dataset, stationarity is often assumed but not verified (Khaliq et al., 2006).However, several sources of nonstationarity can affect sea level data: changes in coastal morphology, low-frequency climatic variability, and climate change (Salas and Obeysekera, 2014).The estimation of return levels from stationary models might not be appropriate (e.g., less conservative) because of the implicit assumptions that the characteristics of the extremes remain the same in the future (Caruso and Marani, 2022;Razmi et al., 2017;Dixon and Tawn, 1999;Salas and Obeysekera, 2014;Haigh et al., 2010;Ragno et al., 2019).Two approaches are commonly used to cope with non-stationarity.Detrending the sea level data with annual or long-term mean sea levels is a common practice to remove long-term signals in the mean of the D. Baldan et al.: Importance of non-stationary analysis for assessing extreme sea levels under sea level rise dataset (Bernier et al., 2007;Tebaldi et al., 2012;Mentaschi et al., 2016).Alternatively, the parameters of the probability distribution function that generates the extremes can be explicitly modeled as dependent on some covariates (Méndez et al., 2007;Grinsted et al., 2013;Cid et al., 2016;Sweet and Park, 2014;Razmi et al., 2017).However, clear indications of which approach better suits non-stationary conditions are still missing.
The choice of the proper method to conduct the extremesea-level analysis is also a challenge.Several methods exist, both direct, based on fitting theoretical probability distribution functions (PDFs) to the data, and non-direct, relying on mixtures of empirical and parametric PDFs.Extreme values in the data can be defined as either maxima over uniform blocks of data or values that exceed a defined threshold (Coles et al., 2001).Several theoretical PDFs were derived accordingly: the generalized extreme-value distributions (Mudersbach and Jensen, 2010), the generalized Pareto distributions (Wahl et al., 2017), and the point process (Boettle et al., 2016).Indirect methods such as the joint probability method or the revised joint probability method (Pugh and Vassie, 1978) also exist.These methods decompose the sea level in the tide and surge components.Different methods might be more or less suited (in terms of explained and residual variance; see Sect.2.3.6) to accommodating non-stationary data and might lead to different estimates of extreme-sea-level probabilities (Wahl et al., 2017;Razmi et al., 2017).However, a comparison of the suitability of different direct and indirect methods for modeling nonstationarity is currently missing.
Using two long-term sea level time series from Venice (96 years, NE Italy), and Marseille (65 years, southern France) with different extents of non-stationarity, this paper aims at (i) compiling information on the existing direct and indirect methods for extreme-sea-level estimation, (ii) assessing which parametric method and detrending approach best accommodate non-stationary conditions, and (iii) comparing return level and return period estimates from different parametric and non-parametric methods.We perform all the analyses using three different detrending approaches.

Tide gauge locations
The city of Venice and its lagoon are exposed to the risk of flooding due to extreme sea levels (Ferrarin et al., 2022).The tide regime is semi-diurnal, with a mean tidal range from 50 cm during neap tides to 100 cm during spring tides (Umgiesser et al., 2021): such values are among the highest ranges measured in the Mediterranean Sea.Compared to other sites in the northern Adriatic Sea, the Venice Lagoon has experienced higher sea level rise due to the combined effects of subsidence and eustatism (+2.5 mm yr −1 in the last 150 years; De Biasio et al., 2020;Zanchettin et al., 2021).The current long-term mean sea level is about 30 cm above the local 1897 reference (named Zero Mareografico di Punta della Salute, ZMPS: average sea level for the period 1885-1909 measured at the Punta della Salute gauging station).As a result, an increase in the frequency and magnitude of ESLs causing flooding of the city of Venice has been recorded (Umgiesser et al., 2021).The events with the highest recorded sea levels occurred on 4 November 1966 (+194 cm) and 12 November 2019 (+189 cm; Lionello et al., 2021).
On the contrary, the area where the Marseille tide gauge is located has a lower tidal range (around 10 cm, Fig. S1 in the Supplement) and is located against a stable geological background, with a relative sea level rise of +1.1 mm yr −1 in the last 150 years (Wöppelmann et al., 2014;Letetrel et al., 2010).Marseille data are referred to the nautical chart datum (NCD, zéro hydrographique), which is the sea level corresponding to the lowest tide and is 32.9 cm below the national datum (IGN 1969, average sea level for the period 1885-1897 measured at Marseille gauging station; Wöppelmann et al., 2014).The long-term mean sea level was 35 cm above the NCD in 1903 and 50 cm above the NCD in 2017.

Tide gauge data
We used sea level data recorded by the tide gauge station located in Venice (gauge name: Punta della Salute) covering the period 1924-2019.Data from 2020 onwards are affected by the activation of a storm surge barrier system that prevents ESLs from propagating inside the Venice Lagoon (MOSE) and therefore were not included in the analysis.The floatoperated tide gauge is located inside a still well; measurements were recorded mechanically until 1988 and electronically from 1989 onwards.Until 1989, semi-diurnal maxima and minima are available (four measurements per day); then data were recorded hourly in the period 1989-1994, every half hour in 1995-2006, and every 10 min in 2007-2019.We resampled all data recorded after 1989 to an hourly resolution with Pugh filters.We used a filter with 27 coefficients for 10 min data, a filter with 18 coefficients for 15 min data, and a filter with 12 coefficients for 30 min data (Pugh, 1987).The data have no gaps; a total record length of 96 years was used to fit the models.To calculate the long-term mean sea level before 1924, we used yearly mean sea level data from other tide gauge stations active in the city of Venice (and thus affected by the same subsidence rate as Punta della Salute) whose records cover the period 1885-1922 (namely Campo Santo Stefano, Arsenale, and Punta della Salute -Canal Grande; for details see Zanchettin et al., 2021).
Hourly sea level data recorded at Marseille are available for the time period 1849-2017.Measurements were performed with a float-operated tide gauge until 1988, with an acoustic sensor for 1989-2008, and with a radar sensor from 2009 onwards.The measurements were recorded me-chanically until 1988 and electronically from 1989 onwards (Wöppelmann et al., 2014).A total record length of 65 years (spanning 1903-2017) was used to fit the models (incomplete years were discarded).

Data detrending approach
We used two different approaches for detrending the sea level data before fitting the models: (a) we removed from each sea level observation the yearly average mean sea level (hereafter MSL detrending); (b) we removed from each sea level observation the sea level average calculated over the previous 19 years (hereafter MSL_L detrending) to remove longterm fluctuations due to interferences between lunar precession and solar activity (Valle-Levinson et al., 2021); (c) we used non-detrended data to fit the models (hereafter NDT).

Extreme-value distributions
Extreme events are defined as events with a low probability of occurrence (Coles et al., 2001).Given a set of independent and identically distributed random variables X 1 , . .., X n , with parent distribution F , a probability distribution function describing the occurrence probability of extreme values can be derived with two approaches.The block maxima (BM) approach considers the distribution of the maxima of the set X 1 , . .., X n over blocks of length n, M n = max{X 1 , . .., X n }, and assesses Pr(M n < z), i.e., the probability that the random variable M n is greater than z.The use of sufficiently large blocks ensures that the maxima are independent (Méndez et al., 2007).The peaks-over-threshold (POT) approach assesses Pr(X > u + y|X > u), i.e., the probability that the random variable X exceeds a sufficiently high threshold u by the value y.When fitting POT, an appropriate threshold needs to be selected to properly model excesses as extremes (Zhang et al., 2000).
In this work, we used a block length of 1 year to extract BM to fit the generalized extreme-value (GEV) models.We selected the threshold for POT models (generalized Pareto distribution, GPD, and point process, PP) with a twostep approach.First, data above the 99th percentile were selected, and events separated by more than a fixed time window were considered independent and retained.We used a time window of 78 h for both Venice and Marseille, consistent with the value used in other studies in the Mediterranean (Marcos et al., 2009).This value also corresponds to the average decay time of seiches, the lowest-frequency sub-seasonal oscillation in the northern Adriatic Sea (Raicich et al., 1999).Second, we fitted multiple POT models based on different thresholds, and we selected the lowest value that ensures the stability of the GPD and PP parameters (see Sect. 2.3.2).This procedure ensures that the threshold excesses can be properly modeled as extremes, and Eq. ( 2) holds (Coles et al., 2001).For the Venice data, thresholds of 100 and 80 cm are appropriate to select POT for non-detrended and detrended data, respectively, yielding 319 POT for NDT, 284 for MSL, and 359 for MSL_L.For the Marseille data, thresholds of 85 and 40 cm are appropriate to select POT for non-detrended and detrended data, respectively, yielding 203 POT for NDT, 207 for MSL, and 223 for MSL_L.

Generalized extreme-value distribution
The BM distribution depends on F , the parent distribution of the random variables in each block via G(z) = Pr(M n < z) = F n (z), converging to the generalized extreme-value (GEV) distribution when n is large enough (Coles et al., 2001): where a + = max(a, 0); µ is the location parameter (proportional to the first-order moment of the distribution); σ is the scale parameter (always positive, proportional to the secondorder moment of the distribution); and ξ is the shape parameter that determines the type of distribution function: the heavy-tailed Fréchet (ξ > 0), the upper-bounded Weibull (ξ < 0), and the limit-case Gumbel (ξ → 0).

Generalized Pareto distribution
The POT distribution depends on F , the parent distribution of the random variables via , with y = z − u, converging to the generalized Pareto distribution (GPD) when the threshold is large enough (Coles et al., 2001): where u is the threshold; σ u is the GPD scale parameter dependent on the threshold; and ξ the shape parameter that determines the type of the distribution function: heavy-tailed Pareto (ξ > 0), upper-bounded beta (ξ < 0), and the exponential as the limit case (ξ → 0).When BM are GEV-distributed, POT is theoretically expected to follow a GPD with the same shape parameter and scale depending on the GEV parameters (Gilleland and Katz, 2016).This property can drive the selection of an appropriate threshold u.First, multiple GPDs are fitted to different sets of data obtained varying the threshold.Then, the parameters are plotted as a function of the threshold.For sufficiently high thresholds, the theoretical approximation yields and the parameters are independent of the threshold value.The minimum threshold that meets this requirement is then selected (Coles et al., 2001).

Point process approach
The occurrence of POT can also be modeled as a point process.Under stationary conditions, the process follows a Poisson distribution (Coles et al., 2001;Menéndez and Woodworth, 2010): where λ is the rate of the process (number of events over a reference time period).The process rate depends on the GEV parameters (Gilleland and Katz, 2016;Boettle et al., 2016;Cid et al., 2016): When the location and scale are not constant (e.g., a dependence on a covariate is introduced), the process rate is not constant over time and the point process is non-homogeneous (Cebrián et al., 2015).The probability of occurrence of extremes in a non-homogeneous point process is not constant over time; hence this model is appropriate for modeling extremes whose occurrence frequency is not constant in time (Coles et al., 2001).

Joint probability and revised joint probability methods
Unlike the methods mentioned above, the joint probability method (JPM) is non-parametric.The JPM is based on the decomposition of the sea level z in the tide (x) and surge (y) components (Pugh and Vassie, 1978).The probability distribution of the sea level P (z) results from the convolution of the distributions of the tide and the surge: where z = x + y, P T (x) is the distribution of the tide, and P S (y) is the distribution of the surge (both obtained from hourly records).The tide and the surge interaction are significant only in shallow waters (Prandle and Wolf, 1978) and can be considered independent in most practical applications (Pugh and Vassie, 1978).Two limitations of the JPM are that consecutive sea levels are assumed to be independent and that the upper tail of the empirical surge distribution is biased by the lack of observations of extremes.As a result, the JPM cannot produce ESL estimates for sea levels higher than the combination of the highest tide and surge (Tawn et al., 1989;Batstone et al., 2013).The revised joint probability method (RJPM) aims at improving both issues.First, an extremal index that accounts for dependencies in the sea level data is introduced (θ −1 , units: hours).The extremal index is used as a correction factor in the return period calculation based on P (z) (see Sect. 2.3.7) and is defined as the average number of measurements an extreme-sea-level cluster is usually composed of (Tawn et al., 1989).Second, the RJPM fits the surge distribution with an extreme-value distribution to smooth the empirical distribution, for projections beyond the highest measured surge (Tawn et al., 1989).Both GEV and GPD approaches have been used to this end (Batstone et al., 2013;Enríquez et al., 2022;Baranes et al., 2020).
For the JPM, we used all the tide and surge data from the sea level decomposition to generate the empirical frequency distribution over classes with a 10 cm width.The maximum theoretical sea level (sum of maximum tide and maximum surge) falls within the highest class.Then, we calculated the discrete convolution between the two histograms (see Table 1 in Pugh and Vassie, 1978).For the RJPM, we fitted a Gumbel distribution function to the annual maxima of the surge (following Tawn et al., 1989).Then we defined surge classes of 10 cm width and calculated the probability of the surge extremes falling into each class as the integral of the fitted Gumbel distribution calculated over each class.After the convolved distribution was calculated, we used the probability of the sea level falling into each sea level class to calculate the return periods (Pugh and Vassie, 1978;Marcos et al., 2009).We then corrected the estimated return levels with the extremal indices.We found extremal indices of 5.5 and 13 h to be appropriate for Venice and Marseille, respectively.

Model fitting
We used the package "extRemes" (Gilleland and Katz, 2016) to fit the parametric models (GEV, GPD, PP) based on the maximum likelihood criterion (Castillo et al., 2005;Coles et al., 2001).

Stationarity and parameter dependence
Both BM and POT approaches require the modeled random variables to follow the same parent distribution F .Nonstationary conditions can be modeled by including covariates in the GEV, GPD, and PP parameters (Méndez et al., 2007).For instance, a linear dependence of location (µ) and scale (σ ) parameters can be assumed from the covariate c and can be expressed as follows (Coles et al., 2001): where the logarithm on the scale parameter in Eq. ( 7) is used to constrain the scale parameter to positive values.

Comparing different model configurations
The likelihood ratio test is employed to assess whether the inclusion of a covariate in the model formulation improved significantly the fit.Two nested competing models M 0 ⊂ M 1 can be compared using the deviance statistic (Coles et al., 2001).For example, M 1 can be a model whose parameters depend on covariates, while M 0 can be a model whose parameters do not depend on covariates.The deviance is expressed as where l 1 (M 1 ) and l 0 (M 0 ) are the maximized log likelihoods of models M 1 and M 0 , respectively.The model M 0 has by definition a lower complexity than M 1 , which is the case when covariates on the model's parameters are added to M 1 .High deviance values support the hypothesis of M 1 explaining a larger variation in the data than M 0 (likelihood ratio test).The hypothesis is rejected when D > c α , where c α is the (1 − α) quantile of a χ 2 k distribution, in which k is the difference in dimensionality between M 1 and M 0 (i.e., the difference in the number of parameters).

Return level estimation
The return period is defined as Tr(z) = [1 − G(z)] −1 , where G is the probability distribution function for the GEV, GPD, or PP models (Caruso and Marani, 2022) or the empirical sea level probability from the JPM and RJPM (Tawn et al., 1989).In practice, the extreme levels of the random variable are calculated as a function of the return period via the PDF quantiles (Coles et al., 2001).In a non-stationary analysis, the model's PDF is not constant in time (Fig. 1), and the quantiles are not uniquely determined.To allow for the comparison of estimated return levels from non-stationary models, in this work we first fixed the covariates values and then calculated the quantiles of the resulting probability distribution function.For the JPM and the RJPM we included the extremal index as a correction factor in the estimation of the return period Tr(z

Data analysis
Before fitting the models, we employed a Mann-Kendall test to check if BM and POT resulting from different detrending strategies follow a temporal trend.Additionally, we used linear models and quantile regressions (75th quantile) to relate BM and POT with the mean sea level and used the significance of the regressions as indication for stationarity.
To check if the inclusion of non-stationary covariates can improve the models (objective ii), we fitted different configurations of GEV, GPD, and PP models to the full dataset (96 years).We fitted (a) models without covariates, (b) models with the location linearly depending on the yearly mean sea level, and (c) models with the location and logarithm of the scale linearly depending on the yearly mean sea level.We used the likelihood ratio test (Eq.8) to assess whether the inclusion of mean-sea-level-dependent parameters improved the fit significantly.
To check visually the dependence of parameters on the mean sea level, we fitted stationary GEV, GPD, and PP models (i.e., without covariates on the scale and location parameters) to BM and POT subsets using a 30-year moving time window.We can assume that data sampled in a 30-year window can be considered stationary.We tested for the presence https://doi.org/10.5194/nhess-22-3663-2022 Nat. Hazards Earth Syst.Sci., 22, 3663-3677, 2022  of a trend in the fitted parameters with a Mann-Kendall test.
We plotted the sequence of stationary parameters together with non-stationary ones as a mean to visually check the uncertainty related to parameter estimation.The PP models were further validated by comparing the process rate (Eq.4) and the empirical rate of POT exceedances (number of excesses per year) with Pearson's correlation test.
After fitting the models, we compared the estimates of the return level for different return periods (objective iii).For the non-stationary models, we first calculated the location and scale parameters with a yearly mean sea level of +35 cm in Venice and +52.4 cm in Marseille (equal to the 2000-2019 long-term mean sea level for the two stations).Once the model's parameters were fixed, we calculated the sea levels corresponding to return periods of 2, 20, 100, and 200 years.Estimates of return levels from models fitted to detrended data were added back to the long-term mean sea level.This additive procedure is simplified and neglects the non-linear interactions between the future mean sea level and the occurrence of extremes (Arns et al., 2015(Arns et al., , 2017)).
Finally, we derived the curves from non-detrended, nonstationary models under different covariates values.For Venice, we used +0 cm, +25 cm (annual mean sea level in 1966, the year of the largest ESL on record), +35 cm (annual mean sea level for 2019, the last year used in the analysis), and +51 cm (expected annual mean sea level in 2050 under IPCC scenario SSP2-4.5;Garner et al., 2021;Masson-Delmotte et al., 2021).For Marseille, we used +54 cm (annual mean sea level for 2019) and +71 cm (expected annual mean sea level in 2050 under IPCC scenario SSP2-4.5).

Results
Regarding the data used to fit the models, the Mann-Kendall tests detected a significant trend for the non-detrended BM in Venice, a marginally significant trend for the detrended BM, and no trend for POT (Fig. 2).No trends were recorded for BM or POT in Marseille (Fig. S2).We found evidence for a dependence of the median BM on the mean sea level for both detrended and non-detrended data in Venice, while little support for a trend was recorded in Marseille.In Venice, the median POT and the upper POT quantile were significantly dependent on the mean sea level only for the MSL_L detrending method (Table 1).
After fitting the models, the likelihood ratio test for Venice data shows that the inclusion of the covariate (mean sea level) improves the fit significantly for the location (µ) parameter of both the GEV and the PP models for NDT and MSL_L data and only for the GEV model for MSL data (Table 2).The addition of a dependence on the scale (σ ) parameter was marginally significant for the GPD for NDT and MSL_L data.The inclusion of the dependence on the scale on the PP improved the fit only for MSL_L data (Table 2).In Marseille, the inclusion of the covariate improved the fit for the location of the GEV and PP models for NDT and for the location of the PP model for MSL_L data.
Models validation for Venice showed that the location parameter dependent on the covariate well reproduces the temporal trends of the corresponding stationary parameters obtained from the time-window analysis in the GEV and PP models (Fig. 3).Smaller improvements are observed for Marseille, where most of stationary models well reproduce the parameters trends (Fig. S3).
Additionally, the PP models estimated the occurrence rate of threshold exceedances in Venice in good agreement with those calculated from the POT data (Table 3).
The return levels estimated by non-stationary models for Venice were in the range 133-146 cm for a return period of 2 years, 169-184 cm for 20 years, 192-203 cm for 100 years, and 198-218 cm for 200 years (Fig. 4, Table S1 in the Supplement).Estimates of 100-year return levels for non-detrended models with covariates were in the range 169-181 cm, while for detrended models without covariates they were higher (186-187 cm).Models that include covariates on the location showed an increased extreme estimate for smaller return periods (< 10 years for GEV and < 3 years for PP, Figs. 4 and S4), with higher discrepancies for nondetrended data.The return levels estimated by non-stationary models for Marseille were in the range 102-114 cm for a return period of 2 years, 128-141 cm for 20 years, 139-153 cm for 100 years, and 144-158 cm for 200 years (Table S2).Estimates of 100-year return levels for non-detrended models with covariates were in the range 143-147 cm, while for detrended models without covariates they were higher (156-153 cm).
Finally, we compared how the return levels for return periods of 2, 20, 100, and 200 years differ among models (Fig. 5, Table S1).Among stationary models, the GPD yields conservative estimates for 2 years and the GEV model is more conservative for 20 and 100 years for all detrending configurations.Among models with covariates on the location, the GEV model yields higher return level estimates.Among nonstationary models fitted to non-detrended data, GPD models with covariates on the scale yield conservative estimates for all return periods.Estimates from GEV models with covariates on the location and scale fitted to detrended data are more conservative for 20, 100, and 200 years.The JPM and RJPM yield projections that are in agreement with parametric models.Return levels from models without covariates fitted to non-detrended data were consistently less conservative for all return periods and both Venice and Marseille.The highest differences between detrended, non-detrended, and stationary models were for short return periods.Among all the analyzed methods, in Venice the GEV model with covariates on the location, the JPM, and the RJPM yield the most conservative estimates of return levels for longer return times (> 50 years), while for return times of 2 and 20 years the RJPM is less conservative than other methods.A similar behavior is observed in Marseille for the RJPM.Differently, in Marseille the JPM provides a less conservative return level for all return times.A consistent behavior was observed when stationary models fitted to data covering 30 years were compared with the JPM and RJPM (Fig. S5).
The direct methods show varying uncertainty in the prediction of return levels (Fig. 5).In both Venice and Marseille, the GEV model with covariates on the location has the highest uncertainty (12 cm in Venice and 15 cm in Marseille) for the 2-year return period, and the PP without covariates has the lowest uncertainty (7 cm both in Venice and Marseille).
In Venice, the PP with covariates on the location has a lower uncertainty for return levels of 20, 100, and 200 years (15, 20, and 25 cm, respectively).Non-detrended models without covariates and detrended models have similar uncertainty (slightly lower for GEV).In Marseille, the GEV model fitted to non-detrended data has the lowest uncertainty for return levels of 20, 100, and 200 years (13, 23, and 27 cm, respectively).
Extrapolations of non-detrended, non-stationary models for the future showed that estimates of future ESLs are strongly influenced by the future mean sea level (Fig. 6).Events that currently have a return level above 200 years will already have return levels < 30 years (for GEV and GPD) and < 50 years (PP) in 2050 for Venice.For Marseille, events that currently have a return level above 200 years will already have return levels < 30 years (for GEV and GPD) and < 100 years (PP) in 2050.

Including non-stationarity in extreme-event modeling
Our results show that most of the fitted ESL models benefit from the inclusion of covariates on either the location or the scale parameters when using non-detrended data.The highest improvements in the fit occurred for the Venice data that have a higher non-stationarity than Marseille.We used only the yearly averaged mean sea level as covariates to build simple models, but other predictors can be used, depending on the objective of the study.For instance, the North Atlantic Oscillation index, the Arctic Oscillation, and the East Atlantic-Western Russia Oscillation index can be used to include a dependence on climate (Menéndez and Woodworth, 2010).Where climatic predictors are missing, seasonality effects can be included, e.g., with harmonic dependencies on the yearly Julian day (Méndez et al., 2006).Other predictors could include global and regional meteorological parameters, which could influence storm surge intensities and frequencies (Grinsted et al., 2013).A dependence on time can also be included (Mudersbach and Jensen, 2010).However, particu-lar care should be used in the choice of the predictors.Complex models can be useful for explaining historical patterns but might be of little utility for future projections.For instance, bias could arise due to uncertainties in predictors' future trajectories or to future predictor values being out of the ranges used to calibrate the models.In this regard, simpler models can be helpful for future projections when clear links between extremes occurrence and specific predictor classes are established (Schuwirth et al., 2019).
In this work, we used the mean sea level as a covariate because of the strong link with storm surge occurrences (Lionello et al., 2021).Our results show that the mean-sea- Return level curves for direct models with covariates are reported only if the addition of the covariate improves the fit significantly (p < 0.01; see Table 2).level-dependent location of both GEV and PP models improves the ESL fit for both Venice and Marseille.The location parameter is the first-order moment of the extremes distributions.The inclusion of a linear dependence on the mean sea level translates rigidly the distribution function towards higher (positive slope) or lower (negative slope) values without affecting the shape of the distribution.GEV and PP models also marginally improved with a dependence on the scale.The scale parameter relates to the second-order moment of the distribution (the "spreading").A dependence of the scale parameter on the mean sea level could suggest an influence on the variability in the magnitude of storm surges.In shallow areas a higher sea level corresponds to lower dissipation of the tidal energy, yielding higher ESLs (Arns et al., 2017).In the Venice Lagoon, this factor might also be influenced by the morphological transformations that the Venice Lagoon underwent during the 20th century and that might have affected the dynamics of the tide propagation (Caruso and Marani, 2022).Different explanations for this pattern are possible.For instance, the North Atlantic Oscillation index (NAO), not included in this analysis, might act as a latent variable: negative NAO phases in the Mediterranean basin can lead to increases in monthly mean sea levels and in the number of storms (Cid et al., 2016).
Overall, this work shows how including non-stationarity in extreme-event analysis can support an improved understanding of extreme events.Including dependencies from the mean sea levels also allows for flexible forecasts of ESLs under sea level rise scenarios.

Comparison of the models
The significant covariate dependencies could also be influenced by the type of data.The BM data in Venice show clear increasing trends, which were captured by the GEV model.BM could be extracted with different methods, such as monthly blocks, or for r-largest yearly values (where r is an integer number, e.g., 5 or 10).A global analysis (Wahl et al., 2017) showed that the annual maxima comprise the more conservative method (i.e., yield higher return period estimates).However, this aspect should be checked as part of a sensitivity analysis from case to case.POT data do not have a trend in the mean or in the higher quantiles in either Venice or Marseille and thus should yield models that are less affected by non-stationarity.However, a trend in the frequency of occurrences of POT (Ferrarin et al., 2022) was observed in Venice, which might invalidate the homogeneity assumptions of GDP and PP models.The non-homogeneity of the POT distribution can be mitigated by introducing a dependence of the threshold from a covariate (Roth et al., 2012).However, using a non-constant threshold introduces a significant uncertainty that might result in biased estimates of the return levels (Agilan et al., 2021).On the contrary, the PP explicitly models the rate of threshold exceedances: the detected significant dependence of location on the mean sea level implies a process with a non-constant occurrence rate (i.e., a non-homogeneous process, Eq. 6; Cid et al., 2016).The ability of the PP models to predict the changes in the ESL occurrence frequency with sea level rise is of particular relevance in Venice, where a system of movable gates was recently built to disconnect the lagoon from the sea and prevent the flooding of the city during ESLs (Umgiesser, 2020).
While all the parametric methods improved with the inclusion of non-stationarity, the JPM and RJPM are the methods that should be least influenced by non-stationarity, since the methodology requires detrending the data before the calculation of tide and surge histograms.However, as the residual trend on detrended BM for Venice shows, the removal of the mean sea level might not be sufficient to make the series stationary.Thus, estimates of the return level with the JPM might also be biased.Estimations of return levels for long return periods are not possible due to the lack of surge and tide events that are needed to populate the extremal classes https://doi.org/10.5194/nhess-22-3663-2022 Nat. Hazards Earth Syst.Sci., 22, 3663-3677, 2022 of the distribution.In our analysis, the JPM allows for estimating return periods corresponding to levels of +233 cm in Venice (corresponding to the sum of the maximum recorded tide, +57 cm; the maximum recorded surge, +141 cm; and the current mean sea level, +35 cm) and +163 cm in Marseille (tide: +20 cm; surge: +89 cm; current mean sea level: +54 cm), but for the shortest series, this limitation might be stronger.Another limitation of the JPM and RJPM as implemented in this paper is the lack of information on uncertainty.The development of multivariate ESL distributions could solve this issue (Ferrarin et al., 2022), but applications that include non-stationarity are still limited.Some parametric models were improved by the inclusion of covariates on the location, with a stronger influence on models fitted to non-detrended data.Particular care should be taken when detrending data prior to the model fit as this action implicitly assumes that the mean sea level is mainly responsible for data non-stationarity, and higher-order interactions are neglected.In shallow areas this could not be the case (Arns et al., 2017).Thus, inclusion of covariates on the model's parameters could perform better than detrending in such cases.With our data, the use of the mean sea level as a covariate results in a rigid translation towards higher return levels for GEV and PP plots due to the significance of the location dependence.The effect on the GPD is also a change in slope due to the significance of the scale parameter.However, data from different gauging stations might show different behaviors.For instance, sites where the sea level variability increases with mean sea level might also show a significant dependence on the scale parameter in the GEV and PP models.
Our results show that the models that have lower uncertainty and the models that yield the most conservative estimates of return levels are different between Venice and Marseille.Even though we highlighted that the difference between the two datasets is the extent of non-stationarity, other factors can affect the selection of a good model: the relative importance of tide and surge (Dixon and Tawn, 1999), the tidal regime, the location of the tide gauge, the record length, and the presence of outliers (Haigh et al., 2010).A similar analysis on a dataset covering a wider range of sites would allow us to consistently link the best-performing methods to the characteristics of the sea level data.

Conclusions
In this paper, we fitted different extreme-value models to long-term sea level data for Venice and Marseille.We show that including non-stationarity in the analysis of extreme events improves the fit of most of the models.Among direct methods, for return periods longer than 20 years, the point process with a dependence of the location on the mean sea level is the most conservative in Venice.The generalized extreme-value distribution with a dependence of the location on the mean sea level is the most conservative in Marseille.Among indirect methods, the revised joint probability method yields results that are comparable with the most conservative methods for return periods longer than 100 years for both Venice and Marseille.Among direct methods, the generalized extreme-value distribution fitted to detrended data has the lowest uncertainty for return level estimation in Venice.The point process with a location depen-dence has the lowest uncertainty for return level estimation in Marseille for return periods longer than 20 years.Overall, we show that non-stationary extremes analyses can provide more robust estimates of return levels to be used in coastal protection planning.

Figure 1 .
Figure 1.Example of the effects of curves parameters on the return period estimation for a sea level > 1.5 m.GEV curves with different location (µ) and scale (σ ) parameters corresponding to three time periods are represented.The shape (ξ ) parameter is kept constant The return period is calculated based on the highlighted area (see Sect. 2.3.8).Different locations and scales yield different return period estimates.Under non-stationary conditions, the curve's parameters change with time.

Figure 2 .
Figure 2. Venice data used to fit the models.See Fig. S2 for Marseille data.Plots are grouped vertically according to the detrending method (MSL: mean sea level; MSL_L: long-term mean sea level; NDT: non-detrended) and horizontally according to the maxima typology (BM: block maxima; POT: peaks over threshold).The text in the label in the top left corner of each plot shows the significance level of the Mann-Kendall trend test (n.s.: non-significant; .: p < 0.1; * : p < 0.05; * * : p < 0.01; * * * : p < 0.001).The continuous line represents the mean sea level value; the dashed line represents the long-term mean sea level.

Figure 3 .
Figure 3.Comparison between the parameters estimated in the time window analysis (dashed line; the grey envelope represents the uncertainty in the parameters from the time window analysis) and the parameters estimated by different model configurations over the full data record length.Here only results for Venice are reported; see Fig. S3 for Marseille.Parameters from all the configurations of the GEV, GPD, and PP models that do not include covariates are shown.Parameters from models with covariates are shown only if models improve significantly the fit (see Table2for the likelihood ratio test).The shape ξ is included in the figure, but no covariate dependence was tested for this parameter.The horizontal axis represents the final year of the time window.Plots are grouped vertically according to the detrending method (MSL: mean sea level; MSL_L: long-term mean sea level; NDT: non-detrended) and horizontally according to the distribution function (GEV: generalized extreme value; GPD: generalized Pareto distribution; PP: point process).The text in the label in the top left corner of each plot shows the significance level of the Mann-Kendall trend test on the parameters from the time-window analysis (n.s.: non-significant; .: p < 0.1; * : p < 0.05; * * : p < 0.01; * * * : p < 0.001).

Figure 4 .
Figure 4.Return level plot actualized to 2019 for Venice.See Fig. S4 for Marseille.Plots are grouped vertically according to the detrending method (MSL: mean sea level; MSL_L: long-term mean sea level; NDT: non-detrended) and horizontally according to the distribution function (GEV: generalized extreme value; GPD: generalized Pareto distribution; PP: point process).The dashed line is the empirical return level for the joint probability method (JPM).Curves are color-coded based on the model configuration.Note the horizontal axis is logarithmic.Return level curves for direct models with covariates are reported only if the addition of the covariate improves the fit significantly (p < 0.01; see Table2).

Figure 5 .
Figure 5. Difference in return levels between each fitted model and a non-detrended GEV fit for different return periods.Return levels of models with covariates are shown only if the model significantly improves the fit compared to models without covariates (p < 0.01; see Table2).Plots are grouped vertically according to the detrending method (MSL: mean sea level; MSL_L: long-term mean sea level; NDT: non-detrended) and horizontally according to the distribution function (GEV: generalized extreme value; GP: generalized Pareto distribution; PP: point process).

Figure 6 .
Figure6.Return level plots for different values of mean sea level.Mean sea level is expressed with respect to the local reference.The current mean sea level is +35 cm for Venice and +54 cm for Marseille.See Sect.2.4 for a description of the selected future mean sea levels.

Table 1 .
Trend in the data used to fit the models.lm: linear model; qr: quantile regression (75th data quantile).VE denotes Venice; MS denotes Marseille.

Table 2 .
Likelihood ratio test results.The column test type describes which model configurations were compared; nc-l: no covariates compared with covariates on location; l-sl: covariates on location compared with covariates on both location and scale; nc-s: no covariates compared with covariates on scale.VE denotes Venice; MS denotes Marseille.

Table 3 .
Comparisons between the rates fitted by the point process (PP) for Venice and the empirical process rate of models with covariates on the location (model type l) and models with covariates on the location and scale (model type s).