the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Correlation of wind waves and sea level variations on the coast of the seasonally icecovered Gulf of Finland
Milla M. Johansson
JanVictor Björkqvist
Jani Särkkä
Ulpu Leijala
Kimmo K. Kahma
Both sea level variations and windgenerated waves affect coastal flooding risks. The correlation of these two phenomena complicates the estimates of their joint effect on the exceedance levels for the continuous water mass. In the northern Baltic Sea the seasonal occurrence of sea ice further influences the situation. We analysed this correlation with 28 years (1992–2019) of sea level data, and 4 years (2016–2019) of wave buoy measurements from a coastal location outside the City of Helsinki, Gulf of Finland in the Baltic Sea. The wave observations were complemented by 28 years of simulations with a parametric wave model. The sea levels and significant wave heights at this location show the strongest positive correlation (τ=0.5) for southwesterly winds, while for northeasterly winds the correlation is negative (−0.3). The results were qualitatively similar when only the open water period was considered, or when the ice season was included either with zero wave heights or hypothetical noice wave heights. We calculated the observed probability distribution of the sum of the sea level and the highest individual wave crest from the simultaneous time series. Compared to this, a probability distribution of the sum calculated by assuming that the two variables are independent underestimates the exceedance frequencies of high total water levels. We tested nine different copulas for their ability to account for the mutual dependence between the two variables.
Urbanized and heavily populated coastal regions around the world face concrete consequences of sea level rise and climate change. To ensure safe and effective coastal protection and city planning in the future, accurate and locationtargeted estimates of coastal flood probabilities are in demand. The ultimate height to which the sea water rises in a coastal flood is determined by both the sea level variations – socalled still water level – as well as windgenerated waves on top of that. Thus, when estimating coastal flooding risks, these both need to be considered.
On the Baltic Sea coasts, the still water level variations mainly consist of storm surges and longerterm variations, the amplitude of tides being small. The flooding risks related to still water level have been extensively studied using long, highquality tide gauge records, which in many locations date back to the 19th century (e.g. Johansson et al., 2001; Eelsalu et al., 2014; Soomere et al., 2015; Kulikov and Medvedev, 2017). See also Hünicke et al. (2015) for an overview of sea level and wind wave studies in the Baltic Sea. These data have enabled detailed analyses of sea level extremes and probability distributions useful for practical applications in coastal planning (e.g. Kahma et al., 2014).
To account for the additional effect of wind waves, Leijala et al. (2018) complemented the stillwaterlevelbased flooding risk estimates with wave runup on a steep shore. They estimated the probability distributions of both phenomena separately and, assuming the two variables independent, calculated the probability distribution of their sum at two coastal locations close to Helsinki, Gulf of Finland (GoF). Even if the assumption of independence might not strictly hold, high sea levels and high waves do not necessarily cooccur, either. Hanson and Larson (2008) drew the conclusion that on the southern Swedish coast, high runup levels (sum of still water level and wave runup) are typically caused by high waves in combination with modest still water levels.
The actual dependence of waves and sea level variations is complex. While shortterm sea level variations and windgenerated waves are driven by related factors (namely wind and air pressure variations), strong winds can either fill or empty small subbasins while simultaneously generating high waves. Also, the slowly changing total water volume of a semienclosed basin does not depend on the shortterm wind speed.
To account for the correlation between sea level variations and wind waves when estimating the exceedance frequencies of their sum, a bivariate (twodimensional) probability distribution is needed. Copulas are a flexible tool with which to study this geophysical problem, since they enable different types of marginal distributions of the individual variables, unlike, e.g. multivariate Weibull or normal distributions. They have therefore been used to study, e.g. storm surges in the German Bight (Wahl et al., 2012), the joint probability of water levels and waves at the Adriatic coast (Masina et al., 2015), extreme coastal wave overtopping (Chini and Stansby, 2012), sealeveldriven nonlinear changes in wave characteristics and tides (Arns et al., 2017), and compound flooding from storm surge and rainfall (Wahl et al., 2015). Marcos et al. (2019) studied the dependence between extreme storm surges and wind waves and found that 55 % of the world coastlines face compound events, and neglecting these dependencies leads to an underestimation of extreme coastal water levels. Kudryavtseva et al. (2020) used copulas to evaluate the likelihood of the joint occurrence of high water levels and wave heights along the whole Baltic Sea coast, showing that the probability of cooccurrence varies regionally, being highest in the Bothnian Sea, but also moderate in the GoF.
The GoF (Fig. 1) is a subbasin of the Baltic Sea. The Baltic Sea is a semienclosed marginal sea connected to the North Sea and the NorthAtlantic Ocean only through the narrow and shallow Danish Straits. The straits prevent rapid changes in the total water volume of the Baltic Sea, which therefore behaves like a closed basin on a subweekly time scale, with the sea level varying internally. The internal variations amount to several decimetres, and are mainly driven by wind and air pressure, but also by the consequent internal oscillations (seiches). These variations in the GoF are controlled by the local geometry of this narrow westtoeast oriented gulf, and can therefore behave differently depending on, e.g. the wind direction (Leppäranta and Myrberg, 2009). The astronomical tides are small, with amplitudes less than 20 cm (e.g. Medvedev et al., 2013).
On time scales longer than a week the Baltic Sea water volume variations are a result of the water exchange that is lowpass filtered by the Danish Straits. This water level component is significant, with local sea level variations being up to 1.3 m (Leppäranta and Myrberg, 2009), while precipitation, evaporation, and river runoff play a minor role. On even longer time scales, the Baltic Sea is affected by the global mean sea level rise, which is counterbalanced to a varying extent by the local postglacial land uplift (e.g. Pellikka et al., 2018; Vestøl et al., 2019).
The windgenerated waves in the GoF are influenced by the fetch, the seabed, and the narrowness of the gulf (Kahma and Pettersson, 1994). The measured significant wave height has exceeded 5 m both during easterly and westerly winds (Pettersson et al., 2013; Pettersson and Jönsson, 2005), but the significant wave height close to Helsinki is expected to be less than half of the opensea value, since waves are effectively attenuated by the coastal archipelago (Björkqvist et al., 2019).
Finally, both the sea level variations and the waves are affected by the ice season, which in the Baltic Sea is 5–7 months (Leppäranta and Myrberg, 2009). While the presence of ice somewhat attenuates sea level variability – mainly by blocking the pilingup effect of wind on the sea surface (Lisitzin, 1957) – it completely blocks waves if the ice concentration is high enough. The maximum ice coverage ranges from 12.5 % to 100 % of the surface area of the Baltic Sea (Leppäranta and Myrberg, 2009); in normal and severe ice winters, the entire GoF becomes ice covered, while in mild winters only the eastern part of the gulf freezes.
In this paper, we explore the correlation of sea level and wind waves near Helsinki on the northern coast of the GoF, with the aim of determining its effect on the exceedance frequencies of their sum. The study is motivated by the high relevance of flood risk estimates for the City of Helsinki, as well as the recently obtained 4year time series of coastal wave buoy data. These data, together with the multidecadal time series of tide gauge observations, provide an excellent data set for such case study. We aim at answering the following research questions:

To what extent are these two variables correlated, and does it depend on season, ice conditions, or wind direction?

What is the observationbased probability distribution of the sum of the still water level and wave height? How much does it differ from a distribution formed assuming that the two variables are independent?

Can a copulabased bivariate distribution account for the correlation of the two variables, and more accurately describe the probability of the sum?
This paper is structured as follows. In Sect. 2, we describe the observation data sets used, the parametric wave model, and the methods used in calculating the uni and bivariate probability distributions. The results are presented in Sect. 3, followed by a discussion in Sect. 4. We summarize our conclusions in Sect. 5.
2.1 Sea level measurements
Automated tide gauges have been operational on the Finnish coast since the late 19th century, at Helsinki since 1904 (see e.g. Johansson et al., 2001 for a more detailed description of data, measurement techniques, and quality). We used instant hourly sea levels measured during 1992–2019 at the Helsinki Kaivopuisto tide gauge (Fig. 1), currently operated by the Finnish Meteorological Institute (FMI). Missing values were estimated by linear interpolation from the measurements at two other nearby tide gauges on the Finnish coast (Hanko and Hamina). There were 477 interpolated hourly values during 1992–2019, less than 0.2 % of the data. All sea levels in this study are given in relation to the N2000 height system, a Finnish realization of the common European height system (Saaranen et al., 2009).
The average sea level during 1992–2019 at Helsinki was +0.21 m, with variations ranging from −0.73 to +1.69 m (Table 1). The linear trend in the time series was −2 cm in 28 years – the land uplift and sea level rise practically cancelling each other. These longterm phenomena thus have a negligible contribution to the sea level variations studied here.
2.2 Wind and temperature measurements
FMI's automatic weather stations provided wind speed, wind direction, and air temperature measurements (Fig. 1), which were used as a forcing for the parametric wave model (Sect. 2.4). From 1992 onwards these stations covered three areas of the Baltic Sea: (i) Northern Baltic Proper (NBP) covered by Utö weather station (measuring height 19 m a.m.s.l. (metres above mean sea level)), (ii) GoF covered by Kalbådagrund and the Helsinki Lighthouse (31 and 33 m a.m.s.l.), and (iii) the coastal area near Suomenlinna, Helsinki covered by Harmaja (18 m a.m.s.l.). The Helsinki Lighthouse weather station was only used for 2019 to cover a Kalbådagrund data gap.
We calculated hourly wind and air temperature values by either interpolating the older 3 h data (up until mid to late 1990s) or by calculating hourly averages of later denser measurements. All gaps shorter than 6 h were interpolated. Continuous data blocks that were shorter than 12 h were removed. Sea surface temperature (SST) data were only available from Harmaja starting at the end of June 1995. For other times and locations we estimated the SST using a mean yearly cycle from the GoF monitoring station LL7 (Fig. 1). The mean is defined from 434 CTD soundings from R/V Aranda between 1992 and 2017.
2.3 Observed wave heights
FMI's operational GoF wave buoy is located in the centre of the basin (62 m water depth), while the coastal wave buoy is anchored at a depth of 22 m outside of Suomenlinna, Helsinki (Fig. 1). While coastal wave measurements have been made in several places along the Finnish coast, this is the only permanent coastal wave buoy. Yearly measurements at these locations started in 2000 (GoF) and 2016 (Suomenlinna). In the Baltic Sea wave buoys are removed before the ice season, since freezing damages the sensors of the Datawell Directional Waveriders. The GoF buoy's measurement periods were 1–18 January 2016, 12 March 2016–13 January 2017, 18 April 2017–29 January 2018, 14 April 2018–14 January 2019, and 11 March–31 December 2019. For the Suomenlinna wave buoy they were 27 April–29 November 2016, 20 April–15 December 2017, 3 May–26 November 2018, and 18 April–10 December 2019.
The wave buoys determined the significant wave height from 27 min vertical displacement time series as ${H}_{\mathrm{s}}=\mathrm{4}\sqrt{{m}_{\mathrm{0}}}$, where m_{0} is the zeroth moment of the variance density spectrum (see Datawell, 2019). We picked every other value to get a data set with hourly time resolution. Of these values, the highest significant wave heights during 2016–2019 were 4.12 m (GoF) and 1.47 m (Suomenlinna); the mean values were 0.81 and 0.33 m.
2.4 Simulated wave heights
We complemented the coastal wave measurements at Suomenlinna with 28 years (1992–2019) of simulations produced by a parametric wave model. The model uses fetchlimited wave growth relations (Kahma and Calkoen, 1992), while also accounting for the wind duration and the atmospheric stratification. A description of the wave evolution scheme is presented in Appendix A.
First we simulated the wave conditions at the GoF wave buoy, which were needed as boundary conditions for the coastal wave hindcast. The GoF simulation accounted for the locally generated waves in the middle of the gulf and waves propagating from the eastern part of the basin and the NBP – also simulated using the same model and the local wind. We then determined the fraction of the GoF wave energy that arrived to Suomenlinna; these waves have longer periods than what can be generated by the local wind and the nearest fetch, and will henceforth be called “long waves”. This long wave energy was quantified as the difference between the observed wave energy (for 2016) and the modelled local wave energy at Suomenlinna. The attenuation – as a function of the wind direction – was determined using a linear regression between the simulated wave energy at the GoF and the estimated long wave energy at Suomenlinna. The final time series of coastal significant wave height was determined from the sum of the simulated local wave energy at Suomenlinna and the attenuated GoF wave energy.
The simulated significant wave heights were validated using wave buoy data from 2016 to 2019. At GoF the −0.02 m bias and 0.28 m RMSE (N=25 293) were slightly worse than for stateoftheart numerical wave models (e.g. Tuomi et al., 2011; Björkqvist et al., 2018). Nonetheless, the accuracy was sufficient to serve as boundary conditions for the coastal simulations at Suomenlinna, which had a 0.00 m bias and 0.10 m RMSE (N=13 574) (Fig. 2). For Suomenlinna the validation statistics compare well with those from highresolution numerical wave model simulations (Björkqvist et al., 2020), but the parametric model showed a slight tendency to overestimate the highest significant wave heights (Fig. 2b). This is natural, since a linear regression cannot account for the increased wave–bottom interaction for the higher and longer waves during stronger winds that were not captured in the calibration period. We can, however, conclude that the quality of the simulated wave height time series is adequate for the purpose of this study.
2.5 Ice conditions and wave statistics types
Data originating from FMI's ice charts described the ice conditions at the southern point of the Suomenlinna islands (Fig. 1). The data represent an area with a radius of roughly 1 km and are therefore fairly representative of the wave buoy location. Of the statistics available for the entire study period (1992–2019), we used the start and end dates of the permanent icecover. During this time – defined as our ice period – the ice concentration is typically high (over 80 %), making it reasonable to set the significant wave height to zero. The length of our ice period varied from zero (winters 2007–2008, 2017–2018, and 2018–2019) to 139 d (winter 2002–2003).
Tuomi et al. (2011) defined four different wave statistics for water bodies with a seasonal ice cover. Using this classification, we compiled the following data sets (Table 1):

Data set F (icefree time): simulated H_{s} for the open water season only.

Data set I (icetimeincluded): simulated H_{s} set to 0 for the ice period.

Data set N (hypothetical noice): full simulated H_{s} data set. As the wave model does not account for ice, this represents a hypothetical situation with no ice present.

Data set M (measurement statistics): wave observations only (this time is a true subset of the icefree period because of measurement gaps).
We divided the 28year data sets (F, I, and N) into seven consecutive 4year periods, from 1992–1995 to 2016–2019. These 4year time series were, where applicable, used to study temporal variability.
2.6 Probability distribution of the total water level
We defined the total water level, z_{max}, as the hourly maximum level to which the continuous water mass reaches as a combined effect of the still water level, z_{still}, and the wave action. To do this, we first assumed that z_{still} does not change during 1 h. Neglecting sitespecific coastal effects, z_{max} in such case is determined by the highest individual wave crest, η_{max}. (Note that wave crest denotes the height above z_{still}, thus being half of the actual height of the highest wave.) At Suomenlinna the highest single wave crest during 30 min is approximately 92 % of the significant wave height (Björkqvist et al., 2019), which we rounded up to η_{max}=H_{s} for hourly values. With these assumptions, the maximum water level elevation is
The probability distributions of z_{max} were then calculated directly from the time series obtained from the observed and simulated z_{still} and H_{s} data. The distributions of z_{still} and H_{s} both describe the actual duration of the phenomenon, and thus have units of h yr^{−1}. The distribution of z_{max}, on the other hand, has a unit of events per year, where “event” is defined as the instant hourly maximum total water level.
If simultaneous time series of z_{still} and H_{s} are not available, the cumulative distribution function (CDF) of z_{max} needs to be calculated based on the respective univariate distributions. If the two variables are assumed to be independent (for dependent variables, see Sect. 2.7), the CDF of their sum is the convolution
where f_{x} denotes the probability density function (PDF) of z_{still}, and F_{y} denotes the CDF of H_{s} (see Leijala et al., 2018 for more details).
2.7 Copulabased probability distributions
Sklar's theorem (Nelsen, 2006) states that if H is a bivariate CDF with marginal CDFs F_{1} and F_{2}, then a copula C exists such that
Thus, the joint distribution H can be expressed as a function of the marginal distributions F_{1} and F_{2} and the copula C, which describes the dependency of the variables x_{1} and x_{2}. If x_{1} and x_{2} are independent, the copula is simply the product F_{1}(x_{1})F_{2}(x_{2}). An extensive introduction to copula methods is provided by Genest and Favre (2007). A benefit of the method is that the marginal distributions can be of any form.
We studied the suitability of nine different copulas (Gumbel, Frank, Clayton, Normal, tEV, Hüsler–Reiss, Galambos, Tawn, and Plackett) in representing the dependence structure of the observed z_{still} and H_{s} data (2016–2019). We calculated the parameters for each copula with the inversion of Kendall's τ estimator. To quantify the goodnessoffit for each of these copulas, we calculated the Cramér–von Mises statistic S_{n} as defined in Eq. (2) of Genest et al. (2009).
By using the bivariate distributions consisting of each of the nine copulas and the marginal distributions (Sect. 2.8), we simulated 10^{8} random pairs (z_{still}, H_{s}) and calculated the complementary cumulative distribution functions (CCDFs) of their sum. The calculations were performed with the R package “copula: Multivariate Dependence with Copulas” (Hofert et al., 2020; Yan, 2007; Kojadinovic and Yan, 2010; Hofert and Mächler, 2011).
2.8 Marginal distributions and extrapolation
Probabilities of sea level extremes have been estimated using extreme value distributions with, e.g. block maxima (annual/monthly) or peakoverthreshold values (e.g. Arns et al., 2013). However, when the focus is on the sum of sea level and wave height (Eq. 1), it is essential to use simultaneous values, and the extremes generally do not cooccur. Our aim was to study the correlation between these two variables in general, on a hourly time scale, not just the correlation of extremes. Thus, we chose to base our analysis on the entire data set of hourly values.
Our 28 years of hourly data allow for a direct estimate of the exceedance frequency down to 0.036 h yr^{−1}, and we therefore had to extrapolate the distribution to cover rarer values. Extremes are typically evaluated using some extreme value distribution (Eelsalu et al., 2014; Särkkä et al., 2017; Kulikov and Medvedev, 2017), but this approach is not applicable to extrapolate hourly data needed for this study.
The probability distribution of the observed sea levels on the Baltic Sea coasts is slightly asymmetric: the highend tail is thicker than the lowend tail (e.g. Johansson et al., 2001; Kulikov and Medvedev, 2017). The significant wave heights in the archipelago, again, show a clear tendency to saturate (Leijala et al., 2018), and are therefore not modelled well simply by applying some commonly used distribution type, such as the Weibull distribution (e.g. Battjes, 1972; Mathisen and BitnerGregersen, 1990). We used the empirical frequency distributions for exceedance frequencies between the 0.5 and 99.5 percentiles, which correspond to a frequency of 44 h yr^{−1} in the tail of the CDF/CCDF. We extrapolated the high (low) tails of the distribution by fitting twoparameter exponential functions $F={e}^{\mathit{\lambda}\left(x{x}_{\mathrm{0}}\right)}$ to the CCDFs (CDFs) below this frequency, following, e.g. Leijala et al. (2018). This method was applied consistently to the sea level, significant wave height (only high tail), and total water level data. The PDFs were obtained by numerically differentiating the CDFs. As our main focus in this study is not on the extrapolated extreme ends of the distribution, we left more detailed considerations of the extrapolation function out and considered the simple exponential function sufficient for our purposes.
3.1 Correlation
The simultaneous hourly sea levels and significant wave heights (Fig. 3) indicate some positive tail dependence – high sea levels cooccurring with high waves – both in the observed and simulated data. However, high waves also occur with moderate or even low sea levels, and high sea levels are not necessarily accompanied by high waves either. The individual maxima did not occur simultaneously. During the highest measured sea level (1.69 m) on 9 January 2005, the simulated wave height was a relatively modest 0.87 m. The highest observed wave height was 1.47 m on 7 December 2017, while the simulations gave 2.25 m on 23 December 2004. On those occasions, the sea levels were +0.73 and +0.94 m, respectively: high but not extreme.
The variations in both parameters are largest in winter (DJF; Fig. 4) and smallest in summer (JJA), when the wave height did not exceed 1.5 m and the sea level was between −0.3 and +1.0 m. The positive tail dependence can be seen in most of the seasonal data also.
The hypothetical noice data (type N) show situations with low sea level and wave heights ranging from 0 to 1 m, which would have occurred during the icecovered period, especially in DJF (Figs. 3 and 4). In spring (MAM), also many highsealevel and hypothetically highwave situations occurred during the icecovered time.
As a measure of the correlation, we used Kendall's tau coefficient (τ). It measures rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. Thus, no assumption of a linear relationship between the variables is made. The sea level correlates stronger with the observed wave heights than the simulated wave heights from the same period (the measurement seasons during 2016–2019) on all seasons except DJF (Table 2). The observed wave data in DJF are very sparse (Fig. 4a), practically only consisting of 15 d of observations in December 2017 and 10 d in December 2019, and thus the high correlation (τ=0.3) should not be considered representative. Likewise, the observations for MAM only represent late April and May, as the earliest the wave buoy was deployed was on 18 April and the latest on 3 May. Of the different seasons, autumn (SON) shows the lowest correlation in all data sets.
The different ways of handling the ice time in the 28year data sets (F, I, N) do not generally result in large differences in the correlation, the only exception being DJF where the data set I shows clearly higher correlation than data sets F or N. There is a tendency for lower sea levels during the icecovered period (the blue points with H_{s}=0 in Fig. 4a), which likely enhances the correlation.
The correlation clearly depends on wind direction measured at Harmaja (Fig. 5, nautical convention for wind direction). The strongest positive correlation occurs with southwesterly winds (240–260^{∘}), while northeasterly winds (20–50^{∘}) result in zero or negative correlation. The behaviour is qualitatively similar in both the observed and simulated wave heights, and with all the data sets considered (F, I and N; results for data sets I and N shown in Appendix B).
3.2 Probability distributions
The CCDF of the sea level from the period 2016–2019 shows lower frequencies for high sea levels (>0.7 m) than the CCDF from the entire period 1992–2019 (Fig. 6a). The 2016–2019 distribution is also on the lower limit of the ensemble of the seven different 4year distributions. The period 2016–2019 had less high sea levels than most of the 4year periods, having also the lowest 4year maximum (1.11 m) of all these periods. The distribution from 2004 to 2007 has a distinct tail with sea levels exceeding 1.45 m, while the maxima of all the other periods are below this. All these high values resulted from the exceptional storm Gudrun on 9 January 2005, when the sea level at Helsinki stayed above 1.4 m for 13 h from 01:00 to 13:00 UTC.
The shape of the CCDF obtained from the observed wave heights differs from those obtained from the simulated wave heights (Fig. 6b), the exceedance frequency decreasing more rapidly with increasing wave height in the observed CCDF than in the simulated ones. This is in accordance with the parametric wave model validation (Fig. 2b), which showed an overestimation of the highest significant wave heights.
The CCDFs of z_{max} calculated by summing the observed sea levels and observed or simulated wave heights according to Eq. (1), and extrapolating with exponential functions, show similar behaviour (Fig. 6c): The distribution based on observed wave heights shows lower frequencies for high z_{max} (>2.0 m) than any of those based on simulated wave heights. The distributions of the sum of z_{still} and H_{s} obtained by assuming the two variables independent, according to Eq. (2) (Fig. 6d) give lower probabilities for high z_{max} than the distributions of Fig. 6c. Below the exceedance frequency of 100 events per year down to 0.1 events per year, the assumption of independence leads to an underestimation of 0.2–0.3 m in z_{max} of the observed distribution from 2016 to 2019. In the simulated 4year distributions, the underestimation at 100 events per year ranges from 0.15 to 0.28 m, and at 1 event per year from 0.09 to 0.70 m. In the distribution based on the entire 28year simulation, the underestimation increases with decreasing exceedance frequency, from 0.06 m at 1000 events per year to 0.55 m at 0.1 events per year.
The CCDFs based on data sets F, I, and N show only minor differences (Fig. 6 for F, others in Appendix B). This indicates that the hypothetical wave and sea level behaviour during the icecovered time does not differ from that during the open water period. The number of ice days, however, is small compared to the length of the study period: 28 d (1.9 %) during 2016–2019 and 1615 d (16 %) during 1992–2019. Thus, it is expected that they do not significantly alter the results.
3.3 Copulabased distributions
Table 3 shows the parameters of the different copulas fitted to the sea levels and significant wave heights. In Fig. 7, CCDFs obtained from these copulas and the marginal distributions are compared with the observed CCDF and the CCDF obtained by assuming the two variables independent. In accordance with the above results, the observed CCDF shows higher probabilities of high total water levels (>1.1 m) than the independent case (Eq. 2).
The copulas are divided into two distinct groups. The Clayton, Frank, and Plackett copulas remain close to the independent case. On the other hand, Tawn, Gumbel, tEV, Galambos, and Hüsler–Reiss copulas capture more of the effect of dependence on the distribution, closely following the observed distribution down to exceedance frequency of 50 events per year. The Normal copula falls between these two groups, deviating from the observed distribution below 400 events per year. This behaviour is confirmed by the Cramér–von Mises statistics (Table 3), where smaller S_{n} values correspond to better fit. For the latter group S_{n}<1, while in the first group S_{n} varies from 1.22 to 3.97. The Normal copula, again, falls between these groups with S_{n}=1.17.
We tested the effect of wind direction on the copula results by conducting the copula analysis separately for two subsets of the data: situations with southwesterly wind (160–340^{∘}, Fig. 7b) and northeasterly wind (340–160^{∘}, not shown). The results for the southwesterly wind subset are qualitatively similar to those including all data: Copulas are divided into two groups. The stronger correlation (τ=0.269) results in somewhat higher probabilities of high total water levels in the distributions. In the northeasterly wind case, on the other hand, the nonexistent correlation ($\mathit{\tau}=\mathrm{0.012}$) leads to the copula analysis being somewhat pointless, as all the copulas end up close to the independent case.
To further illustrate the differences in bivariate distributions based on different copulas, we simulated an ensemble of 21 546 samples from each, these being of the same size as the observed data set (Fig. 8). The copulas with S_{n}<1 (Tawn, Gumbel, tEV, Galambos, and Hüsler–Reiss) tend to show a positive tail dependence: the slight bending of the scatter cloud into a tail on the upper right corner. The copulas with S_{n}>1 do not show such an apparent positive tail.
Since the wave conditions in the archipelago markedly differ from those on the open sea, a multiyear time series of wave data from a coastal location such as Suomenlinna provides excellent material for studies. As more data accumulate, it will provide more comprehensive knowledge on the local wave behaviour. We compensated for the shortness of the wave buoy time series by using 28 years of simulated wave data. The lightweight parametric wave model is considerably faster to run than highresolution numerical wave models. Nevertheless, the bias and RMSE of the simulations were comparable.
The validation suggested that the highest wave heights are overestimated in the simulations. The attenuation of longer waves through bottom processes is indirectly accounted for in the model through the calibration against measurements. Nonetheless, this calibration might not hold for even harsher weather conditions and even longer waves. While the simulations offer a good tool for assessing the general dependence between the wave height and sea level variations, they should not be used as is for analysing extreme values. If the wave simulations of the parametric model are to be used for extreme conditions, the wavebottom interactions need to be accounted for in a more explicit manner, and such improved model needs to be revalidated.
One of the advantages of a wave model is its ability to simulate the wave conditions during icecovered periods, or periods when there is a risk of freezing and the wave buoy cannot be kept in the sea. This extends the time series and provides an opportunity to analyse the question, “What would happen if there was no ice?” This question is relevant when we consider the conditions in the GoF in a warmer future climate, for instance. Our results did not indicate that the wave or sea level conditions, or the correlation of these, would markedly differ from those occurring during the icefree period in the present climate.
The wave conditions vary significantly among 4year periods (Fig. 6). The observational period of this study (2016–2019) was relatively calm: the 4year maximum sea level (1.11 m) was lowest among the seven consecutive 4year sets, while the maximum simulated significant wave height (1.89 m) ranked fourth. To obtain reliable estimates for probabilities of extreme sea levels on the Finnish coast, at least 30 years of data are generally used (e.g. Pellikka et al., 2018; Kahma et al., 2014). Our results obtained from 4year time series do not provide any flooding risk estimates for practical applications.
Additionally, considering flooding on the coast, the relevant parameter is the wave runup. While our total water level (Eq. 1) represents the elevation of the highest wave crest on the open sea, it is not applicable on the waves meeting the coastline. On a steep coast, a more appropriate estimate for the total water level would be ${z}_{\mathrm{still}}+\mathrm{2}\cdot {H}_{\mathrm{s}}$ due to wave reflection (Leijala et al., 2018), while on a shallow coast the eventual runup on the coastline is less than z_{still}+H_{s}.
The behaviour of the correlation with respect to wind direction (Fig. 5) is readily explained by the geography of the GoF. Southwesterly winds push water into the gulf and simultaneously raise high waves, leading to a positive correlation. Northeasterly winds, on the other hand, push water out from the GoF while still raising waves, which results in a negative correlation. When the correlation is calculated from the entire data set, these two opposing situations with different mechanisms likely cancel each other and reduce the overall correlation. To more completely describe the effect of the correlation on the distribution of the total water level, and thus the flooding risks, one option might be to use three variables instead of two: still water level, significant wave height, and wind direction. We leave this as an option for future studies.
The sea level processes in the GoF act on different time scales, from the subdaily storm surges to subweekly Baltic Sea internal variations, and weekly and longerterm changes in the total water volume. From 1 d to 1 week, the Baltic Sea mainly responds to wind and air pressure forcing like a closed basin. The response time of such variations, e.g. the transport of water between the GoF and the Baltic Proper in a seiche oscillation, is usually of the order of 1 d. In such time scale, the windgenerated waves have time to develop, and these phenomena cooccur.
The sea level variations in time scales longer than 1 week usually involve changes in the Baltic Sea water volume. These are related to longerterm (up to 60 d; Särkkä et al., 2017) prevailing wind conditions, which control the in or outflow of water in the Danish Straits. These should not have a direct relationship with the local waves. However, the longerterm weather conditions correlate with shortterm weather phenomena too: e.g. prevailing westerly winds are related to more lowpressure systems with strong winds travelling over the Baltic Sea, and might thus contribute to the overall correlation.
Our results demonstrated that the assumption of independence of sea level and significant wave height leads to an underestimation of total water levels with exceedance frequencies of 100 events per year or less by about 0.2–0.3 m in the 4year observationbased data set, and up to 0.55 m in the 28year simulationbased data set (the difference of the distributions in Fig. 6c and d). The underestimation generally increases with lower exceedance frequencies. Since low exceedance frequencies such as 0.001–0.01 events per year are highly relevant for flooding risk considerations, further investigation of the effect of the correlation on such exceedance frequencies would be worthwhile. Quantification of this effect for frequencies that go below our observation/simulationbased data sets, however, would require a more thorough analysis of the suitability of the extrapolation function, which we leave for future studies.
Our main conclusions, related to the research questions presented in Sect. 1, are as follows.
The sea level variations and significant wave heights show a positive correlation in general (τ=0.20). The correlation depends on wind direction: Southwesterly winds lead to the strongest positive correlation (up to τ=0.5), while northeasterly winds lead to zero or negative correlation. Ice conditions did not have an apparent effect on the correlation: Including hypothetical noice wave heights during the ice season did not markedly alter the correlation, and neither did the setting of the wave height to zero during the icecovered time.
The 4year probability distribution of the total water level, calculated as a sum of the observed sea level and observed significant wave heights, gives 0.2–0.3 m higher values for high total water levels (0.1–100 events per year) than the distribution calculated by assuming the two variables independent. In the simulated 28year distribution, the underestimation amounts to 0.2–0.5 m. As the total water level values rarer than 0.1 events per year are also likely underestimated, the dependence between the variables should be accounted for when calculating the probabilities of high total water levels, e.g. for flooding risk estimates.
The probability distribution of the sum based on the copula approach is closer to the observed distribution than the distribution based on the independence assumption. Of the nine copulas studied, the Tawn, Gumbel, tEV, Galambos, and Hüsler–Reiss copulas proved most suitable for our data.
The core idea of the wave model is based on known universal properties of different dimensionless parameters, namely the energy, peak frequency, phase speed, fetch, and time:
The wave growth in the model is quantified using the fetch limited wave growth relations of Kahma and Calkoen (1992):
The coefficients p_{1}, p_{2}, q_{1}, q_{2} for the wave growth relations depend on the atmospheric stratification as presented by the authors.
Instead of assuming that the waves are fetch limited at each time step, the model simulates a more realistic wave growth and decay by also accounting for the limitation of wind duration through the following propagation scheme:

Calculate the dimensionless fetch, ${\stackrel{\mathrm{\u0303}}{X}}_{{\mathit{\epsilon}}^{i\mathrm{1}}}$, based on the energy from the previous time step, ε^{i−1}, and the current wind speed, U^{i}.

Calculate a (dimensionless) duration based on the wind, U^{i}, and Eq. (A7) as:
$$\begin{array}{}\text{(A8)}& \begin{array}{rl}{\stackrel{\mathrm{\u0303}}{t}}_{\mathrm{0}}& =\underset{\mathrm{0}}{\overset{{\stackrel{\mathrm{\u0303}}{X}}_{{\mathit{\epsilon}}^{i\mathrm{1}}}}{\int}}{\left({\displaystyle \frac{\mathrm{1}}{\mathrm{2}}}\stackrel{\mathrm{\u0303}}{{c}_{\mathrm{p}}}\right)}^{\mathrm{1}}\mathrm{d}\stackrel{\mathrm{\u0303}}{X}=\underset{\mathrm{0}}{\overset{{\stackrel{\mathrm{\u0303}}{X}}_{{\mathit{\epsilon}}^{i\mathrm{1}}}}{\int}}\mathrm{4}\mathit{\pi}\stackrel{\mathrm{\u0303}}{{f}_{\mathrm{p}}}\mathrm{d}\stackrel{\mathrm{\u0303}}{X},\\ & =\underset{\mathrm{0}}{\overset{{\stackrel{\mathrm{\u0303}}{X}}_{{\mathit{\epsilon}}^{i\mathrm{1}}}}{\int}}\mathrm{4}\mathit{\pi}{q}_{\mathrm{2}}{\stackrel{\mathrm{\u0303}}{X}}^{{q}_{\mathrm{1}}}\mathrm{d}\stackrel{\mathrm{\u0303}}{X}={\displaystyle \frac{\mathrm{4}\mathit{\pi}{q}_{\mathrm{2}}}{{q}_{\mathrm{1}}+\mathrm{1}}}{\stackrel{\mathrm{\u0303}}{X}}_{{\mathit{\epsilon}}^{i\mathrm{1}}}^{\left({q}_{\mathrm{1}}+\mathrm{1}\right)}.\end{array}\end{array}$$ 
Determine an enhanced dimensionless time step as $\mathrm{\Delta}\stackrel{\mathrm{\u0303}}{t}+{\stackrel{\mathrm{\u0303}}{t}}_{\mathrm{0}}$, where Δt is the fixed time step of the model (e.g. 1 h).

As an inverse to step 2, calculate a new fetch ${\stackrel{\mathrm{\u0303}}{X}}_{t}$ based on the enhanced duration $\mathrm{\Delta}\stackrel{\mathrm{\u0303}}{t}+{\stackrel{\mathrm{\u0303}}{t}}_{\mathrm{0}}$, and Eq. (A7) as:
$$\begin{array}{}\text{(A9)}& {\stackrel{\mathrm{\u0303}}{X}}_{t}={\left({\displaystyle \frac{\left(\stackrel{\mathrm{\u0303}}{t}+{\stackrel{\mathrm{\u0303}}{t}}_{\mathrm{0}}\right)\left({q}_{\mathrm{1}}+\mathrm{1}\right)}{\mathrm{4}\mathit{\pi}{q}_{\mathrm{2}}}}\right)}^{\mathrm{1}/\left({q}_{\mathrm{1}}+\mathrm{1}\right)}.\end{array}$$ 
Calculate the minimum dimensionless fetch for a fully developed sea (using the wind speed at 10 m height) as:
$$\begin{array}{}\text{(A10)}& \begin{array}{rl}{\displaystyle \frac{U}{{c}_{\mathrm{p}}}}& ={\displaystyle \frac{\mathrm{2}\mathit{\pi}{f}_{\mathrm{p}}U}{g}}=\mathrm{2}\mathit{\pi}{\stackrel{\mathrm{\u0303}}{f}}_{\mathrm{p}}=\mathrm{0.82}\u27fa{\stackrel{\mathrm{\u0303}}{X}}_{\mathrm{FD}}\\ & ={\left({\displaystyle \frac{\mathrm{0.82}}{\mathrm{2}\mathit{\pi}{q}_{\mathrm{2}}}}\right)}^{\mathrm{1}/{q}_{\mathrm{1}}}.\end{array}\end{array}$$ 
Set the relevant fetch to ${\stackrel{\mathrm{\u0303}}{X}}^{i}=\mathrm{min}\mathit{\{}{\stackrel{\mathrm{\u0303}}{X}}_{t},{\stackrel{\mathrm{\u0303}}{X}}_{\mathrm{FD}},\stackrel{\mathrm{\u0303}}{X}\mathit{\}}$, where $\stackrel{\mathrm{\u0303}}{X}$ is the physical dimensionless fetch.

Calculate the new energy, ε^{i}, and peak frequency, ${f}_{\mathrm{p}}^{i}$, using ${\stackrel{\mathrm{\u0303}}{X}}^{i}$, U^{i}, and Eqs. (A6) and (A7).
The wave, wind and sealevel observations are available through FMI's open data portal (https://en.ilmatieteenlaitos.fi/opendata; Finnish Meteorological Institute, 2020). The code for the parametric wave model is available at https://doi.org/10.5281/zenodo.6025415 (Björkqvist et al., 2022).
MMJ calculated the correlations, probability distributions, and copulas and wrote most of the paper. JVB programmed the coastal implementation of the parametric wave model, did the model runs, and contributed to writing. JS started the work and wrote the first draft. UL contributed to the calculation of the probability distributions. KKK originally developed the parametric wave model and proposed the idea of this study.
The contact author has declared that neither they nor their coauthors have any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The work done by Pekka Alenius in providing the water temperature data for the LL7 monitoring site is greatly acknowledged.
This work was partly funded by the State Nuclear Waste Management Fund in Finland through the EXWE project (Extreme weather and nuclear power plants) of the SAFIR2018 programme (The Finnish Nuclear Power Plant Safety Research Programme 2015–2018; http://safir2018.vtt.fi, last access: 8 March 2022).
This paper was edited by Philip Ward and reviewed by two anonymous referees.
Arns, A., Wahl, T., Haigh, I., Jensen, J., and Pattiaratchi, C.: Estimating extreme water level probabilities: A comparison of the direct methods and recommendations for best practise, Coast. Eng., 81, 51–66, 2013. a
Arns, A., Dangendorf, S., Jensen, J., Talke, S., Bender, J., and Pattiaratchi, C.: Sealevel rise induced amplification of coastal protection design heights, Scient. Rep., 7, 40171, https://doi.org/10.1038/srep40171, 2017. a
Battjes, J. A.: Longterm wave height distributions at seven stations around the British Isles, Deutsch. Hydrogr. Z., 25, 179–189, https://doi.org/10.1007/BF02312702, 1972. a
Björkqvist, J.V., Lukas, I., Alari, V., van Vledder, G. P., Hulst, S., Pettersson, H., Behrens, A., and Männik, A.: Comparing a 41year model hindcast with decades of wave measurements from the Baltic Sea, Ocean Eng., 152, 57–71, https://doi.org/10.1016/J.OCEANENG.2018.01.048, 2018. a
Björkqvist, J.V., Pettersson, H., and Kahma, K. K.: The wave spectrum in archipelagos, Ocean Sci., 15, 1469–1487, https://doi.org/10.5194/os1514692019, 2019. a, b
Björkqvist, J.V., VähäPiikkiö, O., Alari, V., Kuznetsova, A., and Tuomi, L.: WAM, SWAN and WAVEWATCH III in the Finnish archipelago – the effect of spectral performance on bulk wave parameters, J. Operat. Oceanogr., 13, 55–70, https://doi.org/10.1080/1755876X.2019.1633236, 2020. a
Björkqvist, J.V., Kahma, K., and Pettersson, H.: bjorkqvi/KKWave: Release v1.0.0, Zenodo [code], https://doi.org/10.5281/zenodo.6025415, 2022. a
Chini, N. and Stansby, P.: Extreme values of coastal wave overtopping accounting for climate change and sea level rise, Coast. Eng., 65, 27–37, https://doi.org/10.1016/j.coastaleng.2012.02.009, 2012. a
Datawell, B.: Datawell Waverider Reference Manual, Tech. rep., http://www.datawell.nl/Support/Documentation/Manuals.aspx, last access: 28 February 2019. a
Eelsalu, M., Soomere, T., Pindsoo, K., and Lagemaa, P.: Ensemble approach for projections of return periods of extreme water levels in Estonian waters, Cont. Shelf Res., 91, 201–210, https://doi.org/10.1016/j.csr.2014.09.012, 2014. a, b
Finnish Meteorological Institute: The Finnish Meteorological Institute's open data, https://en.ilmatieteenlaitos.fi/opendata, last access: 29 October 2020. a
Genest, C. and Favre, A.C.: Everything You Always Wanted to Know about Copula Modeling but Were Afraid to Ask, J. Hydrol. Eng., 12, 347–368, https://doi.org/10.1061/(ASCE)10840699(2007)12:4(347), 2007. a
Genest, C., Rémillard, B., and Beaudoin, D.: Goodnessoffit tests for copulas: A review and a power study, Insurance: Math. Econ., 44, 199–213, https://doi.org/10.1016/j.insmatheco.2007.10.005, 2009. a
Hanson, H. and Larson, M.: Implications of extreme waves and water levels in the southern Baltic Sea, J. Hydraul. Res., 46, 292–302, https://doi.org/10.1080/00221686.2008.9521962, 2008. a
Hofert, M. and Mächler, M.: Nested Archimedean Copulas Meet R: The nacopula Package, J. Stat. Softw., 39, 1–20, https://doi.org/10.18637/jss.v039.i09, 2011. a
Hofert, M., Kojadinovic, I., Maechler, M., and Yan, J.: copula: Multivariate Dependence with Copulas, r package version 1.01, https://CRAN.Rproject.org/package=copula (last access: 4 August 2021), 2020. a
Hünicke, B., Zorita, E., Soomere, T., Madsen, K. S., Johansson, M., and Suursaar, Ü.: Recent Change – Sea Level and Wind Waves, in: Second Assessment of Climate Change for the Baltic Sea Basin, chap. 9, edited by: The BACC II Author Team, Springer, Cham, Heidelberg, New York, Dordrecht, London, 155–185, https://doi.org/10.1007/9783319160061_9, 2015. a
Johansson, M. M., Boman, H., Kahma, K. K., and Launiainen, J.: Trends in sea level variability in the Baltic Sea, Boreal Environ. Res., 6, 159–179, 2001. a, b, c
Kahma, K. K. and Calkoen, C.: Reconciling discrepancies in the observed growth of windgenerated waves, J. Phys. Oceanogr., 22, 1389–1405, 1992. a, b
Kahma, K. K. and Pettersson, H.: Wave growth in narrow fetch geometry, Global Atmos. Ocean Syst., 2, 253–263, 1994. a
Kahma, K., Pellikka, H., Leinonen, K., Leijala, U., and Johansson, M.: Pitkän aikavälin tulvariskit ja alimmat suositeltavat rakentamiskorkeudet Suomen rannikolla – Longterm flooding risks and recommendations for minimum building elevations on the Finnish coast, Raportteja – Rapporter – Reports 2014:6, Finnish meteorological Institute, ISBN 9789516978331, http://hdl.handle.net/10138/135226 (last access: 8 March 2022), 2014. a, b
Kojadinovic, I. and Yan, J.: Modeling Multivariate Distributions with Continuous Margins Using the copula R Package, J. Stat. Softw., 34, 1–20, https://doi.org/10.18637/jss.v034.i09, 2010. a
Kudryavtseva, N., Räämet, A., and Soomere, T.: Coastal flooding: Joint probability of extreme water levels and waves along the Baltic Sea coast, J. Coast. Res., 95, 1146–1151, https://doi.org/10.2112/SI95222.1, 2020. a
Kulikov, E. A. and Medvedev, I. P.: Extreme Statistics of Storm Surges in the Baltic Sea, Oceanology, 57, 772–783, 2017. a, b, c
Leijala, U., Björkqvist, J.V., Johansson, M. M., Pellikka, H., Laakso, L., and Kahma, K. K.: Combining probability distributions of sea level variations and wave runup to evaluate coastal flooding risks, Nat. Hazards Earth Syst. Sci., 18, 2785–2799, https://doi.org/10.5194/nhess1827852018, 2018. a, b, c, d, e
Leppäranta, M. and Myrberg, K.: Physical Oceanography of the Baltic Sea, Springer, ISBN 9783540797029, 2009. a, b, c, d
Lisitzin, E.: On the reducing influence of sea ice on the pilingup of water due to wind stress, Societas Scientiarum Fennica, Commentationes PhysicoMathematicae, XX, 12 pp., 1957. a
Marcos, M., Rohmer, J., Vousdoukas, M. I., Mentaschi, L., Le Cozannet, G., and Amores, A.: Increased extreme coastal water levels due to the combined action of storm surges and wind waves, Geophys. Res. Lett., 46, 4356–4364, https://doi.org/10.1029/2019GL082599, 2019. a
Masina, M., Lamberti, A., and Archetti, R.: Coastal flooding: A copula based approach for estimating the joint probability of water levels and waves, Coast. Eng., 97, 37–52, https://doi.org/10.1016/j.coastaleng.2014.12.010, 2015. a
Mathisen, J. and BitnerGregersen, E.: Joint distributions for significant wave height and wave zeroupcrossing period, Appl. Ocean Res., 12, 93–103, https://doi.org/10.1016/S01411187(05)800331, 1990. a
Medvedev, I. P., Rabinovich, A. B., and Kulikov, E. A.: Tidal oscillations in the Baltic Sea, Oceanology, 53, 526–538, https://doi.org/10.1134/S0001437013050123, 2013. a
Nelsen, R. B.: An Introduction to Copulas, Springer Series in Statistics, Springer, New York, USA, ISBN 9780387286785, 2006. a
Pellikka, H., Leijala, U., Johansson, M. M., Leinonen, K., and Kahma, K. K.: Future probabilities of coastal floods in Finland, Cont. Shelf Res., 157, 32–42, https://doi.org/10.1016/j.csr.2018.02.006, 2018. a, b
Pettersson, H. and Jönsson, A.: Wave climate in the northern Baltic Sea in 2004, https://helcom.fi/media/documents/WaveclimateinthenorthernBalticSeain2004.pdf (last access: 13 February 2021), 2005. a
Pettersson, H., Lindow, H., and Brüning, T.: Wave climate in the Baltic Sea 2012, http://www.helcom.fi/balticseatrends/environmentfactsheets/ (last access: 15 September 2017), 2013. a
Saaranen, V., Lehmuskoski, P., Rouhiainen, P., Takalo, M., Mäkinen, J., and Poutanen, M.: The new Finnish height reference N2000, in: Geodetic Reference Frames: IAG Symposium, 9–14 October 2006, Munich, Germany, edited by: Drewes, H., Springer, 297–302, https://doi.org/10.1007/9783642008603_46, 2009. a
Särkkä, J., Kahma, K. K., Kämäräinen, M., Johansson, M. M., and Saku, S.: Simulated extreme sea levels at Helsinki, Boreal Environ. Res., 22, 299–315, 2017. a, b
Soomere, T., Eelsalu, M., Kurkin, A., and Rybin, A.: Separation of the Baltic Sea water level into daily and multiweekly components, Cont. Shelf Res., 103, 23–32, https://doi.org/10.1016/j.csr.2015.04.018, 2015. a
Tuomi, L., Kahma, K. K., and Pettersson, H.: Wave hindcast statistics in the seasonally icecovered Baltic Sea, Boreal Environ. Res., 16, 451–472, 2011. a, b
Vestøl, O., Ågren, J., Steffen, H., Kierulf, H., and Tarasov, L.: NKG2016LU: a new land uplift model for Fennoscandia and the Baltic Region, J. Geod., 93, 1759–1779, https://doi.org/10.1007/s00190019012808, 2019. a
Wahl, T., Mudersbach, C., and Jensen, J.: Assessing the hydrodynamic boundary conditions for risk analyses in coastal areas: a multivariate statistical approach based on Copula functions, Nat. Hazards Earth Syst. Sci., 12, 495–510, https://doi.org/10.5194/nhess124952012, 2012. a
Wahl, T., Jain, S., Bender, J., Meyers, S. D., and Luther, M. E.: Increasing risk of compound flooding from storm surge and rainfall for major US cities, Nat. Clim. Change, 5, 1093–1097, https://doi.org/10.1038/nclimate2736, 2015. a
Yan, J.: Enjoy the Joy of Copulas: With a Package copula, J. Stat. Softw., 21, 1–21, https://doi.org/10.18637/jss.v021.i04, 2007. a