Articles | Volume 22, issue 3
Nat. Hazards Earth Syst. Sci., 22, 813–829, 2022
Nat. Hazards Earth Syst. Sci., 22, 813–829, 2022

Research article 10 Mar 2022

Research article | 10 Mar 2022

Correlation of wind waves and sea level variations on the coast of the seasonally ice-covered Gulf of Finland

Correlation of wind waves and sea level variations on the coast of the seasonally ice-covered Gulf of Finland
Milla M. Johansson1, Jan-Victor Björkqvist2,1,3, Jani Särkkä1, Ulpu Leijala1, and Kimmo K. Kahma1 Milla M. Johansson et al.
  • 1Finnish Meteorological Institute, P.O. Box 503, 00101, Helsinki, Finland
  • 2Norwegian Meteorological Institute, Allégaten 70, 5007 Bergen, Norway
  • 3Department of Marine Systems, Tallinn University of Technology, Akadeemia tee 15a, 12618 Tallinn, Estonia

Correspondence: Milla M. Johansson (


Both sea level variations and wind-generated waves affect coastal flooding risks. The correlation of these two phenomena complicates the estimates of their joint effect on the exceedance levels for the continuous water mass. In the northern Baltic Sea the seasonal occurrence of sea ice further influences the situation. We analysed this correlation with 28 years (1992–2019) of sea level data, and 4 years (2016–2019) of wave buoy measurements from a coastal location outside the City of Helsinki, Gulf of Finland in the Baltic Sea. The wave observations were complemented by 28 years of simulations with a parametric wave model. The sea levels and significant wave heights at this location show the strongest positive correlation (τ=0.5) for southwesterly winds, while for northeasterly winds the correlation is negative (−0.3). The results were qualitatively similar when only the open water period was considered, or when the ice season was included either with zero wave heights or hypothetical no-ice wave heights. We calculated the observed probability distribution of the sum of the sea level and the highest individual wave crest from the simultaneous time series. Compared to this, a probability distribution of the sum calculated by assuming that the two variables are independent underestimates the exceedance frequencies of high total water levels. We tested nine different copulas for their ability to account for the mutual dependence between the two variables.

1 Introduction

Urbanized and heavily populated coastal regions around the world face concrete consequences of sea level rise and climate change. To ensure safe and effective coastal protection and city planning in the future, accurate and location-targeted estimates of coastal flood probabilities are in demand. The ultimate height to which the sea water rises in a coastal flood is determined by both the sea level variations – so-called still water level – as well as wind-generated waves on top of that. Thus, when estimating coastal flooding risks, these both need to be considered.

On the Baltic Sea coasts, the still water level variations mainly consist of storm surges and longer-term variations, the amplitude of tides being small. The flooding risks related to still water level have been extensively studied using long, high-quality tide gauge records, which in many locations date back to the 19th century (e.g. Johansson et al.2001; Eelsalu et al.2014; Soomere et al.2015; Kulikov and Medvedev2017). See also Hünicke et al. (2015) for an overview of sea level and wind wave studies in the Baltic Sea. These data have enabled detailed analyses of sea level extremes and probability distributions useful for practical applications in coastal planning (e.g. Kahma et al.2014).

To account for the additional effect of wind waves, Leijala et al. (2018) complemented the still-water-level-based flooding risk estimates with wave run-up on a steep shore. They estimated the probability distributions of both phenomena separately and, assuming the two variables independent, calculated the probability distribution of their sum at two coastal locations close to Helsinki, Gulf of Finland (GoF). Even if the assumption of independence might not strictly hold, high sea levels and high waves do not necessarily co-occur, either. Hanson and Larson (2008) drew the conclusion that on the southern Swedish coast, high run-up levels (sum of still water level and wave run-up) are typically caused by high waves in combination with modest still water levels.

The actual dependence of waves and sea level variations is complex. While short-term sea level variations and wind-generated waves are driven by related factors (namely wind and air pressure variations), strong winds can either fill or empty small sub-basins while simultaneously generating high waves. Also, the slowly changing total water volume of a semi-enclosed basin does not depend on the short-term wind speed.

To account for the correlation between sea level variations and wind waves when estimating the exceedance frequencies of their sum, a bivariate (two-dimensional) probability distribution is needed. Copulas are a flexible tool with which to study this geophysical problem, since they enable different types of marginal distributions of the individual variables, unlike, e.g. multivariate Weibull or normal distributions. They have therefore been used to study, e.g. storm surges in the German Bight (Wahl et al.2012), the joint probability of water levels and waves at the Adriatic coast (Masina et al.2015), extreme coastal wave overtopping (Chini and Stansby2012), sea-level-driven non-linear changes in wave characteristics and tides (Arns et al.2017), and compound flooding from storm surge and rainfall (Wahl et al.2015). Marcos et al. (2019) studied the dependence between extreme storm surges and wind waves and found that 55 % of the world coastlines face compound events, and neglecting these dependencies leads to an underestimation of extreme coastal water levels. Kudryavtseva et al. (2020) used copulas to evaluate the likelihood of the joint occurrence of high water levels and wave heights along the whole Baltic Sea coast, showing that the probability of co-occurrence varies regionally, being highest in the Bothnian Sea, but also moderate in the GoF.

The GoF (Fig. 1) is a sub-basin of the Baltic Sea. The Baltic Sea is a semi-enclosed marginal sea connected to the North Sea and the North-Atlantic Ocean only through the narrow and shallow Danish Straits. The straits prevent rapid changes in the total water volume of the Baltic Sea, which therefore behaves like a closed basin on a sub-weekly time scale, with the sea level varying internally. The internal variations amount to several decimetres, and are mainly driven by wind and air pressure, but also by the consequent internal oscillations (seiches). These variations in the GoF are controlled by the local geometry of this narrow west-to-east oriented gulf, and can therefore behave differently depending on, e.g. the wind direction (Leppäranta and Myrberg2009). The astronomical tides are small, with amplitudes less than 20 cm (e.g. Medvedev et al.2013).

Figure 1On top is a larger view of the study area covering the Northern Baltic Proper (NBP) and Gulf of Finland (GoF). The coastal area off Helsinki around Suomenlinna is shown on the bottom. The Suomenlinna ice station represents the ice conditions for a radius of about 1 km. The contours show the water depth in metres.

On time scales longer than a week the Baltic Sea water volume variations are a result of the water exchange that is low-pass filtered by the Danish Straits. This water level component is significant, with local sea level variations being up to 1.3 m (Leppäranta and Myrberg2009), while precipitation, evaporation, and river runoff play a minor role. On even longer time scales, the Baltic Sea is affected by the global mean sea level rise, which is counterbalanced to a varying extent by the local post-glacial land uplift (e.g. Pellikka et al.2018; Vestøl et al.2019).

The wind-generated waves in the GoF are influenced by the fetch, the seabed, and the narrowness of the gulf (Kahma and Pettersson1994). The measured significant wave height has exceeded 5 m both during easterly and westerly winds (Pettersson et al.2013; Pettersson and Jönsson2005), but the significant wave height close to Helsinki is expected to be less than half of the open-sea value, since waves are effectively attenuated by the coastal archipelago (Björkqvist et al.2019).

Finally, both the sea level variations and the waves are affected by the ice season, which in the Baltic Sea is 5–7 months (Leppäranta and Myrberg2009). While the presence of ice somewhat attenuates sea level variability – mainly by blocking the piling-up effect of wind on the sea surface (Lisitzin1957) – it completely blocks waves if the ice concentration is high enough. The maximum ice coverage ranges from 12.5 % to 100 % of the surface area of the Baltic Sea (Leppäranta and Myrberg2009); in normal and severe ice winters, the entire GoF becomes ice covered, while in mild winters only the eastern part of the gulf freezes.

In this paper, we explore the correlation of sea level and wind waves near Helsinki on the northern coast of the GoF, with the aim of determining its effect on the exceedance frequencies of their sum. The study is motivated by the high relevance of flood risk estimates for the City of Helsinki, as well as the recently obtained 4-year time series of coastal wave buoy data. These data, together with the multi-decadal time series of tide gauge observations, provide an excellent data set for such case study. We aim at answering the following research questions:

  1. To what extent are these two variables correlated, and does it depend on season, ice conditions, or wind direction?

  2. What is the observation-based probability distribution of the sum of the still water level and wave height? How much does it differ from a distribution formed assuming that the two variables are independent?

  3. Can a copula-based bivariate distribution account for the correlation of the two variables, and more accurately describe the probability of the sum?

This paper is structured as follows. In Sect. 2, we describe the observation data sets used, the parametric wave model, and the methods used in calculating the uni- and bivariate probability distributions. The results are presented in Sect. 3, followed by a discussion in Sect. 4. We summarize our conclusions in Sect. 5.

2 Data and methods

2.1 Sea level measurements

Automated tide gauges have been operational on the Finnish coast since the late 19th century, at Helsinki since 1904 (see e.g. Johansson et al.2001 for a more detailed description of data, measurement techniques, and quality). We used instant hourly sea levels measured during 1992–2019 at the Helsinki Kaivopuisto tide gauge (Fig. 1), currently operated by the Finnish Meteorological Institute (FMI). Missing values were estimated by linear interpolation from the measurements at two other nearby tide gauges on the Finnish coast (Hanko and Hamina). There were 477 interpolated hourly values during 1992–2019, less than 0.2 % of the data. All sea levels in this study are given in relation to the N2000 height system, a Finnish realization of the common European height system (Saaranen et al.2009).

The average sea level during 1992–2019 at Helsinki was +0.21 m, with variations ranging from −0.73 to +1.69 m (Table 1). The linear trend in the time series was −2 cm in 28 years – the land uplift and sea level rise practically cancelling each other. These long-term phenomena thus have a negligible contribution to the sea level variations studied here.

2.2 Wind and temperature measurements

FMI's automatic weather stations provided wind speed, wind direction, and air temperature measurements (Fig. 1), which were used as a forcing for the parametric wave model (Sect. 2.4). From 1992 onwards these stations covered three areas of the Baltic Sea: (i) Northern Baltic Proper (NBP) covered by Utö weather station (measuring height 19 m a.m.s.l. (metres above mean sea level)), (ii) GoF covered by Kalbådagrund and the Helsinki Lighthouse (31 and 33 m a.m.s.l.), and (iii) the coastal area near Suomenlinna, Helsinki covered by Harmaja (18 m a.m.s.l.). The Helsinki Lighthouse weather station was only used for 2019 to cover a Kalbådagrund data gap.

We calculated hourly wind and air temperature values by either interpolating the older 3 h data (up until mid- to late 1990s) or by calculating hourly averages of later denser measurements. All gaps shorter than 6 h were interpolated. Continuous data blocks that were shorter than 12 h were removed. Sea surface temperature (SST) data were only available from Harmaja starting at the end of June 1995. For other times and locations we estimated the SST using a mean yearly cycle from the GoF monitoring station LL7 (Fig. 1). The mean is defined from 434 CTD soundings from R/V Aranda between 1992 and 2017.

2.3 Observed wave heights

FMI's operational GoF wave buoy is located in the centre of the basin (62 m water depth), while the coastal wave buoy is anchored at a depth of 22 m outside of Suomenlinna, Helsinki (Fig. 1). While coastal wave measurements have been made in several places along the Finnish coast, this is the only permanent coastal wave buoy. Yearly measurements at these locations started in 2000 (GoF) and 2016 (Suomenlinna). In the Baltic Sea wave buoys are removed before the ice season, since freezing damages the sensors of the Datawell Directional Waveriders. The GoF buoy's measurement periods were 1–18 January 2016, 12 March 2016–13 January 2017, 18 April 2017–29 January 2018, 14 April 2018–14 January 2019, and 11 March–31 December 2019. For the Suomenlinna wave buoy they were 27 April–29 November 2016, 20 April–15 December 2017, 3 May–26 November 2018, and 18 April–10 December 2019.

Figure 2The significant wave height simulated by the parametric wave model compared with observations from the Suomenlinna wave buoys. The (a) 2016 data used for model calibration and (b) the 2017–2019 data are shown separately.


The wave buoys determined the significant wave height from 27 min vertical displacement time series as Hs=4m0, where m0 is the zeroth moment of the variance density spectrum (see Datawell2019). We picked every other value to get a data set with hourly time resolution. Of these values, the highest significant wave heights during 2016–2019 were 4.12 m (GoF) and 1.47 m (Suomenlinna); the mean values were 0.81 and 0.33 m.

2.4 Simulated wave heights

We complemented the coastal wave measurements at Suomenlinna with 28 years (1992–2019) of simulations produced by a parametric wave model. The model uses fetch-limited wave growth relations (Kahma and Calkoen1992), while also accounting for the wind duration and the atmospheric stratification. A description of the wave evolution scheme is presented in Appendix A.

First we simulated the wave conditions at the GoF wave buoy, which were needed as boundary conditions for the coastal wave hindcast. The GoF simulation accounted for the locally generated waves in the middle of the gulf and waves propagating from the eastern part of the basin and the NBP – also simulated using the same model and the local wind. We then determined the fraction of the GoF wave energy that arrived to Suomenlinna; these waves have longer periods than what can be generated by the local wind and the nearest fetch, and will henceforth be called “long waves”. This long wave energy was quantified as the difference between the observed wave energy (for 2016) and the modelled local wave energy at Suomenlinna. The attenuation – as a function of the wind direction – was determined using a linear regression between the simulated wave energy at the GoF and the estimated long wave energy at Suomenlinna. The final time series of coastal significant wave height was determined from the sum of the simulated local wave energy at Suomenlinna and the attenuated GoF wave energy.

The simulated significant wave heights were validated using wave buoy data from 2016 to 2019. At GoF the −0.02 m bias and 0.28 m RMSE (N=25 293) were slightly worse than for state-of-the-art numerical wave models (e.g. Tuomi et al.2011; Björkqvist et al.2018). Nonetheless, the accuracy was sufficient to serve as boundary conditions for the coastal simulations at Suomenlinna, which had a 0.00 m bias and 0.10 m RMSE (N=13 574) (Fig. 2). For Suomenlinna the validation statistics compare well with those from high-resolution numerical wave model simulations (Björkqvist et al.2020), but the parametric model showed a slight tendency to overestimate the highest significant wave heights (Fig. 2b). This is natural, since a linear regression cannot account for the increased wave–bottom interaction for the higher and longer waves during stronger winds that were not captured in the calibration period. We can, however, conclude that the quality of the simulated wave height time series is adequate for the purpose of this study.

2.5 Ice conditions and wave statistics types

Data originating from FMI's ice charts described the ice conditions at the southern point of the Suomenlinna islands (Fig. 1). The data represent an area with a radius of roughly 1 km and are therefore fairly representative of the wave buoy location. Of the statistics available for the entire study period (1992–2019), we used the start and end dates of the permanent ice-cover. During this time – defined as our ice period – the ice concentration is typically high (over 80 %), making it reasonable to set the significant wave height to zero. The length of our ice period varied from zero (winters 2007–2008, 2017–2018, and 2018–2019) to 139 d (winter 2002–2003).

Table 1The sea level (still water level, zstill) and significant wave height (Hs) data sets used in this study.

Download Print Version | Download XLSX

Tuomi et al. (2011) defined four different wave statistics for water bodies with a seasonal ice cover. Using this classification, we compiled the following data sets (Table 1):

  • Data set F (ice-free time): simulated Hs for the open water season only.

  • Data set I (ice-time-included): simulated Hs set to 0 for the ice period.

  • Data set N (hypothetical no-ice): full simulated Hs data set. As the wave model does not account for ice, this represents a hypothetical situation with no ice present.

  • Data set M (measurement statistics): wave observations only (this time is a true subset of the ice-free period because of measurement gaps).

We divided the 28-year data sets (F, I, and N) into seven consecutive 4-year periods, from 1992–1995 to 2016–2019. These 4-year time series were, where applicable, used to study temporal variability.

2.6 Probability distribution of the total water level

We defined the total water levelzmax, as the hourly maximum level to which the continuous water mass reaches as a combined effect of the still water level, zstill, and the wave action. To do this, we first assumed that zstill does not change during 1 h. Neglecting site-specific coastal effects, zmax in such case is determined by the highest individual wave crest, ηmax. (Note that wave crest denotes the height above zstill, thus being half of the actual height of the highest wave.) At Suomenlinna the highest single wave crest during 30 min is approximately 92 % of the significant wave height (Björkqvist et al.2019), which we rounded up to ηmax=Hs for hourly values. With these assumptions, the maximum water level elevation is

(1) z max = z still + η max = z still + H s .

The probability distributions of zmax were then calculated directly from the time series obtained from the observed and simulated zstill and Hs data. The distributions of zstill and Hs both describe the actual duration of the phenomenon, and thus have units of h yr−1. The distribution of zmax, on the other hand, has a unit of events per year, where “event” is defined as the instant hourly maximum total water level.

If simultaneous time series of zstill and Hs are not available, the cumulative distribution function (CDF) of zmax needs to be calculated based on the respective univariate distributions. If the two variables are assumed to be independent (for dependent variables, see Sect. 2.7), the CDF of their sum is the convolution

(2) F z z max = R f x ( ξ ) F y z max - ξ d ξ ,

where fx denotes the probability density function (PDF) of zstill, and Fy denotes the CDF of Hs (see Leijala et al.2018 for more details).

2.7 Copula-based probability distributions

Sklar's theorem (Nelsen2006) states that if H is a bivariate CDF with marginal CDFs F1 and F2, then a copula C exists such that

(3) H x 1 , x 2 = C F 1 x 1 , F 2 x 2 .

Thus, the joint distribution H can be expressed as a function of the marginal distributions F1 and F2 and the copula C, which describes the dependency of the variables x1 and x2. If x1 and x2 are independent, the copula is simply the product F1(x1)F2(x2). An extensive introduction to copula methods is provided by Genest and Favre (2007). A benefit of the method is that the marginal distributions can be of any form.

We studied the suitability of nine different copulas (Gumbel, Frank, Clayton, Normal, t-EV, Hüsler–Reiss, Galambos, Tawn, and Plackett) in representing the dependence structure of the observed zstill and Hs data (2016–2019). We calculated the parameters for each copula with the inversion of Kendall's τ estimator. To quantify the goodness-of-fit for each of these copulas, we calculated the Cramér–von Mises statistic Sn as defined in Eq. (2) of Genest et al. (2009).

By using the bivariate distributions consisting of each of the nine copulas and the marginal distributions (Sect. 2.8), we simulated 108 random pairs (zstill, Hs) and calculated the complementary cumulative distribution functions (CCDFs) of their sum. The calculations were performed with the R package “copula: Multivariate Dependence with Copulas” (Hofert et al.2020; Yan2007; Kojadinovic and Yan2010; Hofert and Mächler2011).

2.8 Marginal distributions and extrapolation

Probabilities of sea level extremes have been estimated using extreme value distributions with, e.g. block maxima (annual/monthly) or peak-over-threshold values (e.g. Arns et al.2013). However, when the focus is on the sum of sea level and wave height (Eq. 1), it is essential to use simultaneous values, and the extremes generally do not co-occur. Our aim was to study the correlation between these two variables in general, on a hourly time scale, not just the correlation of extremes. Thus, we chose to base our analysis on the entire data set of hourly values.

Our 28 years of hourly data allow for a direct estimate of the exceedance frequency down to 0.036 h yr−1, and we therefore had to extrapolate the distribution to cover rarer values. Extremes are typically evaluated using some extreme value distribution (Eelsalu et al.2014; Särkkä et al.2017; Kulikov and Medvedev2017), but this approach is not applicable to extrapolate hourly data needed for this study.

Figure 3Scatter plot of significant wave heights vs sea level: (a) observed wave heights from 2016 to 2019 and (b) simulated wave heights from 1992 to 2019, data sets F (grey), I (blue), and N (red). Note that every data point of the set F (ice-free time) is also included in sets I (wave height set to zero during ice-covered time) and N (hypothetical no-ice wave heights).


The probability distribution of the observed sea levels on the Baltic Sea coasts is slightly asymmetric: the high-end tail is thicker than the low-end tail (e.g. Johansson et al.2001; Kulikov and Medvedev2017). The significant wave heights in the archipelago, again, show a clear tendency to saturate (Leijala et al.2018), and are therefore not modelled well simply by applying some commonly used distribution type, such as the Weibull distribution (e.g. Battjes1972; Mathisen and Bitner-Gregersen1990). We used the empirical frequency distributions for exceedance frequencies between the 0.5 and 99.5 percentiles, which correspond to a frequency of 44 h yr−1 in the tail of the CDF/CCDF. We extrapolated the high (low) tails of the distribution by fitting two-parameter exponential functions F=e-λx-x0 to the CCDFs (CDFs) below this frequency, following, e.g. Leijala et al. (2018). This method was applied consistently to the sea level, significant wave height (only high tail), and total water level data. The PDFs were obtained by numerically differentiating the CDFs. As our main focus in this study is not on the extrapolated extreme ends of the distribution, we left more detailed considerations of the extrapolation function out and considered the simple exponential function sufficient for our purposes.

3 Results

3.1 Correlation

The simultaneous hourly sea levels and significant wave heights (Fig. 3) indicate some positive tail dependence – high sea levels co-occurring with high waves – both in the observed and simulated data. However, high waves also occur with moderate or even low sea levels, and high sea levels are not necessarily accompanied by high waves either. The individual maxima did not occur simultaneously. During the highest measured sea level (1.69 m) on 9 January 2005, the simulated wave height was a relatively modest 0.87 m. The highest observed wave height was 1.47 m on 7 December 2017, while the simulations gave 2.25 m on 23 December 2004. On those occasions, the sea levels were +0.73 and +0.94 m, respectively: high but not extreme.

The variations in both parameters are largest in winter (DJF; Fig. 4) and smallest in summer (JJA), when the wave height did not exceed 1.5 m and the sea level was between −0.3 and +1.0 m. The positive tail dependence can be seen in most of the seasonal data also.

The hypothetical no-ice data (type N) show situations with low sea level and wave heights ranging from 0 to 1 m, which would have occurred during the ice-covered period, especially in DJF (Figs. 3 and 4). In spring (MAM), also many high-sea-level and hypothetically high-wave situations occurred during the ice-covered time.

Figure 4Scatter plots of (a) winter, (b) spring, (c) summer, and (d) autumn events. Black: observed wave heights from 2016 to 2019. Other colours: simulated wave heights from 1992 to 2019; ice types F (grey), I (blue), and N (red).


As a measure of the correlation, we used Kendall's tau coefficient (τ). It measures rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. Thus, no assumption of a linear relationship between the variables is made. The sea level correlates stronger with the observed wave heights than the simulated wave heights from the same period (the measurement seasons during 2016–2019) on all seasons except DJF (Table 2). The observed wave data in DJF are very sparse (Fig. 4a), practically only consisting of 15 d of observations in December 2017 and 10 d in December 2019, and thus the high correlation (τ=0.3) should not be considered representative. Likewise, the observations for MAM only represent late April and May, as the earliest the wave buoy was deployed was on 18 April and the latest on 3 May. Of the different seasons, autumn (SON) shows the lowest correlation in all data sets.

Table 2Kendall rank correlation coefficients (τ) between the observed (Obs.) and simulated (Sim.) significant wave heights and sea levels in four seasons: winter (DJF), spring (MAM), summer (JJA), and autumn (SON). The data set “Sim. M” includes the simulated significant wave heights from the same periods as the data set “Obs. M” contains observations, thus being a subset of “Sim. F” of the years 2016–2019.

Download Print Version | Download XLSX

The different ways of handling the ice time in the 28-year data sets (F, I, N) do not generally result in large differences in the correlation, the only exception being DJF where the data set I shows clearly higher correlation than data sets F or N. There is a tendency for lower sea levels during the ice-covered period (the blue points with Hs=0 in Fig. 4a), which likely enhances the correlation.

The correlation clearly depends on wind direction measured at Harmaja (Fig. 5, nautical convention for wind direction). The strongest positive correlation occurs with southwesterly winds (240–260), while northeasterly winds (20–50) result in zero or negative correlation. The behaviour is qualitatively similar in both the observed and simulated wave heights, and with all the data sets considered (F, I and N; results for data sets I and N shown in Appendix B).

Figure 5The Kendall correlation coefficients (τ) between significant wave height and sea level for different wind directions. The blue dots show correlation in 1 direction bins, and the red dots in 10 bins. The significant wave heights from (a) buoy observations during 2016–2019 and (b) simulated data set F during 1992–2019 are shown. The direction given is the direction from where the wind blows (nautical convention), observed at Harmaja weather station. The correlation coefficients for all data (τ=0.20 and 0.16; Table 2) are shown as horizontal lines.


Figure 6The CCDFs of (a) observed hourly sea levels, (b) observed/simulated significant wave heights, (c) total water level as their sum, and (d) total water level calculated as a sum of two independent variables. The distributions shown are those for the data set F (ice-free season). The distributions were extrapolated with exponential functions below the exceedance frequency of 44 h yr−1. The seven consecutive 4-year periods shown separately for observed sea levels and simulated waves are 1992–1995 to 2016–2019.


Figure 7The observation-based CCDF of total water level during 2016–2019 (“buoy” + “fit”), the CCDF calculated assuming the variables independent (“Sum of indep.”), as well as CCDFs calculated by taking 108 random samples based on the nine different copulas combined with the observation-based marginal distributions for (a) all data and (b) situations with southwesterly wind (160–340).


3.2 Probability distributions

The CCDF of the sea level from the period 2016–2019 shows lower frequencies for high sea levels (>0.7 m) than the CCDF from the entire period 1992–2019 (Fig. 6a). The 2016–2019 distribution is also on the lower limit of the ensemble of the seven different 4-year distributions. The period 2016–2019 had less high sea levels than most of the 4-year periods, having also the lowest 4-year maximum (1.11 m) of all these periods. The distribution from 2004 to 2007 has a distinct tail with sea levels exceeding 1.45 m, while the maxima of all the other periods are below this. All these high values resulted from the exceptional storm Gudrun on 9 January 2005, when the sea level at Helsinki stayed above 1.4 m for 13 h from 01:00 to 13:00 UTC.

The shape of the CCDF obtained from the observed wave heights differs from those obtained from the simulated wave heights (Fig. 6b), the exceedance frequency decreasing more rapidly with increasing wave height in the observed CCDF than in the simulated ones. This is in accordance with the parametric wave model validation (Fig. 2b), which showed an overestimation of the highest significant wave heights.

The CCDFs of zmax calculated by summing the observed sea levels and observed or simulated wave heights according to Eq. (1), and extrapolating with exponential functions, show similar behaviour (Fig. 6c): The distribution based on observed wave heights shows lower frequencies for high zmax (>2.0 m) than any of those based on simulated wave heights. The distributions of the sum of zstill and Hs obtained by assuming the two variables independent, according to Eq. (2) (Fig. 6d) give lower probabilities for high zmax than the distributions of Fig. 6c. Below the exceedance frequency of 100 events per year down to 0.1 events per year, the assumption of independence leads to an underestimation of 0.2–0.3 m in zmax of the observed distribution from 2016 to 2019. In the simulated 4-year distributions, the underestimation at 100 events per year ranges from 0.15 to 0.28 m, and at 1 event per year from 0.09 to 0.70 m. In the distribution based on the entire 28-year simulation, the underestimation increases with decreasing exceedance frequency, from 0.06 m at 1000 events per year to 0.55 m at 0.1 events per year.

The CCDFs based on data sets F, I, and N show only minor differences (Fig. 6 for F, others in Appendix B). This indicates that the hypothetical wave and sea level behaviour during the ice-covered time does not differ from that during the open water period. The number of ice days, however, is small compared to the length of the study period: 28 d (1.9 %) during 2016–2019 and 1615 d (16 %) during 1992–2019. Thus, it is expected that they do not significantly alter the results.

3.3 Copula-based distributions

Table 3 shows the parameters of the different copulas fitted to the sea levels and significant wave heights. In Fig. 7, CCDFs obtained from these copulas and the marginal distributions are compared with the observed CCDF and the CCDF obtained by assuming the two variables independent. In accordance with the above results, the observed CCDF shows higher probabilities of high total water levels (>1.1 m) than the independent case (Eq. 2).

The copulas are divided into two distinct groups. The Clayton, Frank, and Plackett copulas remain close to the independent case. On the other hand, Tawn, Gumbel, t-EV, Galambos, and Hüsler–Reiss copulas capture more of the effect of dependence on the distribution, closely following the observed distribution down to exceedance frequency of 50 events per year. The Normal copula falls between these two groups, deviating from the observed distribution below 400 events per year. This behaviour is confirmed by the Cramér–von Mises statistics (Table 3), where smaller Sn values correspond to better fit. For the latter group Sn<1, while in the first group Sn varies from 1.22 to 3.97. The Normal copula, again, falls between these groups with Sn=1.17.

Table 3Fitted parameters of nine different copulas for sea levels and significant wave heights from buoy observations for the period 2016–2019.

Download Print Version | Download XLSX

We tested the effect of wind direction on the copula results by conducting the copula analysis separately for two subsets of the data: situations with southwesterly wind (160–340, Fig. 7b) and northeasterly wind (340–160, not shown). The results for the southwesterly wind subset are qualitatively similar to those including all data: Copulas are divided into two groups. The stronger correlation (τ=0.269) results in somewhat higher probabilities of high total water levels in the distributions. In the northeasterly wind case, on the other hand, the non-existent correlation (τ=-0.012) leads to the copula analysis being somewhat pointless, as all the copulas end up close to the independent case.

To further illustrate the differences in bivariate distributions based on different copulas, we simulated an ensemble of 21 546 samples from each, these being of the same size as the observed data set (Fig. 8). The copulas with Sn<1 (Tawn, Gumbel, t-EV, Galambos, and Hüsler–Reiss) tend to show a positive tail dependence: the slight bending of the scatter cloud into a tail on the upper right corner. The copulas with Sn>1 do not show such an apparent positive tail.

Figure 8Scatter plots of the observed sea level (zstill) and significant wave height (Hs) during 2016–2019 (top left), and scatters of 21 546 random points (equalling the count of observations) sampled from the bivariate distributions based on the nine different copulas and the independence copula.


4 Discussion

Since the wave conditions in the archipelago markedly differ from those on the open sea, a multi-year time series of wave data from a coastal location such as Suomenlinna provides excellent material for studies. As more data accumulate, it will provide more comprehensive knowledge on the local wave behaviour. We compensated for the shortness of the wave buoy time series by using 28 years of simulated wave data. The lightweight parametric wave model is considerably faster to run than high-resolution numerical wave models. Nevertheless, the bias and RMSE of the simulations were comparable.

The validation suggested that the highest wave heights are overestimated in the simulations. The attenuation of longer waves through bottom processes is indirectly accounted for in the model through the calibration against measurements. Nonetheless, this calibration might not hold for even harsher weather conditions and even longer waves. While the simulations offer a good tool for assessing the general dependence between the wave height and sea level variations, they should not be used as is for analysing extreme values. If the wave simulations of the parametric model are to be used for extreme conditions, the wave-bottom interactions need to be accounted for in a more explicit manner, and such improved model needs to be re-validated.

One of the advantages of a wave model is its ability to simulate the wave conditions during ice-covered periods, or periods when there is a risk of freezing and the wave buoy cannot be kept in the sea. This extends the time series and provides an opportunity to analyse the question, “What would happen if there was no ice?” This question is relevant when we consider the conditions in the GoF in a warmer future climate, for instance. Our results did not indicate that the wave or sea level conditions, or the correlation of these, would markedly differ from those occurring during the ice-free period in the present climate.

The wave conditions vary significantly among 4-year periods (Fig. 6). The observational period of this study (2016–2019) was relatively calm: the 4-year maximum sea level (1.11 m) was lowest among the seven consecutive 4-year sets, while the maximum simulated significant wave height (1.89 m) ranked fourth. To obtain reliable estimates for probabilities of extreme sea levels on the Finnish coast, at least 30 years of data are generally used (e.g. Pellikka et al.2018; Kahma et al.2014). Our results obtained from 4-year time series do not provide any flooding risk estimates for practical applications.

Additionally, considering flooding on the coast, the relevant parameter is the wave run-up. While our total water level (Eq. 1) represents the elevation of the highest wave crest on the open sea, it is not applicable on the waves meeting the coastline. On a steep coast, a more appropriate estimate for the total water level would be zstill+2Hs due to wave reflection (Leijala et al.2018), while on a shallow coast the eventual run-up on the coastline is less than zstill+Hs.

The behaviour of the correlation with respect to wind direction (Fig. 5) is readily explained by the geography of the GoF. Southwesterly winds push water into the gulf and simultaneously raise high waves, leading to a positive correlation. Northeasterly winds, on the other hand, push water out from the GoF while still raising waves, which results in a negative correlation. When the correlation is calculated from the entire data set, these two opposing situations with different mechanisms likely cancel each other and reduce the overall correlation. To more completely describe the effect of the correlation on the distribution of the total water level, and thus the flooding risks, one option might be to use three variables instead of two: still water level, significant wave height, and wind direction. We leave this as an option for future studies.

The sea level processes in the GoF act on different time scales, from the sub-daily storm surges to sub-weekly Baltic Sea internal variations, and weekly and longer-term changes in the total water volume. From 1 d to 1 week, the Baltic Sea mainly responds to wind and air pressure forcing like a closed basin. The response time of such variations, e.g. the transport of water between the GoF and the Baltic Proper in a seiche oscillation, is usually of the order of 1 d. In such time scale, the wind-generated waves have time to develop, and these phenomena co-occur.

The sea level variations in time scales longer than 1 week usually involve changes in the Baltic Sea water volume. These are related to longer-term (up to 60 d; Särkkä et al.2017) prevailing wind conditions, which control the in- or outflow of water in the Danish Straits. These should not have a direct relationship with the local waves. However, the longer-term weather conditions correlate with short-term weather phenomena too: e.g. prevailing westerly winds are related to more low-pressure systems with strong winds travelling over the Baltic Sea, and might thus contribute to the overall correlation.

Our results demonstrated that the assumption of independence of sea level and significant wave height leads to an underestimation of total water levels with exceedance frequencies of 100 events per year or less by about 0.2–0.3 m in the 4-year observation-based data set, and up to 0.55 m in the 28-year simulation-based data set (the difference of the distributions in Fig. 6c and d). The underestimation generally increases with lower exceedance frequencies. Since low exceedance frequencies such as 0.001–0.01 events per year are highly relevant for flooding risk considerations, further investigation of the effect of the correlation on such exceedance frequencies would be worthwhile. Quantification of this effect for frequencies that go below our observation/simulation-based data sets, however, would require a more thorough analysis of the suitability of the extrapolation function, which we leave for future studies.

5 Conclusions

Our main conclusions, related to the research questions presented in Sect. 1, are as follows.

The sea level variations and significant wave heights show a positive correlation in general (τ=0.20). The correlation depends on wind direction: Southwesterly winds lead to the strongest positive correlation (up to τ=0.5), while northeasterly winds lead to zero or negative correlation. Ice conditions did not have an apparent effect on the correlation: Including hypothetical no-ice wave heights during the ice season did not markedly alter the correlation, and neither did the setting of the wave height to zero during the ice-covered time.

The 4-year probability distribution of the total water level, calculated as a sum of the observed sea level and observed significant wave heights, gives 0.2–0.3 m higher values for high total water levels (0.1–100 events per year) than the distribution calculated by assuming the two variables independent. In the simulated 28-year distribution, the underestimation amounts to 0.2–0.5 m. As the total water level values rarer than 0.1 events per year are also likely underestimated, the dependence between the variables should be accounted for when calculating the probabilities of high total water levels, e.g. for flooding risk estimates.

The probability distribution of the sum based on the copula approach is closer to the observed distribution than the distribution based on the independence assumption. Of the nine copulas studied, the Tawn, Gumbel, t-EV, Galambos, and Hüsler–Reiss copulas proved most suitable for our data.

Appendix A: Model scheme for wave evolution

The core idea of the wave model is based on known universal properties of different dimensionless parameters, namely the energy, peak frequency, phase speed, fetch, and time:


The wave growth in the model is quantified using the fetch limited wave growth relations of Kahma and Calkoen (1992):


The coefficients p1, p2, q1, q2 for the wave growth relations depend on the atmospheric stratification as presented by the authors.

Instead of assuming that the waves are fetch limited at each time step, the model simulates a more realistic wave growth and decay by also accounting for the limitation of wind duration through the following propagation scheme:

  1. Calculate the dimensionless fetch, X̃εi-1, based on the energy from the previous time step, εi−1, and the current wind speed, Ui.

  2. Calculate a (dimensionless) duration based on the wind, Ui, and Eq. (A7) as:

    (A8) t ̃ 0 = 0 X ̃ ε i - 1 1 2 c p ̃ - 1 d X ̃ = 0 X ̃ ε i - 1 4 π f p ̃ d X ̃ , = 0 X ̃ ε i - 1 4 π q 2 X ̃ q 1 d X ̃ = 4 π q 2 q 1 + 1 X ̃ ε i - 1 q 1 + 1 .
  3. Determine an enhanced dimensionless time step as Δt̃+t̃0, where Δt is the fixed time step of the model (e.g. 1 h).

  4. As an inverse to step 2, calculate a new fetch X̃t based on the enhanced duration Δt̃+t̃0, and Eq. (A7) as:

    (A9) X ̃ t = t ̃ + t ̃ 0 q 1 + 1 4 π q 2 1 / q 1 + 1 .
  5. Calculate the minimum dimensionless fetch for a fully developed sea (using the wind speed at 10 m height) as:

    (A10) U c p = 2 π f p U g = 2 π f ̃ p = 0.82 X ̃ FD = 0.82 2 π q 2 1 / q 1 .
  6. Set the relevant fetch to X̃i=min{X̃t,X̃FD,X̃}, where X̃ is the physical dimensionless fetch.

  7. Calculate the new energy, εi, and peak frequency, fpi, using X̃i, Ui, and Eqs. (A6) and (A7).

Appendix B: Supplementary figures

Figure B1The Kendall correlation coefficients (τ) between significant wave height and sea level for different wind directions. The blue dots show correlation in 1 direction bins, and the red dots in 10 bins. The significant wave heights from simulated data sets (a) I and (b) N 1992–2019 are shown. The direction given is the direction from where the wind blows (nautical convention), observed at Harmaja weather station. The correlation coefficients for all data (τ=0.16 and 0.15) are shown as horizontal lines.


Figure B2The CCDFs of (a) observed hourly sea levels, (b) observed/simulated significant wave heights, (c) total water level as their sum, and (d) total water level calculated as a sum of two independent variables. The distributions shown are those for the data set N (hypothetical no-ice). The distributions were extrapolated with exponential functions below the exceedance frequency of 44 h yr−1. The seven consecutive 4-year periods shown separately for observed sea levels and simulated waves are 1992–1995 to 2016–2019.


Figure B3The CCDFs of (a) observed hourly sea levels, (b) observed/simulated significant wave heights, (c) total water level as their sum, and (d) total water level calculated as a sum of two independent variables. The distributions shown are those for the data set I (ice time included). The distributions were extrapolated with exponential functions below the exceedance frequency of 44 h yr−1. The seven consecutive 4-year periods shown separately for observed sea levels and simulated waves are 1992–1995 to 2016–2019.


Code and data availability

The wave, wind and sea-level observations are available through FMI's open data portal (; Finnish Meteorological Institute2020). The code for the parametric wave model is available at (Björkqvist et al.2022).

Author contributions

MMJ calculated the correlations, probability distributions, and copulas and wrote most of the paper. JVB programmed the coastal implementation of the parametric wave model, did the model runs, and contributed to writing. JS started the work and wrote the first draft. UL contributed to the calculation of the probability distributions. KKK originally developed the parametric wave model and proposed the idea of this study.

Competing interests

The contact author has declared that neither they nor their co-authors have any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


The work done by Pekka Alenius in providing the water temperature data for the LL7 monitoring site is greatly acknowledged.

Financial support

This work was partly funded by the State Nuclear Waste Management Fund in Finland through the EXWE project (Extreme weather and nuclear power plants) of the SAFIR2018 programme (The Finnish Nuclear Power Plant Safety Research Programme 2015–2018;, last access: 8 March 2022).

Review statement

This paper was edited by Philip Ward and reviewed by two anonymous referees.


Arns, A., Wahl, T., Haigh, I., Jensen, J., and Pattiaratchi, C.: Estimating extreme water level probabilities: A comparison of the direct methods and recommendations for best practise, Coast. Eng., 81, 51–66, 2013. a

Arns, A., Dangendorf, S., Jensen, J., Talke, S., Bender, J., and Pattiaratchi, C.: Sea-level rise induced amplification of coastal protection design heights, Scient. Rep., 7, 40171,, 2017. a

Battjes, J. A.: Long-term wave height distributions at seven stations around the British Isles, Deutsch. Hydrogr. Z., 25, 179–189,, 1972. a

Björkqvist, J.-V., Lukas, I., Alari, V., van Vledder, G. P., Hulst, S., Pettersson, H., Behrens, A., and Männik, A.: Comparing a 41-year model hindcast with decades of wave measurements from the Baltic Sea, Ocean Eng., 152, 57–71,, 2018. a

Björkqvist, J.-V., Pettersson, H., and Kahma, K. K.: The wave spectrum in archipelagos, Ocean Sci., 15, 1469–1487,, 2019. a, b

Björkqvist, J.-V., Vähä-Piikkiö, O., Alari, V., Kuznetsova, A., and Tuomi, L.: WAM, SWAN and WAVEWATCH III in the Finnish archipelago – the effect of spectral performance on bulk wave parameters, J. Operat. Oceanogr., 13, 55–70,, 2020. a

Björkqvist, J.-V., Kahma, K., and Pettersson, H.: bjorkqvi/KKWave: Release v1.0.0, Zenodo [code],, 2022. a

Chini, N. and Stansby, P.: Extreme values of coastal wave overtopping accounting for climate change and sea level rise, Coast. Eng., 65, 27–37,, 2012. a

Datawell, B.: Datawell Waverider Reference Manual, Tech. rep.,, last access: 28 February 2019. a

Eelsalu, M., Soomere, T., Pindsoo, K., and Lagemaa, P.: Ensemble approach for projections of return periods of extreme water levels in Estonian waters, Cont. Shelf Res., 91, 201–210,, 2014. a, b

Finnish Meteorological Institute: The Finnish Meteorological Institute's open data,, last access: 29 October 2020. a

Genest, C. and Favre, A.-C.: Everything You Always Wanted to Know about Copula Modeling but Were Afraid to Ask, J. Hydrol. Eng., 12, 347–368,, 2007. a

Genest, C., Rémillard, B., and Beaudoin, D.: Goodness-of-fit tests for copulas: A review and a power study, Insurance: Math. Econ., 44, 199–213,, 2009. a

Hanson, H. and Larson, M.: Implications of extreme waves and water levels in the southern Baltic Sea, J. Hydraul. Res., 46, 292–302,, 2008. a

Hofert, M. and Mächler, M.: Nested Archimedean Copulas Meet R: The nacopula Package, J. Stat. Softw., 39, 1–20,, 2011. a

Hofert, M., Kojadinovic, I., Maechler, M., and Yan, J.: copula: Multivariate Dependence with Copulas, r package version 1.0-1, (last access: 4 August 2021), 2020. a

Hünicke, B., Zorita, E., Soomere, T., Madsen, K. S., Johansson, M., and Suursaar, Ü.: Recent Change – Sea Level and Wind Waves, in: Second Assessment of Climate Change for the Baltic Sea Basin, chap. 9, edited by: The BACC II Author Team, Springer, Cham, Heidelberg, New York, Dordrecht, London, 155–185,, 2015. a

Johansson, M. M., Boman, H., Kahma, K. K., and Launiainen, J.: Trends in sea level variability in the Baltic Sea, Boreal Environ. Res., 6, 159–179, 2001. a, b, c

Kahma, K. K. and Calkoen, C.: Reconciling discrepancies in the observed growth of wind-generated waves, J. Phys. Oceanogr., 22, 1389–1405, 1992. a, b

Kahma, K. K. and Pettersson, H.: Wave growth in narrow fetch geometry, Global Atmos. Ocean Syst., 2, 253–263, 1994. a

Kahma, K., Pellikka, H., Leinonen, K., Leijala, U., and Johansson, M.: Pitkän aikavälin tulvariskit ja alimmat suositeltavat rakentamiskorkeudet Suomen rannikolla – Long-term flooding risks and recommendations for minimum building elevations on the Finnish coast, Raportteja – Rapporter – Reports 2014:6, Finnish meteorological Institute, ISBN 978-951-697-833-1, (last access: 8 March 2022), 2014. a, b

Kojadinovic, I. and Yan, J.: Modeling Multivariate Distributions with Continuous Margins Using the copula R Package, J. Stat. Softw., 34, 1–20,, 2010. a

Kudryavtseva, N., Räämet, A., and Soomere, T.: Coastal flooding: Joint probability of extreme water levels and waves along the Baltic Sea coast, J. Coast. Res., 95, 1146–1151,, 2020. a

Kulikov, E. A. and Medvedev, I. P.: Extreme Statistics of Storm Surges in the Baltic Sea, Oceanology, 57, 772–783, 2017. a, b, c

Leijala, U., Björkqvist, J.-V., Johansson, M. M., Pellikka, H., Laakso, L., and Kahma, K. K.: Combining probability distributions of sea level variations and wave run-up to evaluate coastal flooding risks, Nat. Hazards Earth Syst. Sci., 18, 2785–2799,, 2018. a, b, c, d, e

Leppäranta, M. and Myrberg, K.: Physical Oceanography of the Baltic Sea, Springer, ISBN 978-3-540-79702-9, 2009. a, b, c, d

Lisitzin, E.: On the reducing influence of sea ice on the piling-up of water due to wind stress, Societas Scientiarum Fennica, Commentationes Physico-Mathematicae, XX, 12 pp., 1957. a

Marcos, M., Rohmer, J., Vousdoukas, M. I., Mentaschi, L., Le Cozannet, G., and Amores, A.: Increased extreme coastal water levels due to the combined action of storm surges and wind waves, Geophys. Res. Lett., 46, 4356–4364,, 2019. a

Masina, M., Lamberti, A., and Archetti, R.: Coastal flooding: A copula based approach for estimating the joint probability of water levels and waves, Coast. Eng., 97, 37–52,, 2015. a

Mathisen, J. and Bitner-Gregersen, E.: Joint distributions for significant wave height and wave zero-up-crossing period, Appl. Ocean Res., 12, 93–103,, 1990. a

Medvedev, I. P., Rabinovich, A. B., and Kulikov, E. A.: Tidal oscillations in the Baltic Sea, Oceanology, 53, 526–538,, 2013. a

Nelsen, R. B.: An Introduction to Copulas, Springer Series in Statistics, Springer, New York, USA, ISBN 978-0-387-28678-5, 2006. a

Pellikka, H., Leijala, U., Johansson, M. M., Leinonen, K., and Kahma, K. K.: Future probabilities of coastal floods in Finland, Cont. Shelf Res., 157, 32–42,, 2018. a, b

Pettersson, H. and Jönsson, A.: Wave climate in the northern Baltic Sea in 2004, (last access: 13 February 2021), 2005. a

Pettersson, H., Lindow, H., and Brüning, T.: Wave climate in the Baltic Sea 2012, (last access: 15 September 2017), 2013. a

Saaranen, V., Lehmuskoski, P., Rouhiainen, P., Takalo, M., Mäkinen, J., and Poutanen, M.: The new Finnish height reference N2000, in: Geodetic Reference Frames: IAG Symposium, 9–14 October 2006, Munich, Germany, edited by: Drewes, H., Springer, 297–302,, 2009.  a

Särkkä, J., Kahma, K. K., Kämäräinen, M., Johansson, M. M., and Saku, S.: Simulated extreme sea levels at Helsinki, Boreal Environ. Res., 22, 299–315, 2017. a, b

Soomere, T., Eelsalu, M., Kurkin, A., and Rybin, A.: Separation of the Baltic Sea water level into daily and multi-weekly components, Cont. Shelf Res., 103, 23–32,, 2015. a

Tuomi, L., Kahma, K. K., and Pettersson, H.: Wave hindcast statistics in the seasonally ice-covered Baltic Sea, Boreal Environ. Res., 16, 451–472, 2011. a, b

Vestøl, O., Ågren, J., Steffen, H., Kierulf, H., and Tarasov, L.: NKG2016LU: a new land uplift model for Fennoscandia and the Baltic Region, J. Geod., 93, 1759–1779,, 2019. a

Wahl, T., Mudersbach, C., and Jensen, J.: Assessing the hydrodynamic boundary conditions for risk analyses in coastal areas: a multivariate statistical approach based on Copula functions, Nat. Hazards Earth Syst. Sci., 12, 495–510,, 2012. a

Wahl, T., Jain, S., Bender, J., Meyers, S. D., and Luther, M. E.: Increasing risk of compound flooding from storm surge and rainfall for major US cities, Nat. Clim. Change, 5, 1093–1097,, 2015. a

Yan, J.: Enjoy the Joy of Copulas: With a Package copula, J. Stat. Softw., 21, 1–21,, 2007. a

Short summary
We analysed the correlation of sea level and wind waves at a coastal location in the Gulf of Finland using tide gauge data, wave measurements, and wave simulations. The correlation was positive for southwesterly winds and negative for northeasterly winds. Probabilities of high total water levels (sea level + wave crest) are underestimated if sea level and waves are considered independent. Suitably chosen copula functions can account for the dependence.
Final-revised paper