Both sea level variations and wind-generated waves affect coastal flooding risks. The correlation of these two phenomena complicates the estimates of their joint effect on the exceedance levels for the continuous water mass. In the northern Baltic Sea the seasonal occurrence of sea ice further influences the situation. We analysed this correlation with 28 years (1992–2019) of sea level data, and 4 years (2016–2019) of wave buoy measurements from a coastal location outside the City of Helsinki, Gulf of Finland in the Baltic Sea. The wave observations were complemented by 28 years of simulations with a parametric wave model. The sea levels and significant wave heights at this location show the strongest positive correlation (

Urbanized and heavily populated coastal regions around the world face concrete consequences of sea level rise and climate change. To ensure safe and effective coastal protection and city planning in the future, accurate and location-targeted estimates of coastal flood probabilities are in demand. The ultimate height to which the sea water rises in a coastal flood is determined by both the sea level variations (the so-called still water level) and the wind-generated waves on top of it. Thus, both need to be considered when estimating coastal flooding risks.

On the Baltic Sea coasts, the still water level variations mainly consist of storm surges and longer-term variations, the amplitude of tides being small. The flooding risks related to still water level have been extensively studied using long, high-quality tide gauge records, which in many locations date back to the 19th century

To account for the additional effect of wind waves,

The actual dependence of waves and sea level variations is complex. While short-term sea level variations and wind-generated waves are driven by related factors (namely wind and air pressure variations), strong winds can either fill or empty small sub-basins while simultaneously generating high waves. Also, the slowly changing total water volume of a semi-enclosed basin does not depend on the short-term wind speed.

To account for the correlation between sea level variations and wind waves when estimating the exceedance frequencies of their sum, a bivariate (two-dimensional) probability distribution is needed. Copulas are a flexible tool with which to study this geophysical problem, since they enable different types of marginal distributions of the individual variables, unlike, e.g. multivariate Weibull or normal distributions. They have therefore been used to study, e.g. storm surges in the German Bight
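As a minimal illustration of the copula idea (not the fitting procedure used in this study), the sketch below evaluates a Gumbel copula, one of the families considered later in the paper, and compares the joint exceedance probability it implies with that of the independent case. The function names and the example values of `u`, `v` and Kendall's tau are hypothetical; the relation theta = 1/(1 − tau) is the standard one for the Gumbel family.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for 0 < u, v < 1 and theta >= 1.

    theta = 1 reduces to the independence copula C(u, v) = u * v.
    """
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def gumbel_theta_from_tau(tau):
    """Standard Gumbel relation between Kendall's tau (0 <= tau < 1) and theta."""
    return 1.0 / (1.0 - tau)


def joint_exceedance(u, v, theta):
    """P(U > u, V > v), where u and v are marginal non-exceedance probabilities."""
    return 1.0 - u - v + gumbel_copula(u, v, theta)


# Hypothetical example: both variables at their 99th percentile, tau = 0.3.
u = v = 0.99
theta = gumbel_theta_from_tau(0.3)
p_dependent = joint_exceedance(u, v, theta)
p_independent = (1.0 - u) * (1.0 - v)
```

Because the copula only acts on the marginal probabilities `u` and `v`, the marginal distributions of sea level and wave height can be of any type, which is the flexibility referred to above.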

The GoF (Fig.

On top is a larger view of the study area covering the Northern Baltic Proper (NBP) and Gulf of Finland (GoF). The coastal area off Helsinki around Suomenlinna is shown on the bottom. The Suomenlinna ice station represents the ice conditions for a radius of about 1 km. The contours show the water depth in metres.

On time scales longer than a week the Baltic Sea water volume variations are a result of the water exchange that is low-pass filtered by the Danish Straits. This water level component is significant, with local sea level variations being up to 1.3 m

The wind-generated waves in the GoF are influenced by the fetch, the seabed, and the narrowness of the gulf

Finally, both the sea level variations and the waves are affected by the ice season, which in the Baltic Sea is 5–7 months

In this paper, we explore the correlation of sea level and wind waves near Helsinki on the northern coast of the GoF, with the aim of determining its effect on the exceedance frequencies of their sum. The study is motivated by the high relevance of flood risk estimates for the City of Helsinki, as well as the recently obtained 4-year time series of coastal wave buoy data. These data, together with the multi-decadal time series of tide gauge observations, provide an excellent data set for such a case study. We aim to answer the following research questions:

To what extent are these two variables correlated, and does it depend on season, ice conditions, or wind direction?

What is the observation-based probability distribution of the sum of the still water level and wave height? How much does it differ from a distribution formed assuming that the two variables are independent?

Can a copula-based bivariate distribution account for the correlation of the two variables, and more accurately describe the probability of the sum?
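The comparison behind the second question can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the example data are synthetic, and the independent case is represented by pairing every sea level value with every wave height value, which is equivalent to combining the two empirical marginal distributions as if they were independent.

```python
import numpy as np


def ccdf(samples, levels):
    """Empirical probability that a sample exceeds each level."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    n_not_exceeding = np.searchsorted(sorted_samples, levels, side="right")
    return 1.0 - n_not_exceeding / sorted_samples.size


def ccdf_of_sum_observed(sea_level, wave_height, levels):
    """CCDF of the sum of simultaneous values; any dependence is retained."""
    return ccdf(np.asarray(sea_level) + np.asarray(wave_height), levels)


def ccdf_of_sum_independent(sea_level, wave_height, levels):
    """CCDF of the sum if every sea level could pair with every wave height."""
    all_pairs = np.add.outer(np.asarray(sea_level), np.asarray(wave_height))
    return ccdf(all_pairs.ravel(), levels)


# Synthetic, positively dependent example data (not observations).
rng = np.random.default_rng(1)
shared = rng.normal(size=500)
sea_level = 0.3 * shared + 0.1 * rng.normal(size=500)
wave_height = 0.5 + 0.2 * shared + 0.1 * rng.normal(size=500)
levels = np.array([0.5, 1.0, 1.5])
observed = ccdf_of_sum_observed(sea_level, wave_height, levels)
independent = ccdf_of_sum_independent(sea_level, wave_height, levels)
```

The all-pairs sum needs memory proportional to the square of the series length, so for multi-decadal hourly series a convolution of binned marginal distributions would be a more practical way to obtain the same independent-case result.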

This paper is structured as follows. In Sect.

Automated tide gauges have been operational on the Finnish coast since the late 19th century, at Helsinki since 1904 (see e.g.

The average sea level during 1992–2019 at Helsinki was

FMI's automatic weather stations provided wind speed, wind direction, and air temperature measurements (Fig.

We calculated hourly wind and air temperature values by either interpolating the older 3 h data (up until mid- to late 1990s) or by calculating hourly averages of later denser measurements. All gaps shorter than 6 h were interpolated. Continuous data blocks that were shorter than 12 h were removed. Sea surface temperature (SST) data were only available from Harmaja starting at the end of June 1995. For other times and locations we estimated the SST using a mean yearly cycle from the GoF monitoring station LL7 (Fig.

FMI's operational GoF wave buoy is located in the centre of the basin (62 m water depth), while the coastal wave buoy is anchored at a depth of 22 m outside of Suomenlinna, Helsinki (Fig.

The significant wave height simulated by the parametric wave model compared with observations from the Suomenlinna wave buoys. The

The wave buoys determined the significant wave height from 27 min vertical displacement time series as

We complemented the coastal wave measurements at Suomenlinna with 28 years (1992–2019) of simulations produced by a parametric wave model. The model uses fetch-limited wave growth relations

First we simulated the wave conditions at the GoF wave buoy, which were needed as boundary conditions for the coastal wave hindcast. The GoF simulation accounted for the locally generated waves in the middle of the gulf and for waves propagating from the eastern part of the basin and the NBP, which were also simulated using the same model and the local wind. We then determined the fraction of the GoF wave energy that arrived at Suomenlinna; these waves have longer periods than what can be generated by the local wind and the nearest fetch, and will henceforth be called “long waves”. This long wave energy was quantified as the difference between the observed wave energy (for 2016) and the modelled local wave energy at Suomenlinna. The attenuation, as a function of the wind direction, was determined using a linear regression between the simulated wave energy at the GoF and the estimated long wave energy at Suomenlinna. The final time series of coastal significant wave height was determined from the sum of the simulated local wave energy at Suomenlinna and the attenuated GoF wave energy.
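The energy bookkeeping described above can be sketched as follows. This is a hedged illustration rather than the model code: it assumes the standard spectral definition of significant wave height, Hs = 4·sqrt(m0), with m0 the variance of the surface elevation, and it assumes a regression through the origin within wind direction sectors of a hypothetical width of 45°. All function and parameter names are hypothetical.

```python
import numpy as np


def energy_from_hs(hs):
    """Wave energy as surface elevation variance m0, assuming Hs = 4*sqrt(m0)."""
    return (np.asarray(hs, dtype=float) / 4.0) ** 2


def hs_from_energy(m0):
    """Inverse of energy_from_hs; negative energies are clipped to zero."""
    return 4.0 * np.sqrt(np.clip(m0, 0.0, None))


def sector_index(wind_dir, sector_width=45.0):
    """Wind direction sector (0, 1, ...) for directions given in degrees."""
    return ((np.asarray(wind_dir, dtype=float) % 360.0) // sector_width).astype(int)


def fit_attenuation(gof_energy, long_wave_energy, wind_dir, sector_width=45.0):
    """Least-squares slope through the origin for each wind direction sector.

    The sector width and the zero intercept are assumptions of this sketch.
    Sectors without usable data keep a slope of zero.
    """
    gof_energy = np.asarray(gof_energy, dtype=float)
    long_wave_energy = np.asarray(long_wave_energy, dtype=float)
    sector = sector_index(wind_dir, sector_width)
    slopes = np.zeros(int(round(360.0 / sector_width)))
    for k in range(slopes.size):
        x = gof_energy[sector == k]
        y = long_wave_energy[sector == k]
        if x.size and np.any(x > 0):
            slopes[k] = np.sum(x * y) / np.sum(x * x)
    return slopes


def coastal_hs(local_hs, gof_hs, wind_dir, slopes, sector_width=45.0):
    """Total Hs from local wave energy plus attenuated offshore wave energy."""
    attenuation = slopes[sector_index(wind_dir, sector_width)]
    total = energy_from_hs(local_hs) + attenuation * energy_from_hs(gof_hs)
    return hs_from_energy(total)
```

Adding energies rather than wave heights means that, for example, a local Hs of 1 m and an offshore Hs of 2 m attenuated by a factor of 0.25 in energy combine to sqrt(1² + 0.25·2²) ≈ 1.41 m, not 2 m.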

The simulated significant wave heights were validated using wave buoy data from 2016 to 2019. At GoF the

Data originating from FMI's ice charts described the ice conditions at the southern point of the Suomenlinna islands (Fig.

The sea level (still water level,

Data set F (ice-free time): simulated

Data set I (ice-time-included): simulated

Data set N (hypothetical no-ice): full simulated

Data set M (measurement statistics): wave observations only (this time is a true subset of the ice-free period because of measurement gaps).

We divided the 28-year data sets (F, I, and N) into seven consecutive 4-year periods, from 1992–1995 to 2016–2019. These 4-year time series were, where applicable, used to study temporal variability.
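A minimal sketch of this division into consecutive 4-year periods is given below. The function names are hypothetical, and the sketch assumes only that each value carries a calendar year.

```python
import numpy as np


def period_index(years, first_year=1992, period_length=4):
    """Index of the consecutive period containing each year (0 to 6 for 1992-2019)."""
    return (np.asarray(years) - first_year) // period_length


def period_label(index, first_year=1992, period_length=4):
    """Human-readable label such as '2016-2019' for a period index."""
    start = first_year + index * period_length
    return f"{start}-{start + period_length - 1}"


def split_by_period(years, values):
    """Map each period label to the values that fall within that period."""
    idx = period_index(years)
    values = np.asarray(values)
    return {period_label(int(k)): values[idx == k] for k in np.unique(idx)}
```

Any statistic of interest, such as a correlation coefficient or an exceedance frequency, can then be computed separately for each entry of the returned dictionary to assess the variability between periods.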

We defined the

If simultaneous time series of

Sklar's theorem

We studied the suitability of nine different copulas (Gumbel, Frank, Clayton, Normal, t-EV, Hüsler–Reiss, Galambos, Tawn, and Plackett) in representing the dependence structure of the observed

By using the bivariate distributions consisting of each of the nine copulas and the marginal distributions (Sect.

Probabilities of sea level extremes have been estimated using extreme value distributions with, e.g. block maxima (annual/monthly) or peak-over-threshold values

Our 28 years of hourly data allow for a direct estimate of the exceedance frequency down to 0.036 h yr

Scatter plot of significant wave heights vs sea level:

The probability distribution of the observed sea levels on the Baltic Sea coasts is slightly asymmetric: the high-end tail is thicker than the low-end tail

The simultaneous hourly sea levels and significant wave heights (Fig.

The variations in both parameters are largest in winter (DJF; Fig.

The hypothetical no-ice data (type N) show situations with low sea level and wave heights ranging from 0 to 1 m, which would have occurred during the ice-covered period, especially in DJF (Figs.

Scatter plots of

As a measure of the correlation, we used Kendall's tau coefficient (

Kendall rank correlation coefficients (

The different ways of handling the ice time in the 28-year data sets (F, I, N) do not generally result in large differences in the correlation, the only exception being DJF where the data set I shows clearly higher correlation than data sets F or N. There is a tendency for lower sea levels during the ice-covered period (the blue points with

The correlation clearly depends on wind direction measured at Harmaja (Fig.

The Kendall correlation coefficients (

The CCDFs of

The observation-based CCDF of total water level during 2016–2019 (“buoy”

The CCDF of the sea level from the period 2016–2019 shows lower frequencies for high sea levels (

The shape of the CCDF obtained from the observed wave heights differs from those obtained from the simulated wave heights (Fig.

The CCDFs of

The CCDFs based on data sets F, I, and N show only minor differences (Fig.

Table

The copulas fall into two distinct groups. The Clayton, Frank, and Plackett copulas remain close to the independent case. The Tawn, Gumbel, t-EV, Galambos, and Hüsler–Reiss copulas, on the other hand, capture more of the effect of dependence on the distribution, closely following the observed distribution down to an exceedance frequency of 50 events per year. The Normal copula falls between these two groups, deviating from the observed distribution below 400 events per year. This behaviour is confirmed by the Cramér–von Mises statistics (Table

Fitted parameters of nine different copulas for sea levels and significant wave heights from buoy observations for the period 2016–2019.

We tested the effect of wind direction on the copula results by conducting the copula analysis separately for two subsets of the data: situations with southwesterly wind (160–340

To further illustrate the differences in bivariate distributions based on different copulas, we simulated an ensemble of 21 546 samples from each, these being of the same size as the observed data set (Fig.

Scatter plots of the observed sea level (

Since the wave conditions in the archipelago markedly differ from those on the open sea, a multi-year time series of wave data from a coastal location such as Suomenlinna provides excellent material for studies. As more data accumulate, it will provide more comprehensive knowledge on the local wave behaviour. We compensated for the shortness of the wave buoy time series by using 28 years of simulated wave data. The lightweight parametric wave model is considerably faster to run than high-resolution numerical wave models. Nevertheless, the bias and RMSE of the simulations were comparable to those of such models.

The validation suggested that the highest wave heights are overestimated in the simulations. The attenuation of longer waves through bottom processes is indirectly accounted for in the model through the calibration against measurements. Nonetheless, this calibration might not hold for even harsher weather conditions and even longer waves. While the simulations offer a good tool for assessing the general dependence between the wave height and sea level variations, they should not be used as is for analysing extreme values. If the wave simulations of the parametric model are to be used for extreme conditions, the wave-bottom interactions need to be accounted for in a more explicit manner, and such an improved model needs to be re-validated.

One of the advantages of a wave model is its ability to simulate the wave conditions during ice-covered periods, or periods when there is a risk of freezing and the wave buoy cannot be kept in the sea. This extends the time series and provides an opportunity to analyse the question, “What would happen if there was no ice?” This question is relevant when we consider the conditions in the GoF in a warmer future climate, for instance. Our results did not indicate that the wave or sea level conditions, or the correlation of these, would markedly differ from those occurring during the ice-free period in the present climate.

The wave conditions vary significantly among 4-year periods (Fig.

Additionally, considering flooding on the coast, the relevant parameter is the wave run-up. While our total water level (Eq.

The behaviour of the correlation with respect to wind direction (Fig.

The sea level processes in the GoF act on different time scales, from the sub-daily storm surges to sub-weekly Baltic Sea internal variations, and weekly and longer-term changes in the total water volume. From 1 d to 1 week, the Baltic Sea mainly responds to wind and air pressure forcing like a closed basin. The response time of such variations, e.g. the transport of water between the GoF and the Baltic Proper in a seiche oscillation, is usually of the order of 1 d. On such time scales, the wind-generated waves have time to develop, and these phenomena co-occur.

The sea level variations in time scales longer than 1 week usually involve changes in the Baltic Sea water volume. These are related to longer-term

Our results demonstrated that the assumption of independence of sea level and significant wave height leads to an underestimation of total water levels with exceedance frequencies of 100 events per year or less by about 0.2–0.3 m in the 4-year observation-based data set, and up to 0.55 m in the 28-year simulation-based data set (the difference of the distributions in Fig.

Our main conclusions, related to the research questions presented in Sect.

The sea level variations and significant wave heights show a positive correlation in general (

The 4-year probability distribution of the total water level, calculated as the sum of the observed sea level and the observed significant wave height, gives 0.2–0.3 m higher values for high total water levels (0.1–100 events per year) than the distribution calculated by assuming the two variables to be independent. In the simulated 28-year distribution, the underestimation amounts to 0.2–0.5 m. As the total water level values rarer than 0.1 events per year are also likely underestimated, the dependence between the variables should be accounted for when calculating the probabilities of high total water levels, e.g. for flooding risk estimates.

The probability distribution of the sum based on the copula approach is closer to the observed distribution than the distribution based on the independence assumption. Of the nine copulas studied, the Tawn, Gumbel, t-EV, Galambos, and Hüsler–Reiss copulas proved most suitable for our data.

The core idea of the wave model is based on known universal properties of different dimensionless parameters, namely the energy, peak frequency, phase speed, fetch, and time:

Instead of assuming that the waves are fetch limited at each time step, the model simulates a more realistic wave growth and decay by also accounting for the limitation of wind duration through the following propagation scheme:

Calculate the dimensionless fetch,

Calculate a (dimensionless) duration based on the wind,

Determine an enhanced dimensionless time step as

As an inverse to step 2, calculate a new fetch

Calculate the minimum dimensionless fetch for a fully developed sea (using the wind speed at 10 m height) as:

Set the relevant fetch to

Calculate the new energy,

The Kendall correlation coefficients (

The CCDFs of

The CCDFs of

The wave, wind and sea-level observations are available through FMI's open data portal (

MMJ calculated the correlations, probability distributions, and copulas and wrote most of the paper. JVB programmed the coastal implementation of the parametric wave model, did the model runs, and contributed to writing. JS started the work and wrote the first draft. UL contributed to the calculation of the probability distributions. KKK originally developed the parametric wave model and proposed the idea of this study.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work done by Pekka Alenius in providing the water temperature data for the LL7 monitoring site is gratefully acknowledged.

This work was partly funded by the State Nuclear Waste Management Fund in Finland through the EXWE project (Extreme weather and nuclear power plants) of the SAFIR2018 programme (The Finnish Nuclear Power Plant Safety Research Programme 2015–2018;

This paper was edited by Philip Ward and reviewed by two anonymous referees.