Regional analysis of multivariate compound coastal flooding potential around Europe and environs: sensitivity analysis and spatial patterns

. In coastal regions, floods can arise through a combination of multiple drivers, including direct surface run- off, river 15 discharge, storm surge and waves. In this study, we analyse compound flood potential in Europe and environs caused by these four main flooding sources using state-of-the-art databases with homogenous coherent forcing (i.e., ERA5). First, we perform an analyseis to assess the sensitivity of the compound flooding potential to several factors: 1) sampling method; 2) time window to select the concurrent event of the conditioned driver; 3) dependence metrics; and 4) wave-driven sea level definition. We observe higher correlation coefficients using annual maxima than peaks over threshold. Regarding the other factors, our results 20 show similar spatial distributions of the compound flooding potential. Second, the dependence between the pairs of drivers using the Kendall’s rank correlation coefficient and the joint occurrence are synthesized for coherent patterns of compound flooding potential using a clustering technique. This quantitative multi-driver assessment not only distinguishes where overall compound flooding potential is the highest, but also discriminates which driver combinations are more likely to contribute to compound flooding. We identify that hotspots of compound flooding potential are located along the southern coast of the 25 North Atlantic Ocean and the northern coast of the Mediterranean Sea.

sea level along the UK coast. Such quantifications of compound flooding potential are based on dependence measures (e.g., using correlation coefficients, Wahl et al., 2015; or joint occurrence, Hendry et al., 2019) or bivariate statistical models (e.g., bivariate logistic threshold-excess model, Zheng et al., 2013;or copulas, Wahl et al., 2015, Moftakhari et al., 2017, Ward et 70 al., 2018. Furthermore, recent advances in large-scale sea level and river discharge modelling (Muis et al., 2016, Yamazaki et al., 2014 which provide time-series of these drivers over durations of more than 30 years, allow the identification of potential hotspots at country, continental and global scales (Wu et al., 2018;Bevacqua et al., 2019Bevacqua et al., , 2020Couasnon et al., 2020).
Regarding the identification of compound events, conditional sampling is usually applied (Wahl et al., 2015;Ward et al., 2018;Couasnon et al., 2020) which implies that compound events are conditioned to one of the drivers being extreme. For this 75 reason, when limiting to a bivariate characterization of compound events (e.g., when using correlation coefficients), two subsets of extreme events are identified, and the dependence is analysed when one or the other of the variables is extreme.
Another option is to select pairs of high values when both variables exceed individual high percentiles (e.g., 95th percentile, as Bevacqua et al., 2019), but in this case, compound events are defined only when both individual drivers are characterized as being extreme. This issue is similar to what happens when measuring compound flooding potential based on factors or 80 indices that quantify the effect of the dependency using copulas with AND hazard scenarios , Ganguli and Merz, 2019, Couasnon et al., 2020. However, it could be sufficient that only one of the driver variables was extreme to make a bivariate occurrence hazardous (OR hazard scenario, Moftakhari et al., 2017).
In the compound flooding studies summarised above, it is evident that a wide range of different statistical approaches have been used to define compound flooding potential, usually caused by the combination of two drivers. To date, no study has yet 85 analysed the dependence between the four potential drivers of flooding in coastal regions independently for each pair combination. Only Hendry et al. (2021, submitted) have to date considered all four potential drivers of flooding in coastal regions (precipitation, river discharge, sea level and waves). Recently, Ridder et al., (2021) have identified hotspots of compound events that potentially cause high-impact floods related to wet conditions based on the joint occurrence of multiple hazards pairs (precipitation, wind, hail, streamflow and storm surge). In other studies, the wave component has typically been 90 included in the sea level directly (by combining wave height with storm surge and/or astronomical tide) without analysing the correlation with the other drivers (Bevacqua et al., 2019, Paprotny et al., 2020. Similarly precipitation and river discharge have often been considered as an equivalent driver in the analysis of compound coastal flooding in combination with storm surge (Bevacqua et al., 2020). Additionally, different sampling methods to identify compound events and dependence measures to quantify compound flooding potential are used. The net effect of these variations in practice is to complicate comparisons 95 between different studies.
The overall aim of this paper, is to perform a regional analysis along the North Atlantic (27ºN -70.5ºN), Baltic, Mediterranean and Black Sea coastlines of the compound flood potential caused by pluvial, fluvial and oceanographic sources in the periodduring 1979 to -2018, using state-of-the-art model hindcasts with homogenous forcing (i.e., ERA5). TIn addition to this aim, two specific objectives are defined: 1) to assess the sensitivity of the compound flood potential to several factors that can 4 affect the identification of compound events and the analysis of the spatial distribution of compound flooding potential, and 2) to detect different types of compound events and the spatial patterns of compound flooding potential that arise from the combination of the four drivers. The paper is structured as follows. The datasets and methods are detailed in Section 2s and 3, respectively. The results of the two objectives are discussed in Section 4. Key findings are discussed in Section 5, with conclusions given in Section 6. 105

Data
We use modelled data to cover the entire coastlines that are focus of this study, using long-term, spatially continuous and temporally consistent gridded data for all four flood source variables, as discussed in each of the sub-sections below. Of note is that although these databases are not available on a common grid and there are differences in their spatial resolution, they are all derived from the European Centre of Medium-Range Weather Forecast (ECMWF)'s latest global atmospheric reanalysis 110 (ERA5, Hersbach et al., 2020), or from hindcasts forced by this atmospheric reanalysis. The new database GloFAS-ERA (Harrigan et al., 2020) is used for the first time to characterize river discharge when studying compound flooding potential.
We do not account for pluvial flooding directly, as pluvial flooding is a much smaller scale process. Instead, following Wahl et al (2015), we use precipitation at each analysis site as a proxy for surface runoff potential. Our four flood source variables are, therefore: precipitation (P), river discharge (Q), storm surge (S) and waves (W), this latter variable being characterized by 115 the significant wave height. Each of the four databases employed are described in the following sub-sections, including how we have selected the locations for the sensitivity analyses and compound event characterization.

Precipitation time-series
Precipitation time-series have been extracted from the ERA5 reanalysis, which is based on the Integrated Forecasting System (IFS) Cy41r2, which has been used in the ECMWF operational medium range forecasting system since 2016 and benefits from 120 a decade of developments in model physics, core dynamics and data assimilation (Hersbach et al., 2020). The ERA5 reanalysis replaces the ERA-Interim reanalysis with a significantly enhanced horizontal resolution of 31 km, compared to 80 km for ERA-Interim. Long-term (1998Long-term ( -2018 and monthly average precipitation rates from ERA-Interim and ERA5 have been evaluated by comparing them with values from other datasets (e.g., the version 7 of NASA's TRMM Multi-satellite Precipitation Analysis (TMPA) 3B43 dataset, version 7, Huffman et al. 2010) and there is a marked improvement in the 125 estimated precipitation in ERA5 compared to ERA-Interim (Hersbach et al., 2020). The ERA5 hourly dataset spans 1979 onwards and it is currently publicly available at the Copernicus Climate Change Service. Here, accumulated daily precipitation is calculated from hourly data.

River discharge time-series
River discharge time-series were extracted from the Global Flood Awareness System (GloFAS)-ERA5 reanalysis (Harrigan 130 et al., 2020). This reanalysis is a global gridded dataset (excluding Antarctica), with a horizontal resolution of 0.1° at a daily time step and with a 40 years long duration starting 1 January 1979. The GloFAS-ERA5 river discharge reanalysis was produced by coupling the land surface model runoff component of the ECMWF ERA5 global reanalysis with the LISFLOOD hydrological and channel routing model (van der Knijff et al., 2010). LISFLOOD allows the lateral connectivity of ERA5 runoff grid cells routed through the river channel to produce river discharge. ERA5 runoff is produced from the HTESSEL 135 land surface model (Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land; Balsamo et al., 2009) with an advanced land data assimilation system to assimilate conventional in-situ and satellite observations for land surface variables.
Groundwater and river routing parameters in GloFAS were calibrated against river discharge observations for 1,287 catchments globally by Hirpa et al. (2018). A total of 463 of the largest lakes (surface area > 100 km 2 ) and 667 largest reservoirs have been incorporated into the GloFAS river network. 140

Storm surge time series
Hourly and daily storm surge time-series have been extracted from the Coastal Dataset for the Evaluation of Climate Impact (CoDEC) . The third generation Global Tide and Surge Model (GTSM, Kernkamp et al., 2011), with a coastal resolution of 2.5 km (1.25 km in Europe), was forced with meteorological fields from the ERA5 climate reanalysis to simulate extreme sea levels for the period 1979 to 2017. Besides the increase of the resolution of GTSM v3.0 from 5 km along 145 the coast (50 km in the deep ocean) to 2.5 km along the coast (25 km in the deep ocean), the GTSM v3.0 model performance for tides was also improved by the implementation of additional physical processes. The validation against observed sea levels demonstrated a good performance which reflects not only the more skilful hydrodynamic simulations, but also the accuracy of the meteorological forcing. ERA5 represents better the evolution of weather systems due to an increase of the spatial and temporal resolution. The annual maxima had a mean bias 50% lower than the mean bias of the previous Global Tide and Surge 150 reanalysis dataset (Muis et al., 2016).

Wave time series
Hourly wave time-series have been extracted from the ERA5 reanalysis at a spatial resolution of 0.5° with some improvements and updates compared to ERA-Interim (Hersbach et al., 2020). The model bathymetry was updated to use a more recent version of ETOPO2 (NOAA 2006). A new wave advection scheme was introduced in the WAM model with a revised unresolved 155 bathymetry scheme to better account for the propagation along coastlines and to better model the impact of unresolved islands (Bidlot 2012). The slow attenuation of long period swells as well as the impact of shallow water on the wind input was introduced with an overall retuning of the level of dissipation due to white-capping (Bidlot 2012). The atmosphere and ocean are coupled by a two-way interaction: the atmosphere generates ocean waves through the surface wind stress, while the waves influence the atmospheric boundary layer via sea-state dependencies in the surface roughness. Altimeter measurements were 160 used to assimilate information on significant wave height. Independent buoy data were used for validation showing significant improvement in the wave height in ERA5 data compared to ERA-Interim (Hersbach et al., 2020).

Selection of the study locations
The spatial resolution of the four datasets is shown in Figure 1a for the area of Ireland, UK and north-western France. The 165 ERA5 precipitation database has a resolution of 0.25°x0.25°, the ERA5 wave database has a resolution of 0.5°x0.5°, the CODEC storm surge database has a resolution of 2.5 km along the coast, and the GloFAS-ERA5 river discharge database has a resolution of 0.1°x0.1°. The river network data implemented in GloFAS for routing operations was produced using finescale hydrography inputs from HydroSHEDS (Lehner et al., 2008).
We used HydroSHEDS data to identify the mouths of rivers with a catchment area higher than 1,000 km 2 to adjust the 170 distribution of study locations for a regional analysis. The GloFAS grid nodes closest to the locations of the river mouths were checked and we selected the grid node with the highest river discharge. The closest grid nodes of the precipitation, wave and storm surge databases to the selected GloFAS grid nodes were identified and are shown in Figure 1b. Across the whole domain (see Figure S1) we analyse 540 locations. Figure 1: a) Spatial resolution of the precipitation (ERA5), river discharge (GloFAS-ERA5), storm surge (CoDEC) and wave (ERA5) data in the area of Ireland, UK and north-western France. b) Selected river discharge grid nodes at the mouth of rivers with a catchment area > 1000 km 2 and the closest precipitation, storm surge and wave grid nodes along these coasts.

Methods 180
We characterize the compound flooding potential generated by the four flood drivers (P, Q, S, W) by calculating the dependence between all possible pairs of the four main source variables along the coasts in the study areaof the Eastern North Atlantic, Mediterranean Sea and Black Sea. In addition, we also superimpose linearly the storm surge and wave components into a combined sea level, ignoring the astronomical tidal component of sea level, as it which is deterministic (we note that for actualdetailed flood risk assessments the timing of tidal levels with the other drivers is important, but this is beyond the scope 185 of this analysis). We do this considering two definitions of the wave-driven sea level component: 1) a simplified for the wave contribution to sea level as 0.2W (e.g., Vousdoukas et al., 2017), called this variable SW, and, 2) athrough the use of more sophisticated semi-empirical formulations (e.g., Vitousek et al., 2017), that also considerstakes into account the wave period In the following sub-sections, we first describe the sensitivity analysis designed to elucidate the extent to which the methodology employed affects the quantification of compound flooding potential. Second, we describe the clustering method used here to identify different types of multivariate compound events and the spatial distribution of compound flooding potential thereby arising across the four flood drivers considered in this study.

Sensitivity analysis 200
Our first objective is to assess the sensitivity of compound flood potential to different aspects of the underpinning methodology: 1) sampling method used, 2) time window investigated, 3) dependence metrics employed, and, 4) definition of wave-driven sea level. Each of these sensitivity tests is discussed in turn below.

Sampling
Evaluation of the dependence between multiple drivers is limited to a bivariate analysis which imposes a two-sided conditional 205 sampling to select multivariate extremes. Compound extreme events are defined in previously published studies are either selected using Annual Maxima (AM) or Peak Over Threshold (POT). Although Ward et al. (2018) performed a sensitivity analysis and compared correlations between river discharge and storm surge using both POT with two thresholds (equal to 95th and 99th percentile) and AM sampling, here we build on that prior work to examine also the effect of the sampling method in the dependence between the four coastal flooding drivers considered. The disadvantage of the AM is that events are selected 210 that might not be considered extreme in the dominant variable. On the other hand, the POT approach increases the amount of selected extreme events but introduces two parameters in the selection process: 1) the threshold; 12) the definition of independent events established by a minimum time between peaks or below the threshold. of the time between peaks for each peak to be considered an independent event, and; 2) the threshold. The independence between extreme events is assured by de-clustering the events based on the duration of the storms in the study area and selecting the highest event within each storm. 215 The criteria used to select independent events in this study comprise a storm duration of 5 days for river discharge and 3 days for precipitation, storm surge and waves. These values were selected based on an analysis of the duration of highest storms conditioned to each variable in the study domain and following numbers used in previous studies , Hendry et al., 2019, Marcos et al., 2019, Bevacqua et al., 2019. Many methodologies for an automated threshold selection have been proposed based on graphical methods combined with goodness of fit (e.g., Solari et al., 2017), but such techniques are difficult 220 to implement in regional studies due to the different characteristics of time series of the several drivers involved in coastal flooding. Hence, we decide to apply the POT method with a threshold that guarantees 3 (POT3) or 6 (POT6) events/year to also analyse also the effect of the value of the threshold.

Time window
The conditional sampling introduces another factor that could affect the definition of compound events and this is related to 225 the selection of the concurrent value of the secondary variable to the identified extreme events of the dominant variable.
Specifically, there could be a temporal lag between variables that leads to a potential coastal flooding event. This lag can be implemented after identifying both series of extremes from the two drivers (Hendry et al., 2019) or by a time window when identifying the value of the conditioned variable (Wahl et al., 2015, Coausnon et al., 2019. Once a time window (Δt) is established, the value of the secondary variable is selected as the maximum value within ± Δt days. A variety 230 of temporal windows have been considered, from zero lag (Bevacqua et al., 2020), through ±1 day (Wahl et al., 2015), ±3 days (Couasnon et al., 2020) to ±5 days , Hendry et al., 2019. Furthermore, although river discharge data have been extracted at the river mouth, not all the databases for the four coastal flooding drivers have the same spatial resolution, varying considerably the distance between grid nodes at each location of the study domain. Therefore, here we test the sensitivity of the identification of bivariate compound events to time windows between 1 andof ±10 days or 1 and ±3 days, 235 keeping the lag which provides the highest correlation coefficient between each pair of variables at each location.

Dependence metrics
Several correlation coefficient definitions (e.g., Kendall's tau, Wahl et al., 2015; or Spearman's rho, Couasnon et al., 2020) and other metrics for the characterization of the dependence between events when both drivers are extreme (e.g., joint occurrence, Hendry et al., 2019, or tail dependence, Marcos et al., 2019 have been used in previous studies. Here we analyse 240 the extent to which characterization of compound events could be affected by the selection of these varying dependence metrics. Correlation coefficient: We use the Kendall's rank correlation coefficient tau and Spearman's rank correlation coefficient rho because they are commonly used nonparametric methods of detecting associations between two variables. Significance is assessed at = 0.05 (i.e., 95% confidence level), using corresponding p values. The Spearman's rank correlation between two 245 variables is defined as the covariance of the two variables normalized by the product of their standard deviations between the rank scores of those two variables. The Kendall's tau is also a rank order correlation coefficient which quantifies the difference between the % of concordant and discordant pairs among all possible pairwise events. The Kendall's correlation is considered to be more robust than the Spearman's correlation because it offers better estimates with smaller asymptotic variance and is less susceptible to outliers (Ganguli and Merz, 2019). 250 Joint occurrence: The 'joint occurrence method' (Hendry et al., 2019) consists of simply counting the number of times extreme events are identified in the two drivers analysed within the time window (Δt) considered.
Tail dependence: The dependence structure in the tails between two variables can be measured by the pair of statistics (χ, ̅ ) (e.g., Coles et al., 2001). Both coefficients χ and ̅ are defined as limit values which tend to 1 if both variables are asymptotically dependent over a certain threshold. The coefficient χ represents the probability of bivariate extreme events 255 when both variables are extreme and provides a measure of the dependence strength (Marcos et al., 2019), referred to as extremal correlation (Zscheischler et al., 2020a). ̅ is the residual tail dependence coefficient and contains additional information about the association (-1< ̅ <1) between extremes of both variables when they are asymptotically independent (χ=0). We use the function taildep from the R package extRemes (Gilleland and Katz, 2016) to derive these values.
Although non-linear interactions between storm surges and waves could amplify the magnitude of the sea level, the assumption that both contributions may be linearly summed is generally adopted and has often been used as a proxy of coastal flooding driven by oceanographic variables (Rueda et al., 2016, Bevacqua et al., 2019, Marcos et al., 2019. Regarding the wave contribution to sea level, when wind-generated waves approach nearshore and break in the shallow surf zone, they induce variations of the sea level at different time and space scales enhancing coastal flooding. The highest wave-driven contribution 265 to the total water level, called run-up, depends on two dynamically different processes: (1) wave setup, which is a time-average sea level rise occurring over a few hours to several days, and which is determined by local wind sea and swell conditions, and (2) swash which is a high-frequency process by which sea level fluctuates due to individual incident waves, with an additional low-frequency component generated by infragravity waves (related to the presence of groups in incident short waves). The magnitude and expanse of both components depends on the sea-state characteristics (significant wave height, period and 270 spectrum shape; Guza and Fedderson, 2012), as well as nearshore bathymetry and topography. Spatially, setup could extend from tens of meters in steep coastal areas to several kilometres in low-sloping coastal areas, while runup extension varies from a few meters to on the order of a hundred meters in reflective and dissipative environments, respectively (Dodet et al., 2019).
Runup is not usually included in the wave component of the sea level driver in coastal flooding analysis because its temporal duration is on the order of hours and requires local geological characteristics that could artificially inflate the wave contribution 275 in global and regional studies (Aucan et al., 2018). The setup contribution is defined using empirical formulations with different levels of sophistication. Wave setup has been approximated as the significant wave height multiplied by 0.2 (Vousdoukas et al., 2018, Bevacqua et al., 2019, Marcos et al., 2019 or by applying the Stockdon formulation (Stockdon et al., 2006) with different parameterizations , Rueda et al., 2017, Melet et al., 2018. The wave setup contribution to the total water level is very sensitive to this parameterization (Aucan et al., 2018). Following Vitousek et al., 2017, we used the 280 Stockdon formulation for dissipative beaches (Eq. 1), which is known to provide similar results as using a beach slope of 0.02 (~50% of the world's beaches have slopes smaller than 0.02, Aucan et al., 2018), Furthermore, in the definition of the sea level as the sum of S and 0.2W, here called SW, we have also applied the Stockdon formulation for wave setup for dissipative beaches (as Vitousek et al., 2017) which is known to provide similar results as using a beach slope of 0.02 (~50% of the world's beaches have slopes smaller than 0.02, Aucan et al., 2018), here termed water level (WL), as the sum of S and Setup. The two 285 variables we use here to represent the sea level are: 1) SW as the sum of S and setup given as 0.2W, and 2) WL as the sum of S and the setup calculated using the Stockdon formulation. where Hs and L0 are the deep-water wave height and wavelength, respectively.

Characterization 290
The evaluation of compound flooding potential due to the combination of four drivers based on a bivariate analysis is a complex problem due to the high dimensionality (e.g., spatial variability of the dependence metrics and relative contribution of each pair of drivers). We apply a two-step cascade classification with two sub-objectives: 1) to analyse the dependence between the metrics that characterize the bivariate compound flooding potential between the pairs of drivers, and 2) to extract spatial patterns from this dataset in order to identify hot spots of compound flooding potential arising from P-Q-S-W. 295 The two-step classification method consists of the use of self-organizing maps (SOM) as the first step and applying the kmeans algorithm (KMA) as the second step (Rueda et al., 2017). In this study, SOM is applied first to take advantage of the powerful visualization characteristics but not to obtain a reduced number of clusters. The k-means algorithm is a classification technique that divides the high-dimensional data space into a number of clusters, each one defined by a centroid and formed by the data for which the centroid is the nearest . The SOM automatically extract clusters of high-300 dimensional data and projects them into a two-dimensional organized space (2D lattice), allowing an intuitive visualization of the classification (Kohonen, 2000). A SOM algorithm is a version of the KMA in which the centroids are forced with a neighborhood adaptation mechanism to a 2D lattice preserving the original topology of the data and producing that similar patterns in the original space are close in the 2D lattice. The Maximum-Dissimilarity-Algorithm (MDA, Camus et al., 2011) is applied to initialize the KMA to obtain a better distribution of the centroids over the multi-dimensional data space and avoid 305 random initialization. The optimal number of clusters is evaluated using the Davis-Boudin (Davies and Bouldin, 1979) and the gap criteria .

Sensitivity analysis
This section describes the results for the first objective, relating to the sensitivity analysis. Results from each of the four 310 sensitivity tests is described in turn.

Sampling
Two-sided conditional sampling has been applied to the seven pairs of drivers identified when applying the AM and the POT methods with either 3 or 6 events per year using a time window of ± 3 days. Figure 2 shows the comparison of the correlation coefficients between Q and P using the three approaches when either Q or P are the dominant drivers. Only this pair of variables 315 is shown because it presents the highest correlation and the purpose of this subsection is only to test the sampling method.
Locations are divided into five regions (see Figure S1 for the locations of these regions): Northwestern Europe (NEW, 99 locations); Northeastern Europe (NEE, 165 locations), Southern North Atlantic coast (SNA, 60 locations), Western the conditional pairs of extremes selected using AM while similar correlation is obtained using POT3 or POT6, with higher 320 dispersion in the lower values of the coefficients. Correlation coefficients calculated with AM subsets of extremes are, on average, around 0.2 higher than those derived with the POT approach and this is consistent across all regions. Scatter plots display data only for those locations where significant (p<0.05) correlation coefficients are present. Overall, conclusions about the comparisons for all other pairs of drivers are similar to those for Q-P ( Figure S2). Figure 2: a) Scatter plots of the Kendall's correlation (p< 0.05) between Q and P using POT3 vs AM (upper panels) and POT3 vs POT6 (lower panels), using either Q (left panels), or P (right panels) as the dominant variable. b) Histograms of the correlation coefficients using the three approaches (AM, POT3 and POT6) for each region when either Q (left panels) or P (right panels) as the dominant drivers. Figure 3 shows the comparison of the number of co-occurring events between Q and P using the three approaches. The number of events is higher when using POT3 compared to AM and when using POT3 compared to POT6, as expected. The scatter plots follow roughly the 1:3 or 1:2 slope, respectively, indicating an approximate tripling or doubling in the number of events 335 between different approaches. However, the spatial pattern is similar (correlation coefficient between the number of cooccurring events is around 0.93-0.98) which means that the three methods identify broadly equivalent areas where prone to compound events with both variables being extreme. Results for all pairs of drivers are shown in Figure S3, which has a similar behaviour, albeit with an overall lower number of co-occurring events.
340 Figure 3: a) Scatter plots of the number of co-occurring events between Q and P using AM vs POT3 (left panel) and using POT3 vs POT6 (right panel). Dashed lines mark relationships between AM and POT3 or between POT3 and POT6 covering scaling factors equal to 2.0 and 3.0.  Figure 4 shows the comparison between the highest correlation coefficient obtained when using a time window (Δt) of ± 3 days versus using Δt = ± 10 days for the pairs of variables Q and P, S, W or SW using POT3, and the join occurrence between Q and P, S, W or SW. Only locations with a significant correlation (p<0.05) are represented in Figure 4. Results indicate there 350 is no major difference in the correlation between drivers when employing the two investigated time windows. Larger differences (~0.1 higher correlation) are obtained when Q is the dominant variable in few locations (9 of the 540) where correlation is moderate overall (~0.30). The joint occurrences tend to be slightly higher (10 number of co-occurring events) with a higher time window but also fewer compound events (by half) are detected in locations with low-medium join occurrence (<~40) using a Δt = ± 3 days (lower left corner of scatter plots in Figure 4b). 355  Figure 5a shows the comparisons when using the Kendall's versus Spearman's rank coefficients for the pairs of variables (Q-P/P-Q, Q-S/S-Q, Q-W/W-Q, Q-SW/SW-Q, P-S/S-P, P-W/W-P, P-SW/SW-P), using a time window of ± 3 days and across the 370 three approaches (AM, POT3, POT6) considered in the conditional sampling. There is a categorical correspondence between both correlation coefficients, with Kendall's coefficient having a tendency to be smaller than the Spearman's coefficient. Therefore, to characterize compound events in terms of correlation and its spatial distribution, the information provided by both coefficients is equivalent. The number of locations with significant correlation is very similar with both correlation coefficients (see Table 1). 375 empirical estimates for increasing threshold levels (Zscheischler et al., 2020a). Here, we decided to estimate χ at a probability threshold 0.95 after careful examination of the results for different levels. The comparison between χ and the joint occurrences divided by 3 events per year and the number of years (40) for the pair of variables Q-S is shown in Figure 5b. There is high correspondence (correlation coefficient around 0.9) between both dependence metrics mainly because both of them measure 380 the probability of bivariate extremes when both drivers are extreme. The remaining small differences may be due to different sampling processes leading to different extreme subsets. The statistic χ is estimated using the empirical distribution of the daily time series of Q (mean daily values) and S (maximum daily values) and the extremes are selected without any clustering. On the other hand, the number of co-occurring events was calculated using a POT with a threshold that guarantees 3 events per year and with a storm duration of 3 or 5 days to select independent events. A similar relationship between both metrics has 385 been found for other combinations of variable pairs (not shown here).  Table 1: Number of locations with significant correlation (p< 0.05) being 540 the total number of locations.

Definition of wave-driven sea level
The effect of the definition of sea level (oceanographic flooding drivers) in the characterization of compound events is analysed here by comparing the Kendall's correlation coefficients obtained between the pair of variables Q or P when using the two varying sea level definitions (SW and WL) used in this study ( Figure 6). Differences in the obtained correlation coefficients 400 are typically small (mainly around 0.05), except along the southern Atlantic coast where the differences are slightly higher than 0.15. The southern Atlantic coast region presents the highest correlations between pluvial or fluvial sources and oceanographic drivers (correlation coefficients around 0.6-0.7), within the entire study domain meaning that the identification of this region as an area with significance dependence is still preserved. The effect of the sea level definition on the correlation when Q or P are the dominant variables in the conditional sampling is much smaller (around 0.05, not shown here

Dependence between pairs of the four flooding drivers
Here we consider the results obtained using the POT3 method and a time window of ± 3 days to characterize the multivariate 420 compound events along the North Atlantic, Mediterranean and Black Sea coasts. The analysis of the dependence between S-W is performed at an hourly temporal resolution and using a smaller Δt (1 day) because it is considered that both drivers contribute simultaneously to the definition of sea level. The dependence between S-W as calculated using daily data is compared with the results using hourly data ( Figure S4 Kendall's correlation coefficient and the joint occurrences between the four pairs of variables: Q-P, Q-S, Q-W, and Q-SW are represented in Figure 7. The variables Q and P (Figure 7-a) present correlation coefficients of around 0.6-0.7 along the most southern coasts of the Atlantic study region and also in some locations in the Mediterranean Sea (coast of Gibraltar Strait, 430 Algeria, southern Italy, east coast of Turkey and Levante Region in the eastern Mediterranean), while correlation coefficients of around 0.1-0.2 are more predominant along northern European coasts (except the west coast of Jutland and west coast of the UK). Similar spatial correlations are found whether Q (red scale) or P (blue scale) are used as the dominant variable in the conditional sampling, except in locations along the French coast of the English Channel, the eastern coast of UK, and the coasts of Tunisia and Libya, where higher correlations are obtained when the compound events are conditioned to P. The joint 435 occurrence (circle size) presents a similar spatial pattern to the correlation coefficient between these two variables, with the maximum number of co-occurring events close to 100.
For Q and S (Figure 7- For Q and W (Figure 7-c), the spatial distribution is quite similar to Q-S with slightly smaller correlation along the most southern Atlantic coast and higher along the west coast of Iberian Peninsula. Correlation only when Q is the dominant variable 445 is even more pronounced in locations along the Mediterranean Sea (e.g., eastern coast of Spain). These spatial patterns are also reflected in the distribution of the correlation coefficient between Q and SW as a combination of both (Figure 7-d).
Spatial distribution of the dependence between P and S, W or SW ( Figure S5

Severity index
Here we define an index, based on driver severity, to be included in the characterization of the spatial patterns of compound flooding potential. The driver severity is calculated as the sum of normalized thresholds of each driver, applied in the 465 conditional sampling (multiplied by 0.2 in the case of W). Q thresholds, which cover a wide range of values, have been categorized into 10 intervals [0-10-25-50-100-250-500-750-1000-5000->25000 m3/s] to avoid skewing the driver severity due to very high discharge magnitude in several locations. Driver severity is divided into 11 scores from 0 to 1. Figure 8d shows the spatial distribution of the severity index (SI). Areas with highest SI are concentrated in the North Sea, the northwest of the Iberian Peninsula, the eastern coast of the Adriatic Sea, the eastern coast of the Black Sea and few locations that represent 470 large rivers. Coastal areas with the lowest SI are mainly concentrated along the southern coast of the Mediterranean Sea and the most southern coast of the Atlantic Ocean of our study domain. The SI spatial distribution indicates that an identical SI ranking can be determined by different combinations of driver extremes. To facilitate this analysis, we classify the thresholds of the four drivers (shown in Figure S6) into 10 clusters to define the main combinations of driver extremeness ( Figure S8a).
The probability of occurrence of each cluster (number of locations of the study domain represented by each cluster) associated 475 with each SI rank ( Figure S8b) provides which combinations of the four driver thresholds have an equal SI rank. For example, locations with SI equal to 0 are associated with only one cluster (represented in light green) which is defined by a combination of the lowest thresholds of Q, P and S and low W severity. On the other hand, locations with SI equal to 1 are associated with clusters 1 and 2 (in yellow and orange, respectively) which are characterized mainly by the severity of one driver (Q or S, respectively) but also associated with clusters 3, 7 and 8 (in red, dark and light blue, respectively) with high severity of two or 480 three drivers (Q, P and W, or Q and P, or Q and S, respectively). The spatial distribution of these clusters ( Figure S8c

Spatial Patterns of Compound Flooding Potential 500
The characterization of compound flooding potential can be summarized using the combination of two metrics: the Kendall's correlation (τ) and the joint occurrence (JO) for the pairs Q-P, Q-SW, P-SW and the number of co-occurring events when all three variables are extreme JO(Q-P-SW). The two-step cascade classification method is applied to the 11-dimensional array SIi], where the subscript represents the i-th grid point. Each parameter is normalized to avoid assigning different weights in 505 the classification process. We first use the SOM algorithm to obtain a large collection of centroids (20×20=400) projected onto a 2D organized lattice that helps to analyse the dependence between the 11 parameters. The hexagonal SOM of 20x20 size of the compound flooding potential derived from the 11 metrics outlined above for the study sites is shown in Figure 9. Results are shown in individual panels (Figures 9a-k) over the same 2D lattice for the different metrics defining the SOM centroids (the hexagons in a certain position correspond to the same map unit in each figure). Note that each Figure has a different scale. 510 For example, we can observe that the three parameters related with the pair Q-P (ρ1, ρ2, JO; see Figure 9a-c) present a similar distribution in the lattice which means that there is a high dependence between them. Locations with the highest correlation between Q-P(P-Q) (Figures 9a and 9b) also have the highest number of joint occurrences (Figure 9c). Centroids with the highest dependence parameters between Q-SW and P-SW are concentrated in the upper right corner of the lattice (Figure 9dj). They are also characterized by the highest number of joint occurrences between Q-P ( Figure 9c) and high-medium 515 dependence between Q-P (Figures 9a and 9b). Accordingly, the highest joint occurrences between all three variables are also found in the upper right corner (Figure 9i). Other centroids located in the upper left side of the lattice represent locations with high dependence between P-SW (Figure 9g-h) but not between Q-SW and relatively smaller dependence between Q-P.
The severity index (Figure 9k) reveals that locations with highest driver severity do not present dependence between any pair of drivers (lower right corner of each individual panel in Figure 9). These locations are the ones represented principally by 520 clusters with the severity determined by only one driver (clusters 1 and 2 in Figure 8a). The probability of many SOM centroids is null (Figure 9l) because a large lattice size (20x20) is required to guarantee that SOM centroids cover the multidimensional data space defined by the 11 parameters.

530
In the second step, we apply the KMA to find a reduced number of clusters applied to the SOM centroids. We obtain an optimal number of 8 clusters by applying the Davis-Bouldin and gap criteria. Figure 10b shows the KMA classification in 8 groups over the SOM lattice, and highlights the similarity between KMA clusters because neighbouring centroids in the 2D lattice have similar values of the eleven parameters (see black contours of Figure 9 which delimitate the SOM centroids belonging to 535 each KMA cluster). Figure 10a shows the mean value of the eleven parameters associated to each cluster and the variability within each group (25th, 75th, 5th, 95th percentile). The number of locations represented by each cluster is shown in Figure   10c.
The KMA classification reveals two clusters (red and pink groups) that represent locations where more compound flood events can occur. The pink group is characterized by the highest dependence (both correlation and joint occurrences) between Q-SW 540 occurrences between the three drivers. The SI centroid presents a medium-high severity with a wide variability. The red group represents locations with the highest dependence between Q-P (P-Q), and medium correlation, only when extremes are selected conditioned to oceanographic drivers, and high joint occurrence between Q-SW and Q-P. It is characterized by a low-medium severity index because most of the locations present mild meteocean conditions (represented by cluster 4 in Figure S8a). The 545 purple cluster follows the other two in terms of compound flooding potential. It is characterized by a high joint occurrence between Q-SW and P-SW but lower dependence between Q-P and represents locations with a high severity index. Green and blue clusters stand out for the high numbers of compound events resulting mainly from the combination of P-SW. The green cluster is also characterized by significant dependence between Q-P and Q-SW at locations with low severity meteocean climates. In contrast, the blue cluster represents locations without compound events generated by the combination of Q and 550 SW. The dark green cluster represents locations where only compound events from the combination of Q-P can occur. The The geographical distribution of the 8 KMA clusters represents the compound flooding potential patterns across the study 565 domain (see Figure 11). For example, the pink cluster which characterizes the pattern where the most compound events occur from the combination of the four drivers is distributed along the southern coasts of the North Atlantic Ocean, the eastern coast of France, as well as scattered locations along the northern coast and of the eastern coasts of the Mediterranean Sea. On the other hand, the red cluster is mainly localized along the most southern coast of the North Atlantic Ocean and isolated locations in the Mediterranean and Black Sea; recall that this cluster represents locations with low driver severity. Other locations 570 identified with significant compound flooding potential, including along the western coast of France and UK or the northeastern coast of the Mediterranean Sea, are part of the purple cluster. Figure 11: Compound flooding potential patterns identified in this study as highlighted by the spatial distribution of the 8 KMA clusters. Each KMA cluster is represented in the same colour as used in Figure 10.

Discussion
In this paper, we have analysed the compound flooding potential arising from pluvial, fluvial and oceanographic sources. The assessment is based on a bivariate analysis of the dependence between drivers (P, Q, S and W) using a two-sided conditional sampling. Our first objective is focused on the analysis of the sensitivity of the results to several factors that have been applied 585 indiscriminately in previous studies with the purpose of identifying compound events and characterizing compound flooding potential.
First, we apply AM and POT sampling approaches to analyse how the choice of these approaches affects the computed dependency between variables. It is noteworthy that our results show that a lower statistical dependence is obtained when using the POT method, which is in agreement with Ward et al. (2018), yet a broad consensus has emerged in favour of the use 590 of the POT approach for identifying extreme events (Mazas et al., 2014, Coles, 2001. The larger correlation coefficient derived using the AM approach might be due to a higher tendency that annual peaks of both drivers co-occurred, when the dependence between them is significant, while the POT method selects more combinations where drivers are less extreme, which in turn is reflected in the a lower correlation coefficient. However, AM can potentially disregard information on extremes in the remaining data from using only one data point per year (Méndez et al., 2016), or select events that are not extreme, as we have 595 observed in several of our study sites. However, the spatial patterns in both approaches are similar and comparable areas are identified as hotspots with relatively higher dependence.
In our second sensitivity analysis, we found that the choice of the time window used has almost a negligible effect on the computed correlation coefficients (at least for the time windows in excess of the 3 days used here). The higher probability of finding more severe events using a longer Δt has only be reflected in a higher correlation in few locations. However, it can 600 result in a lower number of compound events when both drivers are extreme (i.e. less joint occurrences).
In the third sensitivity analysis, we investigated differences in the characterization of the dependence between drivers using different metrics. We found that the correlations obtained are always higher when using the Spearman's rank correlation coefficient as compared to the Kendall's coefficient, but they are unequivocally related, and equivalent spatial distributions are obtained irrespective of the choice of the correlation coefficient. Regarding joint occurrences and tail dependence, we 605 found that both provide comparable quantifications of the dependence between driver variables when both variables are extreme. Moreover, the concept of joint occurrences provides a better measure of the compound flooding potential because it applies a declustering method to select independent events.
The last factor we assessed is the definition of wave-driven contribution to the sea level when using the sum of the S and W components. Any differences in correlation emerging as a result of the definition of wave-driven contribution to sea level seem 610 to be explained by a higher dependence between surge and the simplified wave-driven sea level (20% of W) than surge and setup based also on the wave period (see Figure S7) that this could be to the influence of swells. Although the same beach slope is considered in all study locations, our results showed that the combination of W and Tp in the estimation of setup influences more the selection of compound events conditioned to sea level than the semi-empirical formulation itself.
Our second objective was focused on estimating the spatial distribution of compound flooding potential considering the drivers 615 Q, P, S, W, and SW. We observed significant differences in the dependence between the pairs of drivers and even for one individual pair depending on which driver is employed as the dominant one in the selection of the compound events. We find that the correlation coefficient and joint occurrences are not always positively related. Therefore, we considered that combinations of both metrics provide complementary information about the type of compound events and represents different flooding mechanisms (Wahl et al., 2015). The joint occurrence only characterizes compound events when both drivers are 620 extreme. On the other hand, correlation coefficients characterize those compound events generated when one of the drivers is extreme but not necessarily the other, providing information about the relative severity of the secondary driver.
Regarding a comparison with previous global and regional studies of compound flooding potential in the study domain, the hotspots we have identified on the coasts of Portugal, the Strait of Gibraltar and Morocco have also been detected in Couasnon et al. (2020). However, although we found a higher number of joint Q-S occurrences on the south-west and west coasts of the 625 UK than on the east coast, as previously noted by Hendry et al. (2019), the number of co-occurrences is lower in our analysis, as is also the case around the coast of Ireland. Similar high joint occurrences between Q-S in the northern and eastern Mediterranean coasts and in the coast of Tunisia are found, in accordance with Couasnon et al. (2020). We do not observe a predominance of higher correlation when compound events are conditioned to Q as in Coausnon et al. (2020) found. However, differences in the correlation between the two conditional samples are found between P and S, W or SW. As pointed out by 630 Hendry et al. (2019), storms that generate high precipitation are different to the ones that generate high storm surges.
Specifically, heavy precipitation and extreme surges are driven by deep low-pressure systems, while intense rainfall can also be caused by convection without intense cyclonic activity (Bevacqua et al., 2019). Therefore, there is higher probability of compound events when S or W are the dominant variables (generated by extratropical storms) than when compound events are conditioned to P (convective storms). This effect seems to be less perceptible between river discharge and oceanographic 635 variables because other climatic and non-climatic factors affect the fluvial source driver (Bevacqua et al., 2020), as for example, land use characteristics or snowmelt, evaporation, and accumulated precipitation over previous weeks or months.
Bevacqua et al., 2019 found the lowest joint return periods due to high dependence between P-S were concentrated along the Atlantic coast and in the Mediterranean Sea (particularly in the regions of the Gulf of Valencia (Spain), the northwest Algeria, the Gulf of Lion (France), the Adriatic coast of Balkan Peninsula, the Aegean coast, southern Turkey and the Levante region). 640 Even though we did not calculate return periods, our results suggest similar areas of higher dependency between P-S. We find a distinct pattern between southern and northern European coasts with more joint occurrences between S-W, especially over the Irish Sea, English Channel and south coasts of the North Sea and Baltic Sea in line with the results of Petroliagkis (2018).
Similar regional patterns of dependence between S-W as we find here were reported by Marcos et al. (2019), but we find some additional local areas with strong dependence when both drivers are extreme (characterized by the statistic χ or joint 645 occurrence) such as the western coast of the Iberian Peninsula and certain areas in the Mediterranean Sea, perhaps because we used a higher Δt than in Marcos et al. (2019).
We apply a two-step clustering method to synthesize the high dimensional results of the bivariate characterization of compound flooding potential from the four sources. First, the SOM algorithm allows us to analyse the multivariate dependence between the correlation coefficients and the joint occurrences between the pairs of drivers Q-P, Q-SW and P-SW, enabling the 650 establishment of the degree of contribution of each driver combination to compound flooding in the study area. Moreover, the method also distinguishes if the identified compound events are more likely to occur when both drivers are extreme or when only one driver is extreme. With the second step of the clustering method, we identify a reduced number of types of compound events based on the contribution of each driver combination and the driver severity. The spatial distribution of these types of compound events reveals spatial patterns of compound flooding potential. These patterns allow us to discern locations with 655 the highest overall compound flooding potential and the associated contributions of each pair of drivers. Additionally, we introduce a severity index to distinguish between locations with similar compound flooding potential (from the dependence perspective) but very different driver severity.
The main limitation of our study is the identification of the compound events based on drivers. None of the contributing variables have to be necessarily extreme to create a compound flooding event. Therefore, the selection of compound events 660 should ideally be based on an impact function or a risk function that accounts for exposure and vulnerability. However, this function is usually unknown and difficult to derive, especially in regional and global studies. An intermediate approach could be based on the selection of the compound events in the extreme water levels generated by the interaction between oceanographic drivers and riverine drivers. The amplification of the flooding impact has been identified at the local scale (van den Hurk et al. 2015, Kumbier et al. 2018, and recently at global scale (Eilander et al. 2020) using a global coupled river-665 coast flood model framework (Ikeuchi et al., 2017).
Although compound flooding drivers have been found to be generally captured well in different reanalysis and hindcast products (Paptrony et al., 2020), differences in the strength of dependence derived from observations and models can vary spatially and across different variables, which is more evident when regional climate models are used, even after bias corrections have been made (Ganguli et al., 2020). In addition, Paprotny et al. (2010) also detected false positive and large 670 compound floods in observations that were missed in the modelled products. We therefore acknowledge that model biases might mischaracterize absolute values of dependence in some cases. However, the conclusions we draw from our results regarding sensitivity analysis are not likely to be altered and the relative importance of drivers and spatial patterns would also likely be less (or not at all) affected.

Conclusions 675
In this paper we have evaluated the compound flooding potential arising from the combination of precipitation, river discharge, storm surge and waves along the coasts of the eastern North Atlantic Ocean, Baltic Sea, Mediterranean Sea and Black Sea (i.e., Europe and environs). The paper provides two advances. First, we performed a series of sensitivity analyses to establish how methodological choices affect the identification of compound flood events. Specifically, we investigated: 1) the sampling method, 2) the time window used to match events in the two-way sampling, 3) the use of typical metrics applied in the 680 evaluation of the dependence between drivers, and 4) the definition of the wave-driven sea level contribution. Among these, the sampling approach showsproduces the highest differences in the quantification of the compound flooding potential.
However, none of these factors analyzed factors analysedhere causeintroduces significant differences in the spatial distribution of the compound flooding potential, sowhich means that similar resultslocations where compound flooding is more likely to occur are identified irrespective of the method. 685 Second,The second major advance forwarded by our work providescomprises a new regional characterization of compound flood potential using a methodology which aggregates the bivariate dependence between driver combinations. This multivariate characterization reveals three main locations with high compound flooding potential: the southern coast of the North Atlantic, the western coasts of France and UK and the northern coast of the Mediterranean Sea. These locations are 690 characterized by compound events that arise from the combination of the four drivers, albeit with differences related to the driver severity, the contribution of each pair of drivers and the predominance of compound events when drivers are extreme.
Other locations of relatively high compound flood potential include the eastern coast of Italy and southern Mediterranean Sea, where compound flooding is mainly driven by combinations of precipitation and sea level.P-SW.
This regional quantitative assessment of multivariate compound flooding potential can be considered as a screening pre-695 diagnostic tool for coastal management. The rResults provide information about which areas are more predisposed to experience compound flooding. In addition, this multivariate flooding potential classification identifies the relevant drivers of coastal compound flooding at each location. Thisn which can assists the selection of to decide the most appropriate methodological approach to perform high-resolution hydrodynamic and impact modelling.