Extreme waves analysis based on atmospheric patterns classification: an application along the Italian coast

The identification of homogeneous populations of data prior to perform Extreme Value Analysis (EVA) is advisable in all fields of sciences. When performing EVA on sea storms, it is also recommended to have an insight on the atmospheric processes behind the occurrence of the extremes, as this might facilitate the interpretation and ultimately use of the results. In this work, a “bottom-up” approach for the identification and classification of the atmospheric processes producing extreme 5 wave conditions is revisited, and applied to several locations among the Italian buoy network. A methodology is given for classifying samples of significant wave height peaks in homogeneous subsets, and for the computation of the overall extreme values distribution starting from the distributions fitted to each single subset. From the obtained results, it is concluded that the proposed methodology is capable of identifying clearly differentiated subsets driven by homogeneous atmospheric processes, and it allows to estimate high return-period quantiles consistent with those resulting from the usual EVA. Two well-known 10 cyclonic systems are identified as most likely responsible of the extreme conditions detected in the investigated locations. These systems are analysed in the context of the Mediterranean sea atmospheric climatology, and compared with those figured out by previous researches developed in similar frameworks.

affects the estimation of H s extreme values; (ii) to characterize the identified WPs in the framework of the Mediterranean 60 Region (MR) cyclones climatology.
The paper is structured as follows: in Sect. 2 we introduce the data and describe the methodology developed; results are presented and discussed in Sect. 3; finally, in Sect. 4 conclusions are summarized and further developments are introduced. This work takes advantage of eight hindcast points located in the Italian seas, as shown in Fig.1. This choice allowed to test the reliability of the proposed methodology under different local wave climates. In fact, the selected points are differently located along the Italian coastline, and, being exposed to different fetches, they are characterized by peculiar wave conditions. The same locations were taken into account by Sartini et al. (2015), where they performed an overall assessment of the different frequency of occurrence of the extreme waves affecting the Italian coasts. Table 1 reports the names, depths, and coordinates 70 of the selected locations.
The points correspond to as many buoys belonging to the Italian Data Buoy Network (Rete Ondametrica Nazionale or "RON", Bencivenga et al., 2012), which collected directional wave parameters over different periods between 1989 and 2012.
Unfortunately, most of the buoys are characterized by significant lacks of data due to malfunctions and maintenances of the devices. We therefore referred to hindcast data, since such a widespread lack of data would imply a loss of reliability for 75 the following analysis. We relied to the hindcast of the Department of Civil, Chemical and Environmental Engineering of the University of Genoa (http://www3.dicca.unige.it/meteocean/hindcast.html; Mentaschi et al., 2013Mentaschi et al., , 2015. It now provides wave parameters on a hourly base from 1979 to 2018 over the whole Mediterranean sea, with a spatial resolution of 0.1 • in both longitude and latitude (however, at the time the study was developed the series were defined up to 2016). Data were validated against the records of the buoys (when available); more details can be found in Mentaschi et al. (2013Mentaschi et al. ( , 2015. The wind data 80 used to drive the wave generation model were derived from the NCEP Climate Forecast System Reanalysis for the period from January 1979 to December 2010 and CFSv2 for the period from January 2011 to December 2018, downscaled over the MR at the same resolution of the hindcast, along with the pressure fields through the model Weather Research and Forecast (WRF-ARW version 3.3.1, see Skamarock, 2009;Cassola et al., 2015Cassola et al., , 2016. These wind velocities data were used here to feed the cluster analysis of the wave peaks. It should be pointed out that, in case of sea waves, other variables may concur to 85 affect their propagation (and therefore the bulk parameters), i.e. the local bathymetry and the currents. However, the bottom depth is reasonably expected not to be relevant, since all the locations investigated lie in deep water (see Table 1), while the currents were not fed into the wave model, though the hindcast data were widely validated and proved to be reliable.

Extreme events selection
For each location, wave height peaks were selected through a Peak Over Threshold (POT) approach, and in particular by using a 90 time moving window. This approach works as follows. First, the whole series of H s is spanned through a time moving window of given width; second, when the maximum of the data within the window happens to fall in the middle of the window itself, it is retained as a peak; finally, in order to get rid of the peaks which are not related to severe sea states, a first H s threshold is chosen and only peaks exceeding this threshold are retained for further analysis.
In this study, for each location the width of the moving window was set equal to one day, meaning that the inter-arrival time 95 between two successive storms is at least equal to one day. The threshold was fixed as the 95 th percentile of the resultants peaks. This ensured to maintain a uniform approach for all the locations, efficiently capturing the different features of the local wave climates. Beside the significant wave heights, we retained as well the waves mean incoming directions corresponding to the peaks (θ m ), which were used for analysing the outcomes of the clustering algorithm. Finally, for each peak we extracted the mean sea level pressure field (MSLP) and surface wind fields for several time lags (0, 6, 12 24, 36, and 48 hours earlier 100 with respect to the peak's date), over the whole MR. Wind fields were used to classify the selected peaks due to their parent WP, as described in the following section; MSLP fields were used instead for the post-processing and climatological analysis of the results.

Extreme events classification: definition of weather patterns
The classification of extreme events is based on surface wind fields (ū w ) observed in the whole MR during the hours before and 105 concomitant to the time of the peaks. In order to define the spatial and temporal domains to be taken into account, we looked at the correlation maps between the wind velocities and the H s peaks for different time lags. Correlations were evaluated over a sub-grid of the atmospheric hindcast, with nodes spaced of 0.5 • both in longitude and latitude. To compute the correlation between H s andū w series is not straightforward, as the former variable is scalar and the latter is directional. To tackle this issue, we followed the procedure suggested by Solari and Alonso (2017). Given a time lag ∆t, the wind at every node (i, j) is 110 defined by its zonal and meridional components (u x , u y ) (i,j,∆t) ; the correlation between H s peak series and the time lagged surface wind speed at any given node is then estimated as the maximum of the linear correlations obtained by projecting the wind speed series in all the possible directions: where ρ i,j,∆t is the resulting correlation for node (i, j) at time lag ∆t, ρ refers to linear correlation function, u (i,j,∆t,θ) is 115 the surface wind speed projected along direction θ according to Eq (2): in this way not only a maximal correlation is obtained for every node, but also the direction corresponding to the maximal correlation, estimated as: The correlation maps computed with Eq. (1), allowed to evaluate the spatial domain and the time lags for whichū w is significantly correlated to (i.e., directly affecting) the resulting wave peaks at a given location.
Once the spatial and time domain of the wind fields producing the peak wave conditions were defined, the wind fields were used for clustering and classifying the extreme events. To this end, the k-means algorithm was used, fed with the normalized wind fields. k-means is aimed at partitioning a N-dimensional population into k sets (clusters) on the basis of a sample, in 125 order to minimize the intra-cluster variance (MacQueen et al., 1967). The normalization of the wind fields sought to reduce the influence of the intensity of the wind speed on the classification, so that only the spatial form of the field and its time evolution were taken into account. Note that the values of H s did not play any role in the classification of the peaks but the identification of the point in time of the wind fields.

130
Once the wind fields, and therefore peak H s series, were grouped into k clusters, the MSLP corresponding to the events within each cluster were averaged and the position of the lowest pressure was recorded, for all the ∆t taken into account. This allowed us to track the paths of the averaged low pressure systems corresponding to each cluster, defining in turn the respective WP.
At a second time, the dynamics of the systems were compared with those of the cyclones typically detected in the Mediterranean sea (Trigo et al., 1999;Lionello et al., 2016), while the frequency of occurrence of the events of different clusters were 135 compared with the outcomes of Sartini et al. (2015). The number of clusters needed to group the series into was defined for every location by looking at the outcome of the cluster analysis: when an increase from k to k + 1 clusters did not further lead to a new clearly differentiated WP, the research was stopped and k was used for the cluster analysis of that particular location.

Extreme value analysis
The EVA were performed independently over the subsets of H s peaks resulting from the cluster classification. We followed 140 the methodology proposed by , where the threshold for the POT analysis is estimated as the one maximizing the p value of the upper tail Anderson-Darling test. Then, once the threshold is identified (i.e. a subset of the peaks within a given WP is defined), the three parameters of the GPD are estimated through the L-moments method, and the return-period (T r ) quantiles of the variable under investigation are computed. In this work, in order to estimate the overall T r -H s curve and its confidence intervals from the GPDs fitted to each WP, a bootstrapping approach was implemented ( Table 2). The algorithm 145 in Table 2 shows a pseudocode summarizing the bootstrapping procedure. First, N boot series of H s , each N years long, are generated for every WP. Second, the series generated for the different WPs (N W P per location) are combined in order to obtain one single N years long series for each of the N boot simulations. Third, an empirical relation between T r and H s (i.e. an empirical cumulative distribution function or ECDF) is estimated from each one of the N boot series. Lastly, expected value and confidence intervals of H s are estimated from the N boot ECDFs for several return periods.

150
The method assumes a Poisson-GPD model for each WP and that the realizations of different WP are independent from each other. This independence hypothesis was evaluated by estimating the correlation between the annual number of peaks associated to each WP. Nyears is the number of years simulated; it must be larger that the maximum return period to be analyzed Nsimu is the number of events in Nyears obtained for the i th WP Table 2. Computational scheme of empirical extreme values curves and its confidence intervals For a given location, the overall work-flow can be summarized as follows: selection of a series of H s peaks through a POT approach 155 selection of the wind field data to be employed in the clustering algorithm (i.e. ∆t and spatial domain)

Results and Discussion
Once the series of H s peaks is selected, the first step of the proposed methodology requires to define the domains ofū w in time and space due to the outcomes of the correlation analysis between the two parameters. Indeed, as mentioned in Chapter 2.1, the wind is reasonably expected to play the mayor role for the observed wave parameters at the investigated locations.
170 Figure 2 shows the correlation maps for B7 for different time lags, along with the directions leading to the maximum values of correlation in each node. It is interesting to see howθ for ∆t=0 hours are distributed along the nodes characterized by similar values of ρ. Actually, even though the values ofθ come from a purely statistical analysis (i.e., they were computed with Eq. (3)), their spatial distribution follows that of a typical cyclone. Velocities happen to be uniformly oriented along the nodes characterized by the higher values of ρ, close-to-tangential to a circle centred on the nodes showing instead lower values of ρ.

175
This allows to get a first insight on the predominant process most likely affecting the wave climates of the investigated locations, as it will be discussed further on this paper. On the contrary, the analysis of the correlations between H s andū w reveals that the areas characterized by similar values of ρ are not uniformly distributed in the neighbourhood of the points considered. It is therefore difficult to uniquely contour the nodes to be taken into account for the successive analysis. As regards the time step, correlations rapidly decrease for ∆t longer than 12 hours, after which no evidence of significant ρ can be observed. The Once the spatial and temporal limits of the wind fields to be used in the k − means were defined, we performed a sensitivity analysis over the resultant average MSLP belonging to each cluster, among the total amount of tested clusters (k). Two clusters 185 were needed to detect different systems at all the sites but Alghero and La Spezia, where the local extreme waves could be related to a single pattern. For the other locations, to increase k did not lead to systems significantly diverging from those already defined. The reason for such a small number of resulting WPs has to be found in the nature of the H s employed in the analysis. Indeed, these are associated to extreme sea states, which are most likely driven by atmospheric phenomena developing along well-defined and fixed tracks.
190 Figures 3 and 4 show the averaged pressure fields corresponding to both the clusters and ∆t in B4 and B7 hindcast points.
Here, two main systems can be clearly distinguished: i) a low moving SW-NE towards the Balkan area (say WP#1) and ii) a low moving NW-SE (henceforth referred to as WP#2).    The paths of WP#1 and WP#2 show interesting similarities with well known cyclones typically forming and departing from two of the most active cyclogenetic regions in the MR, respectively the lee of the Atlas mountains and the lee of the Alps (Trigo et al., 1999). The MSLP composites related to WP#1 and WP#2 are consistent with those highlighted in previous research, for instance Lionello et al. (2006), showing the synoptic patterns associated with extreme significant wave heights in different 200 regions of the Mediterranean Sea (see in particular the MSLP fields reported in Fig. 122, panel A) and D) for WP#2 and WP#1, respectively). In particular WP#2 seems to follow the Genoa system path, that usually moves down to the Albanian and Greek coasts, while that of WP#1 shows characteristics analogous to the Sharav depression, moving north-eastward toward the Greek region (Flocas and Giles, 1991;Trigo et al., 1999Trigo et al., , 2002. At a second time, the MSLP related to the three highest events for both the locations and the WPs were individually analyzed, 205 in order to evaluate their homogeneity with respect to the overall MSLP averages. Results are shown in Fig. 5 and Fig. 6.  From the MSLP charts of Fig. 5 and Fig. 6, it can be noticed how low pressures are in good agreement in terms of their locations, and in turn to the average MSLP fields reported in Figures 3 and 4. Actually, this especially applies in case of the WP#2 events: the location of the low pressure of the cyclone is similar for the analyzed storms, though there are differences in terms of the absolute value of the pressure and slightly in the shape of the cyclone too (the values of the color scale were 210 modified to appreciate better the locations of the lows). This is something to be expected, as it partially explains the different intensities of H s (of course, local effects have to be taken into account as well). However, there is an important degree of variability within each family of events, in particular as regards WP#1.
The paths highlighted for WP#1 and WP#2 characterize the lows of the WPs detected in all the investigated sites (see the supplementary material). A summary of the lows position at the two different ∆t can be appreciated in Fig 7,  It is interesting to see how the WP#2 low get across the investigated sites in a precise chronological order, crossing first the northernmost locations and then those next to the south Balkan area where the cyclone actually ends its run. As such, we took as a reference four buoys affected by WP#2 at different times, evaluating the time lag between their respective peaks generated 220 by such system. We considered the extreme series of B4, computing for each storm the time lag between peak at the buoy and peaks occurring in B2 and B1 (occurring earlier) and B6 (occurring later); distributions of the time lags are shown in Fig. 8. couple of days, the center of the low travelling for approximately 1600 kilometres, with a resulting speed of ∼33 km h −1 (see Fig. 9). Both the lifetime of the identified cyclone and the speed it moves at are compatible with the features of the cyclones most frequently encountered in the MR (Lionello et al., 2016). Unfortunately, as regards WP#1, an equally clear path cannot be detected, since the average lows apparently arise simultaneously in most of the points taken into account. As previously pointed out, the MSLPs related to this pattern show a higher variability with respect to those characterizing the events of WP#2, thus 230 they would require further deepening and more detailed investigations. Therefore, we did not characterize the mean evolution of the MSLP fields related to the events of WP#1. Low pressure center -24hr Low pressure center 0hr Low pressure center +24hr Ref. buoy location Figure 9. WP#2: average MSLP evolution with respect to the reference dates of the events in B4 (underlined with the red triangle) The characterization of the two systems is reflected in the frequency of occurrence of the storms, with peaks belonging to different subsets showing as well distinctive seasonality. From the results shown in Fig. 10, it can be noticed how WP#2 peaks mainly occur in winter, whereas the events of WP#1 are characterized by two milder intra-annual peaks of occurrence, spread 235 among the spring and autumn months. The intra-annual cycle of the WP#1 events further suggest a direct link with the Sharav cyclones, which show similar seasonal fluctuations; as regards the WP#2 peaks, even though the storms of the Genoa low are more uniformly distributed along the year, the most intense events precisely occur during winter, as it happens in the above mentioned locations (details can be found in Lionello et al., 2016). Figure 11 summarizes the results of the monthly frequency of occurrence of the extremes in all the investigated locations, 240 grouped according the WP. For each location, frequencies were normalized in the 0-1 space with respect to the total amount of peaks, in order to be able to compare outcomes defined over different ranges.   It can be noticed how the relative weight of WP#1 events on the overall peaks distributions increases moving south. Points in the northernmost locations (B1 and B2) apparently are not influenced by the WP#1 system, and their extreme waves are actually linkable just to WP#2. Moving to the southernmost locations (B5 and B6), no significant differences in the seasonality 245 of the two patterns' events can be appreciated; in these two buoys none of the two systems shows a prevailing frequency of occurrence for the induced storms.
The aforementioned behavior may be justified by looking at the location of the investigated points (see Fig. 1): lows moving north-eastward directly run over B5 and B6, marginally affect B7, B8, B4 and B3, whereas they do not interest at all B1 and B2.
However, position of the buoys with respect to the cyclones' paths is not the only relevant variable. Indeed, local bathymetry 250 and prevailing fetch characteristics may result in storms having, on average, unique characteristics, e.g. it may happen that peaks related to the same WP show distinct frequencies of occurrence, for instance the events related to WP#1 in B6 and B7.
In this latter case, the predominant parameter seems to be the fetch length, which for B7 is very limited with regards to the NE incoming waves (precisely related to the first weather circulation pattern).
The WPs that are defined in the present study show common characteristics with those qualitatively identified by Sartini 255 et al. (2015) in a seasonal variability analysis of extreme sea waves. In particular, the same WP was identified in B1 and B2, linking the extremes with the Gulf of Genoa cyclogenesis WP type. As Sartini et al. (2015) noted, even if cyclones in the Genoa Gulf are a constant feature over the whole year, those connected at the extreme events are the ones characterized by the lowest value in MSLP. These findings (i.e., higher values during the winter rather than in spring and summer time) were observed in the southern Thyrrenian Sea as well (for instance in B4) and in the Central Thyrrenian Sea (B3). In the latter case, we found 260 a more marked seasonal variation of the extremes, as the two resultant WPs are well separated between winter and autumn.
Intra-annual variability was observed also for B7, while the analysis carried out by Sartini et al. (2015) did not reveal this kind of behaviour; analogously, B5 and B8 buoy revealed the presence of two distinct WPs, while the previous analysis identified just one cyclogenesis system. These differences may be justified in the first place by the different peaks selection (a moving window for this study, while Sartini et al., 2015, use a partial duration series approach). Moreover, evaluation of the seasonality 265 follows completely different algorithms: the present study directly applies a clustering technique over the selected peaks, while the former analysis used the peaks to model a time dependent distribution, further characterizing their seasonality on the basis of the values of the distribution coefficient aroused from the fitting procedure.
Another interesting outcome regards the characteristics of the extremes differentiated due to the parent clusters following the k − means algorithm. Figure 12 shows the covariates H s and θ m of the selected peaks belonging to each WP. Looking at the scatter plots, it can be appreciated how the events belonging to different clusters are remarkably homogeneous in terms of wave bulk parameters, even though the latter were not used for clustering the peaks. In case of B4, where is clear a bimodal distribution with respect to the waves' incident direction (i.e. two peaks corresponding to SE and NW), the proposed methodology allows to differentiate the sea storms according to the directional frames of the local wave climate. On the other hand, when the waves' direction is more uniformly scattered over a given sector, it is not unusual that different climate forcing 275 result in the same wave direction; this can be noticed in B7, where θ m -H s scatters belonging to different patterns are partially overlapped, thus a directional classification is not straightforward and most probable would not be able to differentiate the storms due the cyclones generating them. Nevertheless, also in such a case the distributions of H s are homogeneous within each WP, with the more severe H s related to WP#2 system in both the locations.
Finally, Fig. 13 shows the T r -H s curves, comparing the results obtained directly from the whole set of peaks with those 280 obtained from the single-WP distributions, combined by means of algorithm given in Table 2 (i.e., omni-WP curves).
The omni-WP curves show a striking agreement with those carried out through the analysis of the whole dataset without WP classification; in both the locations B4 and B7 it can be even appreciated a narrowing of the confidence intervals, meaning a reduction in the total variance for the long-term estimates. Actually, the curve related to B4 show a slight deviation between the two approaches ( 30 cm over the 200 year wave); however, such small magnitudes imply relative errors of 4% with 285 respect to the omni-WP curves, and can be therefore considered as an uncertainty inherent in this kind of computations (see e.g., Resio, 1977, 1982, stating that reliable long term estimates can be carried out just up to three times the years the length of the original dataset). for B5, B6, B4, B7, B8 and B3 respectively). In conclsion, it is good to recall that thresholds for the EVA were selected in order for the peaks to come from a Generalized Pareto family (Solari and Alonso, 2017), thus guaranteeing the intra-cluster homogeneity of the data.

295
In this work, the extreme sea storms at eight hindcast points differently located along the Italian coasts were analyzed. The investigated locations are characterized by different conditions, both in terms of their local orography and exposure, and this is consequently reflecting on the respective sea storms showing, on average, peculiar characteristics. Nevertheless, the analysis here introduced reveal how the storms considered for all the locations can be related to particular WPs, resulting in homogeneous features both in terms of frequency of occurrence and significant wave heights. Such features suggest that the 300 extreme distributions of H s shall be singularly evaluated for each WP, the starting datasets being homogeneous and independent with respect to each other.
The methodology here introduced allows the classification of extreme wave events (or other oceanic variables) into homogeneous subgroups according to the circulation patterns most likely generating them. By focusing solely on extreme events, the proposed classification method results in a reduced number of circulation or weather patterns, which facilitates their physical 305 interpretation as well as their linkage with the climatology of the area.
The method, as presented here, does not contemplate the inclusion of trends and inter-or intra-annual cycles. However, the extension of the methodology in this direction is straightforward, as the methods previously developed for non-stationary analysis (see e.g., Sartini et al., 2015) could be applied without major complexities to each of the subsets that are obtained from the classification. On the contrary, it is not obvious how to proceed in the selection of the number of patterns to be considered.

310
Here this number was chosen following a qualitative analysis of the results, which was viable for the case study analysed, though this approach is not always feasible. Indeed, sometimes it might be necessary to (at least) resort to a sensitivity analysis of the results, as long as a quantitative methodology is not available for the definition of the number of clusters.
For the analyzed locations, the proposed methodology led to the identification of two different cyclonic systems, characteristic of the atmospheric circulation on the Mediterranean Sea, as the possible origin of the extreme wave events affecting the 315 Italian shores. Indeed, two well-known circulation patterns were highlighted: one characterized by a low departing from the mid western Europe and moving south-east (referred to as WP#2); the other forming in the lee of the Atlas and crossing the Mediterranean sea north-easterly (referred to as WP#1).
When extreme events are classified according to their meteorological origin, there is a great confidence of working with homogeneous samples, thus being in compliance with the main hypothesis underlying the EVA. Nevertheless, the divergences 320 arising between the outcomes of usual EVA scheme and that following the initial classification of the peaks are relatively limited: although in some cases a narrowing in the confidence intervals was achieved, in general the distribution obtained by following the two different approaches was very similar. However, it must be kept in mind that the classification could facilitate other aspects of the analysis not included in this work, as for instance the multivariate analysis of extreme events. In such a framework, to classify the wave fields according to the wind velocities lead to clusters of T p consistent with those of H s , as 325 the latter parameter is closely tied to the former one (especially in case of extreme sea states).