A meta-modelling strategy to identify the critical offshore conditions for coastal flooding

Abstract. High water level at the coast may be the result of different combinations of offshore hydrodynamic conditions (e.g. wave characteristics, offshore water level, etc.). Providing the contour of the "critical" set of offshore conditions leading to high water level is of primary importance either to constrain early warning networks based on hydro-meteorological forecast or observations, or for the assessment of the coastal flood hazard return period. The challenge arises from the use of computationally intensive simulators, which prevent the application of a grid approach consisting in extracting the contour through the systematic evaluation of the simulator on a fine design grid defined in the offshore conditions domain. To overcome such a computational difficulty, we propose a strategy based on the kriging meta-modelling technique combined with an adaptive sampling procedure aiming at improving the local accuracy in the regions of the offshore conditions that contribute the most to the estimate of the targeted contour. This methodology is applied to two cases using an idealized site on the Mediterranean coast (southern France): (1) a two-dimensional case to ease the visual analysis and aiming at identifying the combination of offshore water level and of significant wave height; (2) a more complex case aiming at identifying four offshore conditions (offshore water level and offshore wave characteristics: height, direction and period). By using a simulator of moderate computation time cost (a few tens of minutes), the targeted contour can be estimated using a cluster composed of a moderate number of computer units. This reference contour is then compared with the results of the meta-model-based strategy. In both cases, we show that the critical offshore conditions can be estimated with a good level of accuracy using a very limited number (of a few tens) of computationally intensive hydrodynamic simulations.


Introduction
Forty percent of the world population lives within 100 km of a coastline, an area that accounts for only about twenty percent of the land mass (UNEP, 2007).This urbanized area is characterized by both increasing assets and increasing coastal flooding risk induced either by extreme events and/or by sea-level rise (Nicholls et al., 2006).Some recent events like Katrina in 2005 or Xynthia in 2010 (see e.g.Bertin et al., 2012) illustrate the present-day coastal damages and injuries that can affect the coastal area.Katrina was one of the six most powerful hurricanes ever registered in the Atlantic, leading to 1836 deaths and damages of about 80 billion USD, whereas Xynthia was a mid-latitude storm that severely hit low-lying coasts located in the central part of the Bay of Biscay on the 27-28 February 2010, inducing 53 deaths and material damages estimated at more than one billion EUR.From a statistical point of view, the wave heights generated during the Xynthia event could not be considered extremes, but what makes this event "rare" is the combination of a high spring tide with a large storm surge reaching its maximum around the tide peak.The processes involved in such event are identified by Bertin et al. (2012) showing a strong influence of waves in such high storm surges.This recent event provides a vivid reminder of the complexity and of the strong coupling between different hydro-meteorological processes underlying the occurrence of storm surge, and possibly of large impacts on the coastal plain.
In this context, early warning and forecasting systems are key components for a coastal flooding risk assessment and management.These are based on hydro-meteorological observations and modelling, such as the initiatives in the North Sea Region (Safecoast, 2008), and allow anticipating the cooccurrence of such high-magnitude events and setting up accordingly preventive actions.To constrain these systems, the combinations of offshore conditions x that generate a "critical" high water level at the coast (the maximum water level at the coast and the critical one are respectively denoted ξ m and ξ c ) are of primary importance.The "critical" characteristic of ξ c can be judged regarding the potential associated losses and consequences.
The relationship between these critical combinations constitutes the critical frontier c , which separates the "safe" region from the "unsafe" one in the offshore conditions domain (e.g.combination of wave, tide and surge conditions), as schematically depicted in Fig. 1.For instance, the combination of offshore conditions outlined by "a" in Fig. 1 leads to a water level at the coast considered "safe", whereas the combination "b" leads to an unsafe water level, as located in the set, whose boundaries correspond to c .The combination "c" is exactly on this critical frontier, and its estimation constitutes the core issue of the present study.More formally, the objective is to estimate the contour of the set c ={x so that ξ m = ξ c }.
From a reliability analysis point of view, this frontier is referred to as "limit state" and determines the failure of an engineered system.It is used to estimate the probability of the system failure (see e.g.Haldar and Mahadevan (2000) for further details on system reliability analysis).Similarly as for reliability assessment, the critical frontier of offshore conditions can be used in a Monte Carlo-based approach to estimate the return period (i.e. the probability of occurrence) of the water level at the coast (see for instance the statistical approach called "response method" proposed by Garrity et al., 2006).Another modelling approach has recently been proposed by Chini and Stansby (2012) based on the statistical discretization of the offshore conditions (wave and water level), with the successive following steps: joint probability analysis of offshore conditions, discretization of extreme offshore conditions occurrence isolines, hydrodynamic modelling for each selected offshore combination and evaluation of the near-shore values (e.g.mean overtopping) in the twodimensional space of offshore wave and water level.
From a more methodological perspective, the knowledge of c can eventually support the implementation of innovative "inverse" risk assessment procedures, which have been recently proposed either for fluvial inundation (Cunderlink and Simonovic, 2007) or for coastal flooding (Idier et al., 2010).These still under-development methodologies basically rely on the inversion of the traditional approach of risk analysis by starting from a targeted risk level judged as "acceptable" in the studied territory and by ending up with the estimate of the corresponding return period based on the knowledge of the associated hydro-meteorological conditions.
This "critical" frontier c is usually not "explicit", i.e. in many cases it does not have a simple analytical mathematical formulation and its estimate usually relies on numerical modelling.In the field of coastal flood hazard assessment, several categories of models with different degrees of complexity depending on data requirements, resolution, physical processes and characteristics of the underlying equations are available (Fewtrell et al., 2011;Samuels et al., 2008).If we focus on the modelling of the water level at the coast (necessary step to provide final flooding hazard maps), several approaches can be used.A first class of models (class 1) is based on semi-empirical approaches, which provide total water level (tide, storm surge, wave set-up, run-up and swash) based on input parameters like offshore water level and wave conditions.An example is provided by the formula of Stockdon et al. (2006).A second model class (class 2) relies on more sophisticated attempts to model the physical processes, involving more advanced (less simplified) methods than class 1 models.As an example, spectral wave modelling (Booij et al., 1999) can be used to provide set-up at the coast, under some assumptions, but will not provide run-up information.Another example is the coupled wave action and shallow water model of McCabe and Sansby (2011), which actually allows estimating the run-up.A last model class (class 3) is based on more advanced methods including few simplifying assumptions and being more accurate but also more complicated.An example is the fully nonlinear weakly dispersive model developed by Bonneton et al. (2011), allowing to reproduce the free-surface (including, for instance, swash, runup, set-up, etc.).
A "crude" approach for estimating c would consist in running the model on a regular design grid defined in the forcing conditions space, i.e. corresponding to an intensive parametric analysis.To illustrate, let us consider a regular design grid composed of 30 configurations of offshore water level ξ o varying by a constant increment step between 0.25 and 1.50 m and 30 configurations of significant wave height H s varying by a constant increment step between 0.5 and 7 m.In total, 900 simulations should be run in this case.Then, for this specific two-dimensional case, the desired critical frontier, i.e. the one for which the maximum water level at the coast ξ m is ξ c = 1 m, can be extracted in the form of a contour using, for instance, packaged software such as Matlab© function contour(), the accuracy of the solution being dependent on the grid increment.
Yet, the afore-described approach may rapidly become prohibitive and impractical due to the computation time cost of a single numerical simulation, especially when using the model class 3 or even model class 2 (with computation time cost varying from a few minutes to hours).Let us consider a model with moderate computation time cost of 15 min; then, the afore-described example case would require less than 9 days of computation, which is achievable using a computing cluster composed of a moderate number of computer units (CPU).If the estimate of c was carried out for four offshore conditions (e.g.significant wave height H s , wave period T p , direction θ and offshore water level ξ o ) using a design grid with each dimension discretized into 30 elements, the total computation time cost would reach almost  This frontier represents the boundary between the "safe" and "unsafe" regions (red-coloured area), i.e. the boundary of the set of offshore forcing conditions, which lead to a "high level" of water level at the coast.8400 days, which is obviously not achievable and would require a computing cluster composed of at least 500 CPU to reach a computation time of about two weeks.
To tackle this challenge, two strategies can be proposed: appropriate grid computing architecture (e.g.Boulahya et al., 2007) or the use of meta-models (also named response surfaces or surrogate models; see for instance Forrester et al. (2008)).The latter approach basically consists in replacing the numerical model by a "costless-to-evaluate approximation".This has been proposed in a variety of other contexts: structural reliability problems (Bucher and Bourgund, 1990), environmental issues (Iooss et al., 2006), natural hazard assessments (Rohmer and Foerster, 2011), geological storage performance assessments (Rohmer and Bouc, 2010), etc.
The objective of the present paper is to explore the applicability of the meta-modelling techniques for identifying the critical frontier in the offshore condition domain, when using computation time consuming hydrodynamic models.To let the "crude" approach be achievable with a cluster composed of a moderate number of CPU (around 10), we propose to focus on a hydrodynamic model of class 2 characterized by a moderate computation time (of the order of 15 min).The critical frontier estimated in this manner is viewed as a reference to be compared to the one estimated using the meta-modelling approach.Both strategies are tested by applying them to an idealised site located on the Mediterranean coast in southern France considering two cases: (1) the first one is two dimensional to ease the visual analysis and aims at identifying the critical combinations x c ={ξ o ; H s } ; (2) the second one is more complex and aims at identifying the fourdimensional critical combinations x c ={ξ o ; H s ; T p ; θ}.
The remainder of the paper is organized as follows.In a first section, the general strategy relying on meta-modelling is described.We focus in a second section on kriging meta-models (Forrester et al., 2008).To improve the local accuracy of the approximation of the contour of interest, we combine the meta-modelling strategy with a sequential adaptive sampling approach relying on the recent advances of contour estimation from complex computer codes (for a recent overview, see Bect et al., 2012).In a third section, we describe the study site and the associated moderate-toevaluate (of the order of 15 min) numerical model to compute the maximum water level at the coast.The meta-modelling strategy is then tested on both test cases (two-and fourdimensional cases).

Meta-modelling strategy
In this section, we describe the methodology relying on the meta-modelling technique to estimate the critical frontier from a complex and time consuming hydrodynamic code, which is referred to as "simulator" and denoted f .

Principle of meta-modelling
This technique consists in replacing f by a mathematical approximation referred to as "meta-model" (also named "response surface" or "surrogate model").This corresponds to a function constructed using a few computer experiments (i.e. a limited number of time consuming simulations), and aims at reproducing the behaviour of the "true" model in the domain of model input parameters (here the offshore conditions) and at predicting the model responses with a negligible computation time.In this manner, any approach relying on intensive multiple simulations (such as global sensitivity analysis, in-depth uncertainty analysis or intensive parametric analysis for the case described in the present article) is made achievable at a "reasonable" computation time cost.

Methodology
The main steps of the methodology are summarized in Table 1.A meta-model is constructed and validated with an initial budget of training data (Steps 1 to 3).It is then updated (Step 4) using candidate points selected to improve the local accuracy in the regions that contribute the most to the estimate of c .Using this costless-to-evaluate function, c can be obtained from a fine grid on the offshore conditions space (Step 5) using for instance packaged software such as Matlab© function contourc().Steps 1 to 4 are detailed in the next paragraphs.

Step 1: setting the training data
The approximation is constructed by running f given a limited number n 0 of different model input parameters (i.e. of offshore conditions vectors x), named training samples.Hence, the objective is to create a mapping (named training data) between the offshore conditions x and the quantity of interest, namely the maximum water level at the coast ξ m .A trade-off should be found between maximizing the exploration of the offshore conditions domain and minimizing the number of simulations, i.e. a trade-off between the accuracy of the approximation and the computation time cost.
To fulfill such requirements, we propose to randomly select the training samples by means of the Latin hypercube sampling (LHS) method (McKay et al., 1979) in combination with the "maxi-min" space-filling design criterion (Koehler and Owen, 1996).

Step 2: construction of the meta-model
Using the training data, the approximation can be carried out relying on several types of meta-models, either using simple polynomial regression techniques, non-parametric regression techniques (Storlie et al., 2009), kriging modelling (Forrester et al., 2008), artificial neural networks (Papadrakakis and Lagaros, 2002), or polynomial chaos expansions (Ghanem and Spanos, 1991), etc.The choice of the meta-model type is guided by the a priori non-linear functional form of the simulator, as well as the number of input parameters.In the present study, we focus on kriging meta-modelling.This nonparametric technique presents several attractive features.It is flexible to any kind of functional form of the simulator.In particular, it introduces less restrictive assumptions on the functional form of the simulator than a polynomial model would imply; it is an exact interpolator, which is an important feature when the simulator is deterministic; it provides a variance estimate of the prediction, the latter being very useful to guide the selection of future training samples according to the target of the optimization problem (see Sect. 3).Yet, it should be noted that these advantages come at the expense of a parameter learning stage, which may pose additional computational difficulties (see e.g.Langewisch and Apostolakis, 2010).

Step 3: validation of the meta-model
Usually the differences between the approximated and the true model output (or residuals) are used to estimate the quality of the approximation.Yet, for interpolating metamodels (e.g.splines, kriging, etc.), the residuals at the sampled points are null and an alternative is to estimate the expected level of fit (i.e.quality of prediction) to a data set that is independent of the original training data that were used to construct the meta-model, i.e. to "yet-unseen" data.A first approach would consist in using a test sample of new data.
Though the most efficient, this might be often impracticable as additional simulator runs are costly to collect.A possible option to overcome such a situation, coined by Tukey (1954) as "uncomfortable science", is to rely on the cross-validation procedures such as the "leave-one-out" cross validation technique (e.g.Hastie et al., 2009).Further details are provided in Appendix A.

Step 4: adaptive sampling
Once constructed and validated, the meta-model can replace the simulator to extract the desired contour through an intensive parametric analysis on a regular fine design grid in the offshore conditions space.For instance, the choice of the initial number of training data is advocated by Ranjan et al. (2008) to reach at least 25 to 35 % of the total budget, but no exact rule exists.Yet, this approach may not be efficient in several cases: when the allowable budget of simulations n 0 is low; when the number of offshore conditions is high (i.e.requiring a large number of training data to explore the offshore condition domain); and when the relationship between x and ξ m is highly non-linear for instance.
Based on the basic idea that not all offshore conditions configurations will contribute to the estimate of c , sampling effort should then be concentrated in the regions of the offshore conditions, which provide the maximum information regarding c , i.e. both to improve the local accuracy of the approximation of c and to explore the design space in a global manner, where desired contour may also exist (see Sect. 3.2).Hence, to efficiently use the available resources, we propose to complete steps 1 to 3 by a sequential sampling approach to adaptively select the best training data candidates fulfilling the afore-mentioned requirements.
The definition of a selection criterion to solve this "contour estimate" problem has only been recently addressed (see Bect et al., 2012 and references therein).This type of optimization problem differs from the traditional problem of searching the global optimum (minimum or maximum) of a black-box function, which has given rise to numerous research works over the last two decades (see Forrester and Keane, 2009).One major difficulty is the nature of the solution of the "contour" optimization problem, which corresponds to a set of solutions, whereas it corresponds to a unique solution (or at least of finite number) for the traditional optimization problem.The comparison of the existing criteria is beyond the scope of this article, and we focus on the sampling criterion developed by Ranjan et al. (2008) (see Sect. 3).
To summarize, the meta-model is initially constructed and validated based on a first training set.The meta-model is then iteratively updated using the candidate selected based on the sampling criterion, and the procedure is continued until the allowable number of simulations is exhausted.

Adaptive sampling using kriging meta-model for contour estimate
In this section, we focus first on kriging metamodelling (Forrester et al., 2008), for which selection criteria adapted to contour extraction from computer codes have been recently defined to sequentially guide the sampling effort (Bect et al., 2012).Next, we more specifically focus on the selection criterion developed by Ranjan et al. (2008).

Introduction to kriging meta-model
We introduce here the basic concepts of kriging metamodelling, which can be viewed as an extension to computer experiments of the kriging method used for spatial data interpolation and originally developed by Krige (1951) for mining applications.For a more complete introduction to kriging meta-modelling and full derivation of equations, the interested reader can refer to Sacks et al. (1989); Jones et al. (1998); Jones (2001); Forrester et al. (2008).
The kriging model considers the deterministic (i.e.not random) response of the simulator ξ m = f (x) as a realization of a Gaussian stochastic process F so that f (x)=F (x,ω), where ω belongs to the underlying probability space .In the following, we use the notation F (x) for the process and F (x,ω) for one realization.The process F results from the summation of two terms: f 0 (x), the deterministic mean function, which is usually modelled by a constant or a linear model and represents the trend of f ; -Z(x), the Gaussian-centred stationary stochastic process (with zero mean and covariance described below), which describes the deviation (i.e.departure) of the model from its underlying trend f 0 .
The stochastic process Z is characterized by the covariance matrix C, which depends on the variance σ 2 Z and on the correlation function R, which governs the degree of correlation through the use of the vector of length-scale parameters ω between any input vectors.The covariance between Z(u) and Z(v) is then expressed as , where u = (u 1 ; u 2 ; . . .; u d ) and v = (v 1 ; v 2 ; . . .; v d ) are two input vectors of dimension d.
A variety of correlation (and covariance) functions has been proposed in the literature (see e.g.Stein, 1999).The commonly used model is the Gaussian correlation function defined as follows: where the vector ω determines the rate at which the correlation decreases as one moves in the i-th direction (with i from 1 to d).Intuitively, if u=v, then the correlation is 1, whereas if the distance between both vectors tends to the infinity, then the correlation tends to 0. In this article, we chose to focus on the Matérn covariance family.The Matérn correlation function presents more flexibility than Gaussian correlation function, as it allows regulating the degree of differentiability of the underlying random process in addition to the length-scale parameters.This correlation model is described in further details in the Appendix B.
Let us define X D the design matrix composed of the vectors of offshore conditions x selected in step 1 of the methodology (i.e. the training samples) to be simulated, so that X D =(x (1) ;. x (2) ; . . .; x (n0) ) and ξ D the vector of simulated maximum water levels at the coast associated with each selected training samples, so that ξ D =(ξ Under the afore-described assumptions, the distribution of the maximum water level for a new input vector of offshore conditions x * follows a Gaussian distribution conditional on the design matrix X D and of the corresponding simulated water levels ξ D with expected value ξm for the new configuration x * and variance s 2 respectively defined by Eqs. ( 2) and (3):  The conditional mean in Eq. ( 2) is used as a predictor, i.e. the "best estimate".The conditional variance in Eq. ( 3) can be used to estimate the mean square error of the predictor and to deduce a confidence interval associated with the prediction.The regions of the offshore conditions space where few data are available are underlined with higher variance, so that Eq. ( 3) also provides a local indicator of the prediction accuracy useful to guide sampling effort as described in Sect.3.2.
The above Eqs.( 2) and ( 3) are categorized as "ordinary kriging" and are the most common version of kriging used in engineering (Forrester et al., 2008).A more general form of kriging equations exists, known as "universal kriging", and allows computing the deterministic mean function f 0 as a polynomial regression (generally of low-order) with unknown coefficients.For sake of clarity, these equations are presented in Appendix C.
In practice, the parameters of the kriging model corresponding to the constant (or the coefficients of regression) used to model the mean function f 0 , the variance σ 2 Z and the length scale parameters ω are determined through maximum likelihood estimation procedure (details can be found in Santner et al., 2003).In this article, we use the package named "DiceKriging" developed by Roustant et al. (2012) of the R software (R Development Core Team, 2011).In particular, this package implements the efficient likelihood maximization algorithm of Park and Baek (2001) to overcome numerical instabilities, multimodality and dimensionality issues associated with kriging model parameters estimation.

Definition of the selection criterion
The sampling strategy described in Sect. 4 requires the definition of an appropriate selection criterion.Following the same spirit as the developments done by Jones et al. (1998) for the optimization (maximizing or minimizing) of expensive black-box computer codes, Ranjan et al. (2008) defined a selection criterion adapted to contour estimation based on the following improvement function I for any offshore conditions vector x: where ε is proportional (for some constant α) to the mean square error using Eq. ( 3) and defines a neighbourhood (the first term in the "min" function being the Euclidean distance) around the targeted contour c associated with ξ c .Jones et al. (1998) and Jones (2001) demonstrated that merely maximizing I would be inefficient and advocated to select the best candidate by maximizing the expected improvement function E(I ) over the offshore conditions domain using the fact that the simulator output f (x) follows a Gaussian distribution with moments ξm and s 2 respectively defined by Eqs. ( 2) and (3).E(I ) is the summation of three terms A 1 , A 2 and A 3 as expressed by Eq. (5a, b, c) for any offshore conditions vector x: where u ± = (ξ c − ξm (x) ± ε(x)) s(x); φ is the Gaussian probability density distribution and is its cumulative form.Ranjan et al. (2008) provides an intuitive interpretation of each of the three terms: 1.The first term A 1 dominates the expression when f (x) is close to ξ c , hence corresponding to the points lying close to the regions of the offshore conditions specified by ξ c ± ε(x), i.e. in the vicinity of c ; 2. A 2 dominates when f (x) is further away from ξ c so that the sampling is guided outside the regions fulfilling ξ c ± ε(x), where the uncertainty in the prediction is high (i.e.where the prediction variance is high); 3. the last term A 3 guides the sampling only within the regions fulfilling ξ c ± ε(x), where the uncertainty in the prediction is high.
The three terms taken together allows to guide sampling in the vicinity of c where little information is available (A 1 and A 3 ) and to explore outside the vicinity of c (term A 2 ) in regions scarcely sampled where the contour may also exist.
Thus, E(I ) provides a selection criterion to balance local (in the vicinity of c ) and global exploration of the offshore conditions domain.
The interested reader can refer to the recent works of Bect et al. (2012) for the definition of other selection criteria formulated following either similar principles using an improvement function, or a modified version of the traditional integrated mean square error, or the Bayesian formalism.
In practice, the best candidates for simulation are selected in Step 4.1 of the methodology (see Table 1) by maximizing Eq. ( 4).In the present article, the package "KrigInv" developed by Picheny and Ginsbourger (2012) of the R software (R Development Core Team, 2011) is used.

Application to the identification of critical offshore conditions
In this section, we present the application of the metamodelling strategy on an idealised site (description in Sect.4.1 and associated model in Sect.4.2) considering two cases (respectively Sects.4.3 and 4.4): (1) the first one is two dimensional to ease the visual analysis and aims at identifying the critical combinations x c ={ξ o ; H s } ; (2) the second one is more complex and aims at identifying the fourdimensional critical combinations x c ={ξ o ; H s ; T p ; θ}.

Study site
The application on the idealized site is considered for demonstration purposes only.The study site located on the Mediterranean coast is characterised by a lido, which is of primary importance both at an environmental and economic level for the region (see Fig. 2).This lido is protected from the sea, characterised by significant touristic activities.At the seafront, there are pedestrian areas, which have already been flooded, at least during two storms: 6-8 November 1982 and the 4 December 2003.This pedestrian area is limited by a sea-wall having a height of about 50 cm above the pedestrian street as schematically represented in Fig. 3.In this context, the critical maximum water level at the coast, including storm surges, tide, wave set-up and run-up, should not exceed the foot of the sea-wall, which is located at 2.15 m above the vertical reference (IGN).In this manner, the sea water is prevented from invading the area through the sea-wall openings, and specifically at the observation point (Fig. 2), which is just in front of connexion roads with the lido.

Model set-up and parameters
First, we should keep in mind that the water level at the coast results from the offshore water level ξ o (tide and storm surge) and the run-up resulting from two dynamical processes, which are the wave set-up η and the swash S. Here, we set a modelling strategy to account for these three contributions (ξ o ; η; S).We focus the analysis on a numerical model of class 2 (see introduction) based on the following assumptions.First, the sea level induced by storm surges and tide is assumed uniform in the local area.Second, the wave set-up is computed using the code SWAN (Booij et al., 1999).It solves the wave action equation and provides wave spectra and wave set-up.At the offshore boundary (see Fig. 2b), the offshore conditions x={ξ o ; H s ; T p ; θ} are imposed.
The bathymetric and topographic data used for the modelling have been obtained through lidar measurements (data provided by SMNLR), as well as bathymetric surveys (data provided by SHOM).The grid size is 2 m and the 2-D domain represented in Fig. 2b.Finally, to compute the swash-induced elevation S, we use the formulation of Stockdon et al. (2006).
Based on the assumption of reflective beach, the analysis of  the equations proposed in Stockdon et al. (2006) shows that we can, as a first approximation, assume that S is equal to the set-up η.Thus the maximum sea-level ξ m at the beach can be assumed to approximately equal ξ o +2η.All processes and parameters are schematically depicted in Fig. 3. Noteworthy for modelling simplicity purposes, we assume that the lido is not connected to the sea, and thus we neglect the influence of this lido on the submersion phenomena.

Two-dimensional test case
In this section, we tackle the problem of estimating the critical frontier c considering the combination of two offshore conditions x c =(ξ o ; H s ) leading to the critical water level at the coast ξ c =2.15 m (at the observation point).We assume that the offshore wave direction and period are constant and fixed at their most frequent values of respectively 135 • N and 7 s, which have been observed in the studied area region.
Due to the moderate computation time cost of 15 min (in average depending on the input parameter values), the identification of the critical frontier can be solved, in a first approach, using a cluster of 10 CPU (2 GHz Pentium 4PC) considering 900 input parameters configurations selected within the domain [0.25; 1.50 m]×[0.5;7 m] (each dimension being discretized in 30 elements).The total computation time reached less than one day to finally extract c (red solid line, Fig. 4b).
In a second approach, we simulate ξ m considering 10 different offshore conditions randomly selected within the domain [0.25; 1.50 m]×[0.5;7 m] based on a LHS technique (see Sect. 2, step 1).On this basis, a kriging model is constructed using the Matérn covariance function (ν=5/2; see Appendix B) and a linear trend (see Sect. 3).This metamodel is then validated, and the leave-one-out cross validation procedure provides a coefficient of determination R 2 CV = 99.7 % (see Appendix, Eq.A1), which demonstrates the high level of prediction quality of the kriging model.Figure 4a depicts the relationship between the 10 simulated water level ξ m (named "observed") and the corresponding estimations using a kriging model constructed following the procedure described in Sect.2.3.The closer the dots are to the first bisector, the better the quality of prediction.
The best estimate of c , as provided by Eq. ( 2), is depicted by the green solid line in Fig. 4b.This allows the comparison and the reference contour.We can note that the reference contour lies within the region bounded by the pair of contours (green dashed lines, Fig. 4b) computed at confidence level of 95 %.These correspond to the kriging mean (Eq.2) ± twice the square root of the variance (Eq.3).However, there is a poor agreement between the reference contour and the best estimate with several discrepancies.Besides, the width of the pair of contours computed at confidence level of 95 % appears quite large.
On this basis, the sequential sampling strategy described in Sect. 3 is performed so that 10 additional input parameter configurations are selected in the vicinity of the targeted contour defined assuming a constant α = 0.5 (see Eq. 4). Figure 5 shows the two first iterations of the sequential sampling strategy.Figure 5a provides the mapping of the expected improvement function values (scaled between 0 and 1) in the offshore conditions space: the best candidate for simulation corresponds to the maximum of this function.Figure 5b provides the updated estimate of c (green solid line), i.e. the one estimated using a kriging model constructed on training data enriched with the selected training point.Interestingly, the selection of a training point implies the decrease of the width of the pair of contours at level of confidence 95 % (see iteration no. 2 in Fig. 5d).
To test the efficiency of using the adaptive sampling strategy, we estimate c using a "crude" meta-modelling approach consisting in selecting training samples only relying on LHS to explore the offshore conditions space.Figure 6a provides the estimation of c using a kriging model with 20 training data selected from LHS, whereas Fig. 6b provides c extracted using a kriging model with 10 training samples selected from LHS completed by 10 training data selected using the adaptive sampling strategy.Noteworthy, the adaptively selected training samples are preferably located in the vicinity of the contour, indicating that both terms A 1 and A 3 in Eqs. ( 5a) and (5c) (linked with the local search; see Sect.3.2) dominate the selection criterion.
Clearly, the best estimate using the adaptive sampling strategy deviates very little from the reference.Interestingly, the total computation cost reached 5 h (20 simulations × 15 min) using a single CPU to be compared to about 1 day using the computing grid architecture of 10 CPU.The training data of the kriging model constructed with no adaptive sampling should be completed at least with 20 additional training data, i.e. based on a total number of 40 training data, to reach an agreement between the best estimate and the reference contour as good as the one reached with adaptive sampling strategy.

Four-dimensional test case
In this section, we tackle the problem of estimating the critical frontier c considering the four offshore conditions x c =(ξ o ; H s ; T p ; θ) associated with ξ c = 2.15 m at the observation point.This problem is more complex than the one tackled in Sect.4.3 as the input space to be explored is four dimensional, hence far larger.Furthermore, contrary to Sect.4.3, estimating the reference contour in four dimensions using a grid of moderate number of CPU is hardly achievable, as outlined in the introduction.
To demonstrate the efficiency of the meta-modelling strategy, we propose to extract a sub-set of the four-dimensional c in the {ξ o ; H s } space considering a fixed wave period T p =7 s and wave direction θ=135 • N. In this manner, the "reference" contour, as calculated using the computing grid architecture, can be re-used for comparison.
An initial four-dimensional kriging model is constructed using 20 training data randomly selected within the following offshore conditions domain [0.25 ; 1.5 m]×[0.5 ; 7 m]×[2.5 ; 16 s]×[40 • N ; 200 • N] and using the LHS technique.The cross-validation procedure provides R 2 CV =91.6 % (see Appendix, Eq.A1), which demonstrates a satisfactory level of prediction quality of the kriging model.
Similarly to Sect.4.3, two meta-modelling strategies are conducted: the first one considers no adaptive sampling and merely selects training samples using LHS. Figure 7a depicts the comparison between the sub-set of the four-dimensional c estimated in the {ξ o ; H s } space considering a fixed wave period T p = 7 s and wave direction θ= 135 • N (green solid line) and the reference contour (red solid line).Two training sample sizes are used: 10 training samples in addition to the 20 initial training ones (Fig. 7a) and 20 training samples in addition to the 20 initial training ones (Fig. 7c).For the first sample size, the visual inspection of Fig. 7a shows a rather poor agreement, the reference contour being even outside the region bounded by the pair of contours estimated at confidence level of 95 %.For the second training sample size, the agreement of the estimate with the reference contour is equivalent to the one between the estimate using a kriging model constructed with 20 initial training samples and 10 additional adaptively selected ones (Fig. 7b).The accuracy of the approximation using the adaptive strategy is improved by increasing the number of samples (Fig. 7d).
Figure 8 provides the matrix of scatter plots showing the distribution of the selected training data in the fourdimensional domain of offshore conditions.This indicates that the second term A 2 (see Eq. 5b), linked with the global exploration of the design space, dominates the selection.The exploration of the domain preferably operates by fixing the first and second coordinates in the vicinity of the bounds (H s = 7 m and ξ o = 1.683 m) and by exploring the two remaining dimensions (T p and θ).
A second validation test is performed as follows.The four-dimensional kriging model (constructed with 20 initial training samples and 20 additional adaptively selected ones, Fig. 7b, bottom) provides the best estimate of the critical frontier c .A set of new test data has then been generated by randomly selecting 60 different combinations in the fourdimensional domain of onshore conditions, which satisfy ξm (x) = ξ c = 2.15 m, i.e. located on the best estimate of c .For each selected combination, a hydrodynamic simulation is effectively run.The numerically computed water levels at the coast are then compared to ξ c = 2.15 m.The root-meansquare error reaches 0.0107 m, hence demonstrating the "satisfactory" prediction quality of the kriging meta-model.Interestingly, this good performance is achieved by running the numerical model only 40 times, hence corresponding to a total computation cost of about 12 h using a single CPU.This should be compared to the computation time cost of about two weeks that would have been achieved if we had used a computation grid architecture composed of at least 500 CPU.
Finally, though the results are satisfactory using the idealized site on the Mediterranean coast, it should be underlined that for any real-case application, carefully addressing the robustness of the sequential sampling procedure with respect to the procedure of estimation of the covariance parameters of the kriging meta-model remains the key ingredient of the proposed meta-modelling strategy.

Concluding remarks and further works
In the present article, we described a methodology to overcome the computation challenge of extracting the critical frontier, corresponding to the set of offshore conditions leading to high water level at the coast, from complex computational, time-consuming, hydrodynamic simulations using a fine design grid.Such information is of primary importance either to constrain early warning systems based on hydrometeorological forecast or observations, or to contribute to the return period calculation of the hazard level at the coast.Besides, it can eventually support the implementation of innovative "inverse" risk assessment methodology as recently proposed for fluvial inundation (Cunderlink and Simonovic, 2007) or for coastal flooding (Idier et al., 2010).
The proposed strategy relies on the kriging metamodelling combined with an adaptive sampling approach using the expected improvement function developed by Ranjan et al. (2008) for contour extraction from complex computer codes.The strategy was applied on a simplified study site at the Mediterranean coast in southern France.The efficiency of the meta-modelling strategy was demonstrated by comparing the results: (1) from a strategy relying on computing grid architecture composed of a moderate number of CPU; (2) from a "crude" meta-modelling approach with no adaptive sampling.For both test cases (two and four dimensional), we showed that the critical offshore conditions can be estimated with a good level of accuracy compared to the reference contour using a very limited number (of a few tens) of computationally intensive hydrodynamic simulations.
Yet, the computation provided in this study was performed for a single observation point at the coast, whereas risk management of coastal flooding requires the spatial extent of the inundation.In future research, it would be worthwhile to address this spatial issue relying for instance on meta-models with spatially distributed outputs and taking advantage of the recent advances in the meta-modelling community (see e.g.Marrel et al., 2011).
vector of generalised least square estimates of β.
where H D =[h(x (1) ) ; h(x (2) ) ; . . .; h(x (n0) )] T is the design matrix of n 0 rows and p columns, and c(x*) is the covariance vector between the test candidate x* and the training samples.Note that the above equations can be easily written using the correlation matrix R D using its relationship between C D and σ 2 Z .The specific case, where the basis functions reduce to a unique constant function, corresponds to the "ordinary kriging" as described in Eqs. ( 2) and (3). 28

Figure 1 .Fig. 1 .
Figure caption 1 2 n 0 where r(x * ) is the correlation vector between the test candidate x * and the training samples; R D is the correlation matrix of the training samples X D and I is the unit matrix of size n 0 .

Fig. 2 .Figure 3 .Fig. 3 .
Fig. 2. (A) Location of the study site and observation point where the calculations are carried out; (B) SWAN 2-D domain model of the study site.

Figure 4 .
Figure 4. A) leave-one-out cross-validation for the construction of the initial kriging model; Black dots are the water level estimations.B) Best estimate of  c in the { o  H s } domain associated with  c =2.15m (green solid line) using a kriging model constructed with 10 training data (black dots) selected randomly from a LHS technique.The associated pair of contours at confidence level 95% (defined as the kriging mean (Eq.2) +/-twice the square root of the variance (Eq.3)) is represented by dashed green lines.The reference contour (estimated using computation grid architecture) is represented by a red solid line.

Fig. 4 .
Fig. 4. (A) leave-one-out cross-validation for the construction of the initial kriging model; black dots are the water level estimations.(B) Best estimate of c in the {ξ o ; H s } domain associated with ξ c = 2.15 m (green solid line) using a kriging model constructed with 10 training data (black dots) selected randomly from a LHS technique.The associated pair of contours at confidence level 95 % (defined as the kriging mean (Eq.2) ± twice the square root of the variance (Eq.3)) is represented by dashed green lines.The reference contour (estimated using computation grid architecture) is represented by a red solid line.
Figure 5. A) Expected improvement function values (scaled between 0 and 1) for the first 3 iteration.The best candidate to be selected for simulation is depicted by a green dot.The 4 reference contour is represented by a red solid line; B) The critical frontier c estimated using 5 a kriging model constructed with the initial 10 training data and the selected training data.The 6 associated pair of contours at confidence level 95% (defined as the kriging mean (Eq.2) +/-7 twice the square root of the variance (Eq.3)) is represented by dashed green lines.C) 8 Expected improvement function values for the second sampling iteration; D) The critical 9 frontier c estimated using a kriging model constructed with the initial 10 training data and 10 the selected training data for the two first sampling iterations.11 12

Figure 6 .Fig. 6 .Figure 7 .Fig. 7 .Figure 8 .Fig. 8 .
Figure 6.Best estimate of the two-dimensional c (green solid line) in the {o  Hs} domain.3Thecorresponding pair of contours at confidence level of 95% (defined as the kriging mean 4 (Eq.2) +/-twice the square root of the variance (Eq.3)) is represented by green dashed lines; 5The red solid line is the reference contour.A) The training data is composed of an initial 6 budget of 10 samples (black dots) and 10 additional samples (green dots) both selected using 7 LHS technique; B) The training data is composed of an initial budget of 10 samples (black 8 dots, as for the case with no adaptive sampling) and 10 additional samples (green dots) 9 adaptively selected.10 11

Table 1 .
Sequential strategy for extracting the critical frontier.
c using a fine design grid in the offshore conditions space.