When choosing an appropriate hydrodynamic model, there is always a compromise between accuracy and computational cost, with high-fidelity models being more expensive than low-fidelity ones. However, when assessing uncertainty, we can use a multifidelity approach to take advantage of the accuracy of high-fidelity models and the computational efficiency of low-fidelity models. Here, we apply the multilevel multifidelity Monte Carlo method (MLMF) to quantify uncertainty by computing statistical estimators of key output variables with respect to uncertain input data, using the high-fidelity hydrodynamic model XBeach and the lower-fidelity coastal flooding model SFINCS (Super-Fast INundation of CoastS). The multilevel aspect opens up the further advantageous possibility of applying each of these models at multiple resolutions. This work represents the first application of MLMF in the coastal zone and one of its first applications in any field. For both idealised and real-world test cases, MLMF can significantly reduce computational cost for the same accuracy compared to both the standard Monte Carlo method and to a multilevel approach utilising only a single model (the multilevel Monte Carlo method). In particular, here we demonstrate using the case of Myrtle Beach, South Carolina, USA, that this improvement in computational efficiency allows for in-depth uncertainty analysis to be conducted in the case of real-world coastal environments – a task that would previously have been practically unfeasible. Moreover, for the first time, we show how an inverse transform sampling technique can be used to accurately estimate the cumulative distribution function (CDF) of variables from the MLMF outputs. MLMF-based estimates of the expectations and the CDFs of the variables of interest are of significant value to decision makers when assessing uncertainty in predictions.

Throughout history, coastal zones have been attractive regions for human settlement and leisure due to their abundant resources and the possibilities they offer for commerce and transport. Nevertheless, living in coastal zones has always come with the risk of coastal flooding hazards, for example, from storm surges as well as wave run-up and overtopping. Hydrodynamic models can simulate these hazards, but their predictions are often uncertain.

We take an alternative approach and instead compute statistical estimators using the relatively novel multilevel multifidelity Monte Carlo (MLMF) method, developed in

MLMF does not aim to improve the accuracy relative to using a standard Monte Carlo method on the high-fidelity model but instead uses a lower-fidelity model to accelerate the approach and thus make uncertainty studies computationally feasible. Therefore, the key to the successful application of MLMF is choosing an accurate high-fidelity model and an appropriate lower-fidelity model that reasonably approximates it. Coastal flood modelling is thus an ideal field in which to apply MLMF because there exist a large number of high-fidelity but computationally expensive full-physics models such as XBeach

In our work, we choose the depth-averaged finite-volume-based coastal ocean model XBeach as our high-fidelity model because it can parameterise unresolved wave propagation, such as wind-driven wave fields, and has been successfully used numerous times in the coastal zone to simulate wave propagation and flow including, for example, in

The aim of this work is to explore how MLMF can be applied to complex hydrodynamic coastal ocean models to investigate, within a reasonable timeframe, the impact of a variety of uncertain input parameters, such as wave height and bed slope angle, whilst maintaining accuracy relative to the standard Monte Carlo method. We apply MLMF to both idealised and real-world test cases, some of which would have been impractical to run using standard Monte Carlo methods due to their huge computational costs. In many of these test cases, we conduct a valuable spatial uncertainty analysis of the coastal flooding by calculating the expected value of output variables simultaneously at multiple locations. Like other Monte Carlo-type methods, MLMF quantifies uncertainty by computing estimators of the expected value of key output variables with respect to uncertain input parameters. However, in this work we also modify the inverse transform sampling method from

Example illustration of how MLMF's multifidelity multilevel approach using SFINCS and XBeach models on different grid resolutions results in computational cost savings. Note that the € symbol indicates the order of magnitude of the computational cost for a single simulation with this model at this grid resolution; i.e. €€ indicates

The remainder of this work is structured as follows: in Sect.

As discussed in Sect.

Using a coarse grid and/or a simpler model gives an estimate that is cheap to compute but less accurate, and thus has an error. This error can be corrected by estimating the difference between the low- and high-fidelity models and/or between the different resolutions and adding these corrections to the cheaply computed expectation. Key to the approach is the observation that estimating the difference requires fewer simulations than computing the full estimate, because the variance of the correction is typically much smaller than the variance of the outputs themselves. For the different grid resolutions, the correction is made via the telescoping sum of the multilevel Monte Carlo method (MLMC), while for the different fidelity models, it is made via control variate formulae. The challenge lies in composing these two approaches so that both corrections can be applied together, which is precisely what MLMF does.
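This variance reduction can be illustrated with a minimal numerical sketch. The two functions below are toy stand-ins for a fine- and a coarse-resolution model, not XBeach or SFINCS; the key point is that, when both are driven by the same random inputs, the difference between their outputs has far smaller variance than the outputs themselves.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins: both map an uncertain input (e.g. a friction coefficient)
# to an output (e.g. a water level); the coarse model carries a small
# resolution-dependent error.
def fine_model(theta):
    return np.sin(theta) + 0.5 * theta

def coarse_model(theta):
    return np.sin(theta) + 0.5 * theta + 0.05 * theta**2  # small bias

# The SAME uncertain-input samples drive both models (coupled sampling).
theta = rng.normal(loc=1.0, scale=0.3, size=10_000)

var_fine = np.var(fine_model(theta))
var_diff = np.var(fine_model(theta) - coarse_model(theta))

# The correction term has far smaller variance than the raw output, so far
# fewer (expensive) paired simulations are needed to estimate its mean.
print(var_diff < var_fine)  # True
```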

The theory for MLMF is the focus of Sect.

MLMF seeks to improve the efficiency of uncertainty analyses by running fewer simulations at the more expensive finer resolutions than at the cheaper coarser resolutions and by running fewer high-fidelity model simulations than low-fidelity ones (see Fig.

Flowchart of the multimodel approach to MLMF using HF (XBeach) and LF (SFINCS).

To fix ideas, we consider a hypothetical scenario, where the variable of interest is the water elevation height at a given location after a given time and the uncertain parameter is the friction coefficient which we assume follows a normal distribution. The desired grid resolution in our model is

We denote the MLMF estimator for the water depth at the finest grid resolution

Distribution of outputs generated by XBeach for the hypothetical scenario at the finest resolution and at the second finest resolution considered, as well as the distribution of the difference between these output values. Note that the distribution for the difference between the output values is much narrower, meaning fewer samples are required to get a good estimate of the mean. For reference, in this hypothetical scenario the variance at resolution

Combining multilevel estimators with the multifidelity control variate (Eq.

The other terms in Eq. (

To make calculating the variance of the water depth estimator simpler, we follow standard practice throughout and independently sample the values for the friction coefficient for each

Because

Note that when using the modified estimator of Eq. (

Finally, using Eqs. (

In order to obtain the optimum values for

Calculating Eq. (

Given the theory outlined above, the MLMF algorithm used in this study is summarised in Algorithm

Multilevel multifidelity Monte Carlo method.

Estimate the variance and cost of the MLMF estimator, as well as the correlation and cost ratio between the HF and LF models at user-specified levels using an initial estimate for the number of simulations. The same set of random numbers must be used for the HF and LF models.

Start with

Define optimal

If the optimal

If the optimal

If the algorithm has not converged and

If algorithm converged or
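A heavily simplified skeleton of this adaptive loop is sketched below. The helper names are hypothetical, and for brevity the per-level statistics use raw high-fidelity outputs and an assumed cost growth; the real algorithm instead uses the variances of level differences together with the HF–LF correlation and measured costs to set the simulation numbers.

```python
import numpy as np

def mlmf_sketch(run_hf, run_lf, levels, tol, n_pilot=50, rng=None):
    """Simplified adaptive sampling loop: pilot runs, allocation update,
    convergence check. `run_hf`/`run_lf` map (level, theta) to a scalar."""
    rng = rng or np.random.default_rng()
    samples = {l: [] for l in levels}
    n_target = {l: n_pilot for l in levels}
    while True:
        # Top each level up to its current target, driving both models
        # with the SAME random inputs so their outputs stay coupled.
        for l in levels:
            while len(samples[l]) < n_target[l]:
                theta = rng.normal()  # one draw of the uncertain input
                samples[l].append((run_hf(l, theta), run_lf(l, theta)))
        # Re-estimate per-level variances; cost assumed to double per level.
        var = {l: np.var([h for h, _ in samples[l]]) for l in levels}
        cost = {l: 2.0 ** l for l in levels}
        # Optimal allocation: n_l proportional to sqrt(var_l / cost_l),
        # scaled so the estimator variance meets the tolerance.
        scale = sum(np.sqrt(var[l] * cost[l]) for l in levels) / tol**2
        n_opt = {l: int(np.ceil(scale * np.sqrt(var[l] / cost[l])))
                 for l in levels}
        # Stop once no level requires additional simulations.
        if all(n_opt[l] <= len(samples[l]) for l in levels):
            return samples
        n_target = {l: max(n_opt[l], n_target[l]) for l in levels}
```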

In this section so far, we have described the standard MLMF framework outlined in

To resolve this, in this work we develop our own novel technique to find the cumulative distribution function (CDF) from the MLMF outputs, using a modified version of the inverse transform sampling method from

In other words, suppose in our hypothetical scenario we have 100 values for the water elevation at a given location after a given time. Then this expression simply says that the value
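The underlying idea can be sketched for a single set of outputs as follows; this plain single-sample illustration is our own, and the MLMF version additionally combines sorted outputs across levels and fidelities, which is not reproduced here.

```python
import numpy as np

def empirical_cdf(samples):
    """Empirical CDF from a set of output samples: F(x) is the fraction
    of samples not exceeding x."""
    xs = np.sort(np.asarray(samples))
    def F(x):
        return np.searchsorted(xs, x, side="right") / len(xs)
    return F

def sample_quantile(samples, p):
    """Inverse transform: the p-quantile is (roughly) the ceil(p * n)-th
    smallest of the n sorted sample values."""
    xs = np.sort(np.asarray(samples))
    idx = int(np.ceil(p * len(xs))) - 1
    return xs[min(max(idx, 0), len(xs) - 1)]

# 100 synthetic "water elevation" outputs at one location.
rng = np.random.default_rng(0)
eta = rng.normal(loc=2.0, scale=0.1, size=100)
F = empirical_cdf(eta)
```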

In this work, we construct our own Python MLMF wrapper around both SFINCS and XBeach to implement the MLMF algorithm. This wrapper can be run across distributed cores of an HPC (high-performance computing) cluster to increase efficiency. Given the use of distributed cores, any times quoted in this work are the total simulation times multiplied by the number of cores used. The different steps performed when running the models in the wrapper are illustrated in the flowchart of Fig.
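The dispatch pattern such a wrapper relies on can be sketched as follows. The function names are hypothetical and the model call is a placeholder; the real wrapper launches the XBeach and SFINCS executables as external processes, which is why a simple thread pool suffices to keep many independent runs in flight.

```python
from concurrent.futures import ThreadPoolExecutor

def run_simulation(task):
    """Hypothetical stand-in for one model run; in practice this would
    write an input file and launch the model executable as a subprocess."""
    model, level, theta = task
    return (model, level, theta ** 2)  # placeholder "result"

def run_batch(tasks, max_workers=4):
    """Dispatch a batch of independent model runs concurrently; results
    come back in the same order as the submitted tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_simulation, tasks))
```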

For the models themselves, we use XBeach version 1.23.5526 from the XBeachX release and use the surfbeat mode to simulate the waves approaching the beach

We can now apply the outlined MLMF algorithm to both idealised and real-world coastal flooding test cases to calculate the expectation of an output variable at multiple locations based on uncertain input data. Note that throughout, for simplicity, we only consider one uncertain input parameter per test case (see Sect.

In the first 1D test case, the water level is estimated at various locations as a result of a propagating non-breaking wave entering a domain under an uncertain Manning friction coefficient (Sect.

For our first test case, we consider the 1D case of a non-breaking wave propagating over a horizontal plane from

The advantage of this test case is that, due to the horizontal slope and the constant velocity condition at the inlet, the inviscid shallow-water equations can be solved analytically with the following result:

Before running the full MLMF algorithm, we run a small test using a spatially uniform Manning friction coefficient of 0.0364

Comparing the final water elevation from using SFINCS and XBeach at

Comparing the real and modified correlation values between SFINCS and XBeach to find water elevation at specific locations in the non-breaking wave test case. Note that each colour represents a specific output location.

For our MLMF simulation, we use grids with

Summary of average time taken to run SFINCS and XBeach at each level for the non-breaking wave test case. As can be seen from Eqs. (

We can thus proceed to the next steps of the MLMF algorithm and compare our MLMF results to the analytical estimate (recall this is an estimate of the expected value of the true solution rather than the true expected value because of the uncertainty in

Error (RMSE) with respect to the analytical estimate for the final water elevation at the locations of interest in the non-breaking wave test case, as the resolution level becomes finer. The

The error to the analytical estimate shown in Fig.

Error (RMSE) between the MLMF result and the Monte Carlo (MC) result for the analytical estimate for the final water elevation at the locations of interest in the non-breaking wave test case, as the resolution level becomes finer. The

All the test case results shown so far in this section use the same accuracy tolerance of

Error (RMSE) between the MLMF result and the Monte Carlo (MC) result as the tolerance value

Optimum number of XBeach (HF) simulations required by MLMF (Eq.

Factor of total LF simulations (

As discussed in Sect.

CDFs generated from MLMF outputs using the modified inverse transform sampling method (Eq.

Finally, throughout this section, we have assumed that either we can approximate the expected value of the true solution or we can approximate the expected value of the XBeach simulation by using Monte Carlo with large numbers of simulations at fine resolutions. However, if MLMF is to be of use, we need to apply it to cases where the “true” value is not known. In these cases, the only parameter the practitioner can use to check accuracy is the tolerance value

Comparing the computational cost required to achieve tolerance

For our second test case, we consider the 1D Carrier–Greenspan test case, first introduced in

In this section, we evaluate the uncertainty associated with the linear bed slope and assume it has a normal distribution, with slope

Comparing the maximum elevation height achieved at every point in the domain over the entire simulation when using SFINCS and XBeach at

When a wave runs up a slope, the quantity of most interest is often not the water depth at a particular location and time but, instead, the (maximum) run-up height. Thus, for this test case, our quantity of interest is the maximum run-up height over the whole simulated period. Here we take the run-up height to be the water elevation above a fixed datum in the last wet cell in the domain (where wet cells are defined as those with a water depth greater than 0.005 m). We first test how the maximum elevation height over the whole domain depends on the resolution and model used in the simulation. Figure
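This run-up extraction can be sketched as follows. The function names are our own, and we assume the 1D transect is ordered from seaward to landward, so the last wet cell is the furthest point reached up the slope.

```python
import numpy as np

WET_THRESHOLD = 0.005  # m; cells deeper than this count as wet

def runup_height(bed_level, water_depth):
    """Run-up height at one instant: the water surface elevation
    (bed level + depth, above the fixed datum) in the last wet cell of
    the 1D cross-shore transect."""
    wet = np.flatnonzero(water_depth > WET_THRESHOLD)
    if wet.size == 0:
        return np.nan
    i = wet[-1]                      # last wet cell up the slope
    return bed_level[i] + water_depth[i]

def max_runup(bed_level, depth_timeseries):
    """Maximum run-up height over the whole simulated period."""
    return max(runup_height(bed_level, d) for d in depth_timeseries)
```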

Summary of average time taken to run SFINCS and XBeach at each level for the Carrier–Greenspan test case. As can be seen from Eqs. (

Comparing the real and modified correlation values between SFINCS and XBeach to determine the maximum run-up height in the Carrier–Greenspan test case.

For our MLMF simulation, we use grids with

Running the next steps in the MLMF algorithm, we can compare our MLMF results to the analytical estimate and to the Monte Carlo result estimated using 400 000 simulations of XBeach at the finest resolution (512 mesh cells in the

Error (RMSE) in the maximum run-up height as the resolution level becomes finer for the Carrier–Greenspan test case. The

Error (RMSE) between the MLMF result and the Monte Carlo (MC) result for the Carrier–Greenspan test case as the tolerance value

So far in this section we have only considered a single tolerance value. Therefore, we re-run this test case using different tolerance values

Optimum number of XBeach (HF) simulations required by MLMF (Eq.

Factor of total LF simulations (

As in the previous test case, we can apply the modified inverse transform sampling method to the MLMF output (from using

CDFs generated from MLMF outputs using the modified inverse transform sampling method (Eq.

Finally, as discussed in the previous test case, in reality, the “true” value of the quantity of interest is not always known, and the only parameter available to check accuracy is the tolerance value

Comparing the computational cost required to achieve tolerance

The test cases considered so far in this work have been relatively simple 1D idealised test cases. For our final test case, we consider the real-world test case of a dune system near Myrtle Beach, South Carolina, USA (see Fig.

Location of area of interest in the Myrtle Beach test case. Source: © Google Maps 2021.

Bed level data for original non-extended domain of the Myrtle Beach test case with locations of interest marked with a circle. The locations are colour-coded, and these colours are used to represent them throughout this section. Note that

As in the previous test cases, we use XBeach as the HF model and SFINCS as the LF model in the MLMF algorithm. To simulate the waves in XBeach we use the surfbeat model mode with the JONSWAP (Joint North Sea Wave Project) wave spectrum

Varying cross-shore grid resolution based on the bed level in the extended XBeach domain. In the original non-extended domain, the cross-shore resolution is 2

As this is a real-world study, we must also consider tides. These tides can have a large impact on coastal flooding, and thus, for this test case, we evaluate the uncertainty in the maximum tide height

Maximum elevation height (relative to a fixed datum) from an example XBeach simulation for the Myrtle Beach test case, showing overtopping. This has been simulated using the grid resolution from Fig.

With the setup of the test case complete, we now consider the MLMF setup. For the MLMF simulation, we use grids with

Comparing the maximum water depth achieved at the eight locations of interest using SFINCS and XBeach at two different resolutions, for the Myrtle Beach test case. A maximum tide height of

Table

Summary of average time taken to run SFINCS and XBeach at each level for the Myrtle Beach test case. As can be seen from Eqs. (

As with the previous test cases, before running the full MLMF algorithm, we first analyse the values of key MLMF parameters determined in Step 1 of the algorithm (Algorithm

Comparing the real and modified correlation values between SFINCS and XBeach to determine maximum water elevation at eight specific locations in the Myrtle Beach test case.

Before running the full MLMF algorithm, we also consider how to assess the accuracy of the MLMF algorithm for this test case. This is a complex, computationally expensive real-world problem for which there is no analytical solution and for which approximating a “true” solution using the standard Monte Carlo method at the finest resolution considered is impractical (each simulation of XBeach at this resolution takes on average 3 d). Therefore, to assess accuracy, we use the following general theoretical formula for the root mean squared error (RMSE):
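A standard form of this RMSE decomposition, assumed here, splits the mean squared error of an estimator into the estimator variance plus the squared discretisation bias:

```latex
\mathrm{RMSE}\left(\hat{Q}\right) = \sqrt{\mathbb{V}\left[\hat{Q}\right] + \left(\mathbb{E}\left[\hat{Q}\right] - \mathbb{E}\left[Q\right]\right)^{2}},
```

where the variance term is controlled by the number of simulations and the bias term by the finest resolution used.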

Comparing the different rates at which variance of the MLMF estimator (Eq.

Given these promising results, we can now run the full MLMF algorithm (Algorithm

More significantly, Fig.

Spatial representation of the expected value of the maximum elevation height estimated using MLMF with a tolerance of

Optimum number of XBeach (HF) simulations required by MLMF (Eq.

Optimum number of XBeach (HF) simulations required by MLMC divided by the optimum number required by MLMF for the Myrtle Beach test case. Here

Factor of total LF simulations (

As in previous test cases, we can apply the modified inverse transform sampling method to the MLMF outputs (here produced using

CDFs generated from MLMF outputs using the modified inverse transform sampling method (Eq.

Finally, we also consider how different tolerance values

Comparing the computational cost required to achieve tolerance

This work aims to be a proof of concept demonstrating that MLMF can be used for coastal flooding. Thus, whilst in real-world cases there will be more than one uncertain input, to meet this aim it is sufficient to consider only one uncertain input parameter per test case. Adding more uncertain inputs would increase the variance of the outputs, and thus all methods would require larger numbers of simulations and be more computationally expensive. Note, however, that the methodology outlined in Sect.

Furthermore, for all methods in this work, we assess the impact of uncertain input parameters by randomly sampling values from a user-chosen distribution and then running the models with these parameter values. This again meets the aim of this work but is the simplest sampling approach. Nevertheless, the flexibility of MLMC and MLMF means that they can also be combined with other more sophisticated sampling techniques that can further reduce the number of model simulations needed. These complex techniques are out of scope for this work, but we remark briefly upon them here. One such technique is Latin hypercube sampling
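As a brief illustration of the idea (not part of this study's own sampling), a one-dimensional Latin hypercube sample can be generated with SciPy (version 1.7 or later) and mapped through an assumed input distribution. Here the friction coefficient is centred on the 0.0364 value from the first test case, while the 0.005 spread is a hypothetical choice.

```python
import numpy as np
from scipy.stats import qmc, norm

# One-dimensional Latin hypercube sample: each of the n equal-probability
# strata receives exactly one sample, giving even coverage of the input
# distribution with few model runs.
sampler = qmc.LatinHypercube(d=1, seed=0)
u = sampler.random(n=8)  # uniform samples in [0, 1)

# Map through the inverse CDF of the assumed input distribution: a
# normally distributed Manning friction coefficient.
theta = norm.ppf(u[:, 0], loc=0.0364, scale=0.005)

# The stratification property: one sample falls in each stratum.
strata = np.floor(u[:, 0] * 8).astype(int)
```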

There are also other common techniques to improve the efficiency of assessing uncertainty such as the Markov chain Monte Carlo (MCMC) method and using machine learning techniques as emulators. As with the sophisticated sampling techniques, these can also be combined with MLMC and/or MLMF to improve the methods further: both multilevel Markov chain Monte Carlo algorithms

We conclude this section by observing that, although there are more sophisticated techniques to assess uncertainty than that applied in this work, the flexibility of the MLMF algorithm means that it can easily be combined with other more complex statistical approaches, leveraging the advantages of both approaches. Whilst these combined approaches are beyond the scope of this work, using these techniques on coastal problems is an interesting and promising avenue for further research.

In this work, we have presented the first successful application of MLMF in the field of coastal engineering and one of the first successful applications of this method in any field. Using both idealised and real-world test cases, we have shown that MLMF can significantly improve the computational efficiency of uncertainty quantification analysis in coastal flooding for the same accuracy compared to the standard Monte Carlo method. In particular, we have demonstrated that this enables uncertainty analysis to be conducted in real-world coastal environments that would have been infeasible with the statistical methods previously applied in this field. Using our new modified inverse transform sampling technique, we are also able to accurately generate the cumulative distribution function (CDF) for the output variables of interest, which is of great value to decision makers. Furthermore, the expected values and CDFs of output variables can be computed at multiple locations simultaneously with no additional computational cost, demonstrating the flexibility of MLMF. In future work, this will enable the construction of large-scale maps showing the expected value and CDF of variables of interest at all locations in the domain, facilitating accurate and timely decision-making. Moreover, we have highlighted the benefits of using a multifidelity approach and shown that using SFINCS as an LF model and XBeach as an HF model makes MLMF notably more computationally efficient than MLMC for the same or higher accuracy. Multifidelity approaches thus represent a very rewarding avenue for further research, and our new model-independent, easily applicable MLMF wrapper written as part of this work will greatly facilitate this research.

Finally, this efficient uncertainty quantification can be used in the future for risk estimation. The latter assumes that the same scenario happens repeatedly over a given time period (e.g. rain events over a year) and requires frequency information (e.g. how many times does a certain location get flooded per time period). Thus, the information gathered by using MLMF to probabilistically quantify the variation/uncertainty in the different scenarios (e.g. rainfall events) can be used in future work for risk estimation.

Generally, a multifidelity approach uses a low-fidelity model to generate surrogate approximations for the outputs of a high-fidelity model. If applied correctly, the resulting multifidelity estimator is then as accurate as the equivalent high-fidelity one. There exist a number of different multifidelity approaches

Equation (

Note that
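The control variate construction can be illustrated with a toy example. The correlated outputs below are synthetic stand-ins for paired HF/LF runs driven by the same random inputs, and the coefficient used is the standard variance-minimising choice (the covariance of the paired outputs divided by the LF variance).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy correlated HF/LF outputs driven by the same random inputs.
theta = rng.normal(size=200)             # expensive paired runs
q_hf = np.sin(theta) + 1.0
q_lf = 0.8 * np.sin(theta) + 0.9 + 0.05 * rng.normal(size=200)

# Variance-minimising control-variate coefficient.
cov = np.cov(q_hf, q_lf)
alpha = cov[0, 1] / cov[1, 1]

# Many extra cheap LF-only runs pin down the LF mean accurately.
theta_extra = rng.normal(size=20_000)
q_lf_extra = 0.8 * np.sin(theta_extra) + 0.9 + 0.05 * rng.normal(size=20_000)
mu_lf = np.mean(np.concatenate([q_lf, q_lf_extra]))

# Control-variate estimator of the HF mean (true value is 1.0 here):
# the paired-sample mean plus a correction for the LF sampling error.
estimate = np.mean(q_hf) + alpha * (mu_lf - np.mean(q_lf))
```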

The multilevel Monte Carlo method (MLMC) was first introduced in

MLMC accelerates the Monte Carlo method by considering the problem at different levels of resolution in a multilevel environment. It then uses linearity of expectations to transform this multiresolution expectation to a single expectation at the finest level

Here

Equivalently to Eq. (

Here

A key factor when using the MLMC estimator is determining the optimum number of simulations to run at each level

However, this formula requires initial estimates of
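The resulting allocation rule can be written as a short function. The normalisation below targets an estimator variance of tol squared; conventions differ by constant factors between presentations, so this is one common form rather than the exact expression used here.

```python
import numpy as np

def optimal_allocation(variances, costs, tol):
    """Optimal number of simulations per level for an MLMC-type estimator:
    n_l proportional to sqrt(V_l / C_l), scaled so that the estimator
    variance (the sum of V_l / n_l) meets tol**2. V_l and C_l are the
    pilot-run estimates of the level-l correction variance and cost."""
    v = np.asarray(variances, dtype=float)
    c = np.asarray(costs, dtype=float)
    scale = np.sum(np.sqrt(v * c)) / tol**2
    return np.ceil(scale * np.sqrt(v / c)).astype(int)

# Example with pilot estimates: correction variances fall and costs grow
# with level, so most runs land on the cheap coarse levels.
n = optimal_allocation([1e-2, 2e-3, 5e-4], [1, 4, 16], tol=0.01)
```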

Note further that when we are estimating multiple outputs (i.e. when we consider multiple locations), we must calculate

We have now outlined MLMC and conclude with Algorithm

Multilevel Monte Carlo method.

Start with

Estimate the variance

Define optimal

If the optimal

If

If

The relevant code for the MLMF framework presented in this work is stored at

MCAC contributed to the conceptualisation of the project, formal analysis, investigation, methodology, visualisation, writing of the original draft, and reviewing and editing of the text. TWBL contributed to the conceptualisation of the project, methodology, visualisation, writing of the original draft, and reviewing and editing of the text. RTM and FLMD contributed to the conceptualisation of the project and reviewing and editing of the text. CJC and MDP contributed to the conceptualisation and supervision of the project and reviewing and editing of the text.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mariana C. A. Clare's work was funded through the EPSRC (Engineering and Physical Sciences Research Council) CDT (Centre for Doctoral Training) in the Mathematics of Planet Earth (grant no. EP/R512540/1). Colin J. Cotter and Matthew D. Piggott acknowledge funding from the Engineering and Physical Sciences Research Council (EPSRC) (grant nos. EP/L016613/1, EP/R007470/1 and EP/R029423/1). Tim W. B. Leijnse and Robert T. McCall acknowledge funding from the Deltares research programme “Natural Hazards”, and Ferdinand L. M. Diermanse acknowledges funding from the Deltares research programme “Risk Assessment and Management”.

This paper was edited by Animesh Gain and reviewed by two anonymous referees.