After decades of study and significant data collection of time-varying swash on sandy beaches, there is no single deterministic prediction scheme for wave runup that eliminates prediction error; even bespoke, locally tuned predictors present scatter when compared to observations. Scatter in runup prediction is meaningful and can be used to create probabilistic predictions of runup for a given wave climate and beach slope. This contribution demonstrates the approach using a data-driven Gaussian process (GP) predictor, a probabilistic machine-learning technique. The runup predictor is developed using 1 year of hourly wave runup data (8328 observations) collected by a fixed lidar at Narrabeen Beach, Sydney, Australia. The GP predictor accurately predicts hourly wave runup elevation when tested on unseen data, with a root-mean-squared error of 0.18 m and bias of 0.02 m. The uncertainty estimates output from the probabilistic GP predictor are then used practically in a deterministic numerical model of coastal dune erosion, which relies on a parameterization of wave runup, to generate ensemble predictions. When applied to a dataset of dune erosion caused by a storm event that impacted Narrabeen Beach in 2011, the ensemble approach reproduced

Wave runup is important for characterizing the vulnerability of beach and
dune systems and coastal infrastructure to wave action. Wave runup is
typically defined as the time-varying vertical elevation of wave action
above ocean water levels and is a combination of wave swash and wave setup
(Holman, 1986; Stockdon et al., 2006). Most parameterizations of wave runup use deterministic equations that output a single value for either the maximum runup elevation in a given time period,

The resulting inadequacies of a single deterministic parameterization of wave runup can cascade up through the scales to cause error in any larger model that uses a runup parameterization. It therefore makes sense to explicitly incorporate prediction uncertainty into wave runup predictions. In disciplines such as hydrology and meteorology, with a more established tradition of forecasting, model uncertainty is often captured by using ensembles (e.g., Bauer et al., 2015; Cloke and Pappenberger, 2009). The benefits of ensemble modelling are typically superior skill and the explicit inclusion of uncertainty in predictions by outputting a range of possible model outcomes. Commonly used methods of generating ensembles include combining different models (Limber et al., 2018) or perturbing model parameters, initial conditions, and/or input data (e.g., via Monte Carlo simulations; Callaghan et al., 2013).

An alternative approach to quantify prediction uncertainty is to incorporate scatter about a mean prediction into model parameterizations. For example, wave runup predictions at every time step could be modelled with a deterministic parameterization plus a noise component that captures the scatter about the deterministic prediction caused by unresolved processes. If parameterizations are stochastic, or have a stochastic component, repeated model runs (given identical initial and forcing conditions) produce different model outputs – an ensemble – that represent a range of possible values the process could take. This is broadly analogous to the method of stochastic parameterization used in the weather forecasting community for sub-grid-scale processes and parameterizations (Berner et al., 2017). In these applications, stochastic parameterization has been shown to produce better predictions than traditional ensemble methods and is now routinely used by many operational weather forecasting centres (Berner et al., 2017; Buchanan, 2018).

Stochastically varying a deterministic wave runup parameterization to form an ensemble still requires defining the stochastic term, i.e., the element that should be added to the predicted runup at each model time step. An alternative to specifying a predefined distribution or a noise term added to a parameterization is to learn and parameterize the variability in wave runup from observational data using machine-learning techniques. Machine learning has had a wide range of applications in coastal morphodynamics research (Goldstein et al., 2019) and has shown specific utility in understanding swash processes (Passarella et al., 2018b; Power et al., 2019) as well as storm-driven erosion (Beuzen et al., 2017, 2018; den Heijer et al., 2012; Goldstein and Moore, 2016; Palmsten et al., 2014; Plant and Stockdon, 2012). While many machine-learning algorithms are used to optimize deterministic predictions, a Gaussian process is a probabilistic machine-learning technique that directly captures model uncertainty from data (Rasmussen and Williams, 2006). Recent work has specifically used Gaussian processes to model coastal processes such as large-scale coastline erosion (Kupilik et al., 2018) and estuarine hydrodynamics (Parker et al., 2019).

The work presented here is focused on using a Gaussian process to build a data-driven probabilistic predictor of wave runup that includes estimates of uncertainty. While quantifying uncertainty in runup predictions from data is useful in itself, the benefit of this methodology is in explicitly including the uncertainty with the runup predictor in a larger model that uses a runup parameterization, such as a coastal dune erosion model. Dunes on sandy coastlines provide a natural barrier to storm erosion by absorbing the impact of incident waves and storm surge and helping to prevent or delay flooding of coastal hinterland and infrastructure (Mull and Ruggiero, 2014; Sallenger, 2000; Stockdon et al., 2007). The accurate prediction of coastal dune erosion is therefore critical for characterizing the vulnerability of dune and beach systems and coastal infrastructure to storm events. A variety of methods are available for modelling dune erosion, including simple conceptual models relating hydrodynamic forcing, antecedent morphology, and dune response (Sallenger, 2000); empirical dune-impact models that relate time-dependent dune erosion to the force of wave impact at the dune (Erikson et al., 2007; Larson et al., 2004; Palmsten and Holman, 2012); data-driven machine-learning models (Plant and Stockdon, 2012); and more complex physics-based models (Roelvink et al., 2009). In this study, we focus on dune-impact models, which are simple, commonly used models that typically rely on a parameterization of wave runup to model time-dependent dune erosion. As inadequacies in the runup parameterization can jeopardize the success of model results (Overbeck et al., 2017; Palmsten and Holman, 2012; Splinter et al., 2018), it makes sense to use a runup predictor that includes prediction uncertainty.

The overall aim of this work is to demonstrate how probabilistic data-driven
methods can be used with deterministic models to develop ensemble predictions, an idea that could be applied more generally to other numerical models of geomorphic systems. Section 2 first describes the Gaussian process model theory. In Sect. 3 the Gaussian process runup predictor is developed. Section 4 presents an example application in which the Gaussian process runup predictor is used inside a morphodynamic model of coastal dune erosion to build a hybrid model (Goldstein and Coco, 2015; Krasnopolsky and Fox-Rabinovitz, 2006) that can generate ensemble output. A discussion of the results and technique is provided in Sect. 5, followed by conclusions in Sect. 6. The data and code used to develop the Gaussian process runup predictor in this paper are publicly available at

Gaussian processes (GPs) are data-driven, non-parametric models. A brief introduction to GPs is given here; for a more detailed introduction the reader is referred to Rasmussen and Williams (2006). There are two main approaches to determine a function that best parameterizes a process over an input space: (1) select a class of functions to consider, e.g., polynomial functions, and best fit the functions to the data (a parametric approach); or (2) consider all possible functions that could fit the data, and assign higher weight to functions that are more likely (a non-parametric approach) (Rasmussen and Williams, 2006). In the first approach it is necessary to decide on a class of functions to fit to the data; if all or parts of the data are not well modelled by the selected functions, then the predictions may be poor. In the second approach there is an infinite set of possible functions that could fit a dataset (imagine the number of paths that could be drawn between two points on a graph). A GP addresses the problem of infinite possible functions by specifying a probability distribution over the space of possible functions that fit a given dataset. Based on this distribution, the GP quantifies which function most likely fits the underlying process generating the data and gives confidence intervals for this estimate. Random samples can also be drawn from the distribution to provide examples of what different functions that fit the dataset might look like.
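As a minimal sketch of this idea (not the implementation used in this study), the following NumPy code conditions a zero-mean GP on a handful of points and returns the most likely function, an approximate 95 % interval, and a few random functions consistent with the data. The training values, the hyperparameter values, and the function names `sq_exp_kernel` and `gp_posterior` are illustrative assumptions; the squared exponential kernel is assumed because it is the kernel adopted later in this section.

```python
import numpy as np

def sq_exp_kernel(xa, xb, signal_var=1.0, length_scale=1.0):
    """Squared exponential covariance between two 1-D input arrays."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    k_tt = sq_exp_kernel(x_train, x_train, **kernel_args)
    k_ts = sq_exp_kernel(x_train, x_test, **kernel_args)
    k_ss = sq_exp_kernel(x_test, x_test, **kernel_args)
    # Cholesky factor of the noisy training covariance (numerically stable solve)
    chol = np.linalg.cholesky(k_tt + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    cov = k_ss - v.T @ v
    return mean, cov

rng = np.random.default_rng(0)
x_train = np.array([0.5, 1.0, 2.0, 3.5, 4.0])   # hypothetical inputs
y_train = np.sin(x_train)                        # hypothetical observations
x_test = np.linspace(0.0, 5.0, 50)

mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=1e-4)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # approximate 95 % interval

# Three random functions drawn from the posterior (small jitter for stability)
samples = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(len(x_test)), size=3)
```

In this sketch the interval narrows near the training inputs and widens away from them, which is the behaviour that makes the GP useful as a source of prediction uncertainty.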

A GP is defined as a collection of random variables, any finite set of which
has a multivariate Gaussian distribution. The random variables in a GP represent the value of the underlying function that describes the data,

Whereas a univariate Gaussian distribution is defined by a mean and variance
(i.e., (

These concepts of GP development are further described using a hypothetical
dataset of significant wave height (

There are many different types of covariance functions or kernels. One
of the most common, and the one used in this study, is the squared
exponential covariance function:
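For reference, the standard textbook form of the squared exponential kernel for a scalar input (Rasmussen and Williams, 2006) can be written as below. The symbols $\sigma_f^2$ (signal variance) and $\ell$ (length scale) are notational assumptions and may differ from those used in the numbered equations of this paper.

```latex
k(x, x') = \sigma_f^2 \exp\!\left( -\frac{(x - x')^2}{2\,\ell^2} \right)
```

Here $\sigma_f^2$ sets the overall variance of the functions, and $\ell$ controls how quickly the correlation between function values decays with distance in input space: inputs that are close together relative to $\ell$ have strongly correlated function values.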

The goal is to determine which of these functions actually fit the observed
data points (training data) in Fig. 1a. This can be achieved by forming a posterior distribution on the function space by conditioning the prior with the training data. Roughly speaking, this operation is mathematically equivalent to drawing an infinite number of random functions from the multivariate Gaussian prior (Eq. 3) and then rejecting those that do not agree with the training data. As mentioned above, the multivariate Gaussian offers a simple, closed form solution to this conditioning. Assuming that our observed training data are noiseless (i.e.,

As stated earlier, in Eq. (4) and Fig. 1c there is an assumption that the training data are noiseless and represent the exact value of the function at the specific point in input space. In reality, there is error associated with observations of physical systems, such that

In Eq. (7) there are three hyperparameters: the signal variance (

It is standard practice in the development of data-driven machine-learning
models to divide the available dataset into training, validation, and testing
subsets. The training data are used to fit model parameters. The validation
data are used to evaluate model performance, and the model hyperparameters are usually varied until performance on the validation data is optimized. Once
the model is optimized, the remaining test dataset is used to objectively
evaluate its performance and generalizability. A decision must be made about
how to split a dataset into training, validation, and testing subsets. There
are many different approaches to handle this splitting process; for example,
random selection, cross validation, stratified sampling, or a number of
other deterministic sampling techniques (Camus et al., 2011). The exact
technique used to generate the data subsets often depends on the problem at
hand. Here, there were two constraints to be considered: first, the
computational expense of GPs scales by

The MDA is a deterministic routine that iteratively adds a data point to the
training set based on how dissimilar it is to the data already included in
the training set. Camus et al. (2011) provide a comprehensive introduction to the MDA selection routine, and it has been previously used in machine-learning studies (e.g., Goldstein et al., 2013). Briefly, to initialize the MDA routine, the data point with the maximum sum of dissimilarity (defined by Euclidean distance) to all other data points is selected as the first data point to be added to the training dataset. Additional data points are included in the training set through an iterative process whereby the next data point added is the one with maximum dissimilarity to those already in the training set – this process continues until a user-defined training set size is reached. In this way the MDA routine produces a set of training data that captures the range of variability present in the full dataset. The data not selected for the training set are equally and randomly split to form the validation dataset and test dataset. While alternative data-splitting routines are available, including simple random sampling, stratified random sampling, self-organizing maps, and

In 2014, an extended-range lidar (light detection and ranging) device (SICK LD-LRS 2110) was permanently installed on the rooftop of a beachside building (44 m a.m.s.l. – above mean sea level) at Narrabeen–Collaroy Beach (hereafter referred to simply as Narrabeen) on the southeast coast of Australia (Fig. 2). Since 2014, this lidar has continuously scanned a single cross-shore profile transect extending from the base of the beachside building to a range of 130 m, capturing the surface of the beach profile and incident wave swash at a frequency of 5 Hz in both daylight and non-daylight hours. Specific details of the lidar setup and functioning can be found in Phillips et al. (2019).

Narrabeen Beach is a 3.6 km long embayed beach bounded by rocky headlands.
It is composed of fine to medium quartz sand (

Individual wave runup elevation on the beach profile was extracted on a
wave-by-wave basis from the lidar dataset (Fig. 2c) using a neural network
runup detection tool developed by Simmons et al. (2019). Hourly

Histograms of the 8328 data samples extracted from the Narrabeen
lidar:

To determine the optimum training set size, kernel, and model hyperparameters, a number of different user-defined training set sizes were trialled using the MDA selection routine discussed in Sect. 2.3. The GP was trained using different amounts of data and hyperparameters were optimized on the validation dataset only. It was found that a training set size of only 5 % of the available dataset (training dataset: 416 of 8328 available samples; validation dataset: 3956 samples; testing dataset: 3956 samples) was required to develop an optimum GP model. Training data sizes beyond this value produced negligible changes in GP performance but considerable increases in computational demand, similar to findings of previous work (Goldstein and Coco, 2014; Tinoco et al., 2015). Results presented below discuss the performance of the GP on the testing dataset, which was not used in GP training or validation.
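The MDA selection and the subsequent random split can be sketched as follows. This is a hedged illustration rather than the code used in this study: the data are synthetic stand-ins with three hypothetical input columns, the min-max scaling is an assumption, and dissimilarity to the training set is assumed to be the Euclidean distance to the nearest already-selected point (the max-min variant of the algorithm). The function name `mda_select` is hypothetical.

```python
import numpy as np

def mda_select(data, n_select):
    """Return indices of n_select rows chosen by maximum dissimilarity."""
    # Scale each column to [0, 1] so no variable dominates the distance
    # (an assumption; the scaling used in the study is not restated here).
    span = np.ptp(data, axis=0)
    scaled = (data - data.min(axis=0)) / np.where(span > 0, span, 1.0)

    # Seed: the point with the largest summed distance to all other points
    dist_sum = np.zeros(len(scaled))
    for row in scaled:
        dist_sum += np.linalg.norm(scaled - row, axis=1)
    selected = [int(np.argmax(dist_sum))]

    # Track each point's distance to its nearest already-selected point
    min_dist = np.linalg.norm(scaled - scaled[selected[0]], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))          # most dissimilar remaining point
        selected.append(nxt)
        new_dist = np.linalg.norm(scaled - scaled[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return np.array(selected)

rng = np.random.default_rng(42)
data = rng.random((8328, 3))                    # synthetic stand-in for the observations
n_train = round(0.05 * len(data))               # 5 % of the dataset
train_idx = mda_select(data, n_train)

# Remaining samples are shuffled and split equally into validation and test sets
rest = rng.permutation(np.setdiff1d(np.arange(len(data)), train_idx))
val_idx, test_idx = np.array_split(rest, 2)
```

With 8328 samples, this reproduces the subset sizes quoted above: 416 training samples and 3956 samples each for validation and testing.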

Results of the GP

Observed 2 % wave runup (

As discussed in Sect. 1, scatter in runup predictions is likely a result of unresolved processes in the model such as wave dispersion, wave spectrum, nearshore morphology, or a range of other possible processes. Regardless of the origin, here this scatter (uncertainty) is used to form ensemble predictions. The GP developed here not only gives a mean prediction as used in Fig. 4, but it specifies a multivariate Gaussian distribution from which different random functions that describe the data can be sampled. Random samples of wave runup from the GP can capture uncertainty around the mean runup prediction (as was demonstrated in the hypothetical example in Fig. 1d). To assess how well the GP model captures uncertainty, random
samples are successively drawn from the GP, and the number of

We use the dune erosion model of Larson et al. (2004) as an example of how the GP runup predictor can be used to create an ensemble of dune erosion
predictions, and we thus provide probabilistic outcomes with uncertainty bands
needed in coastal management. The dune erosion model is subsequently
referred to as LEH04 and is defined as follows:

In June 2011 a large coastal storm event impacted the southeast coast of
Australia. This event resulted in alongshore-variable dune erosion at Narrabeen Beach, which was captured by five airborne lidar surveys conducted approximately 24 h apart immediately before, during, and after the storm. Cross-shore profiles were extracted from the lidar data at 10 m
alongshore intervals as described in detail in Splinter et al. (2018),
resulting in 351 individual profiles (Fig. 6). The June 2011 storm lasted 120 h. Hourly wave data were recorded by the Sydney Waverider buoy located in

June 2011 storm data.

For each of the 351 available profiles, the pre-, during, and post-storm dune toe positions were defined as the local maxima of curvature of the beach profile following the method of Stockdon et al. (2007). Dune erosion at each profile was then defined as the difference in subaerial beach volume landward of the pre-storm dune toe, as shown in Fig. 6c. Of the 351 profiles, only 117 had storm-driven dune erosion (Fig. 6b). For the example demonstration presented here, only profiles for which the post-storm dune toe elevation was at the same or higher elevation than the pre-storm dune toe are considered, which is a basic assumption of the LEH04 model. Of the 117 profiles with storm erosion, 40 profiles met these criteria. For each of these profiles, the linear slope between the pre- and post-storm dune toe was used to project the dune erosion calculated using the LEH04 model.

The LEH04 dune erosion model (Eq. 9) has a single tuneable parameter, the transport coefficient

An example at a single profile (profile 141, located approximately half-way
up the Narrabeen embayment as shown in Fig. 6b) of time-varying ensemble dune erosion predictions is provided in Fig. 7. It was previously shown in Fig. 5 that only 10 random samples drawn from the GP

Example of LEH04 used with the Gaussian process

Pre- and post-storm dune erosion results for the 40 profiles using 10 000 ensemble members and

Observed (pink dots) and predicted (black dots) dune erosion volumes for the 40 modelled profiles, using 10 000 runup models drawn from the Gaussian process and used to force the LEH04 model. Note that the 40 profiles shown are not uniformly spaced along the 3.5 km Narrabeen embayment. The black dots represent the ensemble mean prediction for each profile, while the shaded areas represent the regions captured by 66 %, 90 %, and 99 % of the ensemble predictions.

In Sect. 4.2, the application of the GP runup predictor within the LEH04 model to produce an ensemble of dune erosion predictions was based on 10 000 ensemble members and a

Results of the stochastic parameterization methodology for

The key utility of using a data-driven GP predictor to produce ensembles is
that a range of predictions at every location is provided as opposed to a
single erosion volume. The ensemble range provides an indication of
uncertainty in predictions, which can be highly useful for coastal engineers
and managers taking a risk-based approach to coastal hazard management.
Figure 9b–d displays the percentage of dune erosion observations from the 40 profiles captured within ensemble predictions for

Results in Fig. 9 and Table 1 demonstrate that there is relatively little difference in model performance when more than 10 to 100 ensemble members are used, which is consistent with results presented previously in Fig. 5 that showed that only 10 random samples drawn from the GP

Quantitative summary of Fig. 9, showing the optimum

Studies of commonly used deterministic runup parameterizations such as those proposed by Hunt (1959), Holman (1986), and Stockdon et al. (2006), amongst others, show that these parameterizations are not universally applicable and there remains no perfect predictor of wave runup on beaches (Atkinson et al., 2017; Passarella et al., 2018a; Power et al., 2019). This suggests that the available parameterizations do not fully capture all the relevant processes controlling wave runup on beaches (Power et al., 2019). Recent work has used ensemble and data-driven methods to account for unresolved factors and complexity in runup processes. For example, Atkinson et al. (2017) developed a model of models by fitting a least-squares line to the predictions of several runup parameterizations. Power et al. (2019) used a data-driven, deterministic, gene-expression programming model to predict wave runup against a large dataset of runup observations. Both approaches improved wave runup predictions relative to conventional runup parameterizations on the datasets tested in these studies.

The work presented in this study used a data-driven Gaussian process (GP)
approach to develop a probabilistic runup predictor. While the mean
predictions from the GP predictor developed in this study using
high-resolution lidar data of wave runup were accurate (RMSE

Uncertainty in wave runup predictions within dune-impact models can result in significantly varied predictions of dune erosion. For example, the model of Larson et al. (2004) used in this study only predicts dune erosion if runup elevation exceeds the dune toe elevation and predicts a non-linear relationship between runup that exceeds the dune toe and resultant dune erosion. Hence, if wave runup predictions are biased too low then no dune erosion will be predicted, and if wave runup is predicted too high then dune erosion may be significantly overpredicted. Ensemble modelling has become standard practice in many areas of weather and climate modelling (Bauer et al., 2015), as well as hydrological modelling (Cloke and Pappenberger, 2009), and more recently has been applied to coastal problems such as the prediction of cliff retreat (Limber et al., 2018) as a method of handling prediction uncertainty. While using a single deterministic model is computationally simple and provides one solution for a given set of input conditions, model ensembles provide a range of predictions that can better capture the variety of mechanisms and stochasticity within a coastal system. The result is typically improved skill over deterministic models (Atkinson et al., 2017; Limber et al., 2018) and a natural method of providing uncertainty with predictions.
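The threshold and non-linear behaviour described above can be illustrated with a short, hedged sketch. It assumes the commonly cited form of the dune-impact relation, in which the eroded volume per time step is 4 Cs (R - zb)^2 dt / T when runup R exceeds the dune toe elevation zb and zero otherwise; the exact expression and parameter values used in this study should be taken from its own definition of LEH04. The runup series, transport coefficient, wave period, noise level, and fixed dune toe elevation are all hypothetical, and independent Gaussian noise is used only as a simple stand-in for random samples drawn from the GP.

```python
import numpy as np

def dune_erosion(runup, dune_toe, wave_period, dt=3600.0, cs=1.5e-3):
    """Cumulative eroded volume (m^3/m) from an hourly runup series.

    Assumed dune-impact form: dV = 4 * cs * (R - zb)^2 * dt / T for R > zb.
    The dune toe is held fixed for simplicity (no morphological feedback).
    """
    excess = np.clip(runup - dune_toe, 0.0, None)   # no erosion unless runup exceeds toe
    return np.sum(4.0 * cs * excess**2 * dt / wave_period, axis=-1)

rng = np.random.default_rng(1)
hours = 120
# Hypothetical mean runup prediction that peaks at 3.0 m mid-storm
mean_runup = 2.0 + 1.0 * np.sin(np.linspace(0.0, np.pi, hours))
period = np.full(hours, 10.0)                       # hypothetical wave period (s)
dune_toe = 3.1                                      # just above the peak mean runup (m)

# A single deterministic run never exceeds the dune toe, so it predicts no erosion
deterministic = dune_erosion(mean_runup, dune_toe, period)

# Ensemble: perturb the mean runup with scatter and rerun the same model
ensemble_runup = mean_runup + rng.normal(0.0, 0.2, size=(1000, hours))
ensemble = dune_erosion(ensemble_runup, dune_toe, period)
band_66 = np.percentile(ensemble, [17, 83])         # central 66 % of ensemble members
```

In this example the deterministic run returns zero erosion, whereas the ensemble members that exceed the dune toe yield a distribution of non-zero erosion volumes, and doubling the runup excess quadruples the eroded volume per time step.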

As a quantitative comparison, Splinter et al. (2018) applied a modified
version of the LEH04 model to the same June 2011 storm dataset used in the
work presented here with a modified expression for the collision frequency
(i.e. the

The GP approach is a novel way of building model ensembles to capture uncertainty. Previous work modelling beach and dune erosion has successfully used Monte Carlo methods, which randomly vary model inputs within many thousands of model iterations, to produce ensembles and probabilistic erosion predictions (e.g., Callaghan et al., 2008; Li et al., 2013; Ranasinghe et al., 2012). As discussed earlier in Sect. 4.3, the GP approach differs from Monte Carlo in that it explicitly quantifies uncertainty directly from data, does not use deterministic equations, and can be computationally efficient.

For coastal managers, the accurate prediction of wave runup as well as dune
erosion is critical for characterizing the vulnerability of coastlines to
wave-induced flooding, erosion of dune systems, and wave impacts on adjacent
coastal infrastructure. While many formulations for wave runup have been
proposed over the years, none have proven to accurately predict runup over a
wide range of conditions and sites of interest. In this contribution, a
Gaussian process (GP) with over 8000 high-resolution lidar-derived
wave runup observations was used to develop a probabilistic
parameterization of wave runup that quantifies uncertainty in runup predictions. The mean GP prediction performed well on unseen data, with an RMSE of 0.18 m, which is a significant improvement over the commonly used

Coastal dune-impact models offer a method of predicting dune erosion
deterministically. As an example application of how the GP runup predictor
can be used in geomorphic systems, the uncertainty in the runup parameterization was propagated through a deterministic dune erosion model
to generate ensemble model predictions and provide prediction uncertainty.
The hybrid dune erosion model performed well on the test data, with a
squared correlation (

This work is an example of how a machine-learning model such as a GP can profitably be integrated into coastal morphodynamic models (Goldstein and Coco, 2015) to provide probabilistic predictions for nonlinear, multidimensional processes and to drive ensemble forecasts. Approaches combining machine-learning methods with traditional coastal science and management models present a promising area for furthering coastal morphodynamic research. Future work is focused on using more data and additional inputs, such as offshore bar morphology and wave spectra, to improve the GP runup predictor developed here, on testing it at different locations, and on integrating it into a real-time coastal erosion forecasting system.

The data and code used to develop the Gaussian process runup predictor in this paper are publicly available at

The order of the authors' names reflects the size of their contribution to the writing of this paper.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Advances in computational modelling of natural hazards and geohazards”.

Wave and tide data were kindly provided by the Manly Hydraulics Laboratory under the NSW Coastal Data Network Program managed by OEH. The lead author is funded under the Australian Postgraduate Research Training Program.

This research has been supported by the Australian Research Council (grant nos. LP04555157, LP100200348, and DP150101339), the NSW Environmental Trust Environmental Research Program (grant no. RD 2015/0128), and the DOD DARPA (grant no. R0011836623/HR001118200064).

This paper was edited by Randall LeVeque and reviewed by two anonymous referees.