Following the disruption to European airspace caused by the eruption of Eyjafjallajökull in 2010 there has been a move towards producing quantitative predictions of volcanic ash concentration using volcanic ash transport and dispersion simulators. However, there is no formal framework for determining the uncertainties of these predictions, and performing many simulations using these complex models is computationally expensive. In this paper a Bayesian linear emulation approach is applied to the Numerical Atmospheric-dispersion Modelling Environment (NAME) to better understand the influence of source and internal model parameters on the simulator output. Emulation is a statistical method for predicting the output of a computer simulator at new parameter choices without actually running the simulator. A multi-level emulation approach is applied using two configurations of NAME with different numbers of model particles. Information from many evaluations of the computationally faster configuration is combined with results from relatively few evaluations of the slower, more accurate, configuration. This approach is effective when it is not possible to run the accurate simulator many times and when there is little prior knowledge about the influence of parameters. The approach is applied to the mean ash column loading in 75 geographical regions on 14 May 2010. Through this analysis it has been found that the parameters that contribute the most to the output uncertainty are initial plume rise height, mass eruption rate, free tropospheric turbulence levels and precipitation threshold for wet deposition. This information can be used to inform future model development, observational campaigns and routine monitoring. The analysis presented here suggests the need for further observational and theoretical research into parameterisation of atmospheric turbulence.
Furthermore, the analysis can be used to inform the most important parameter perturbations for a small operational ensemble of simulations. The use of an emulator also identifies the input and internal parameters that do not contribute significantly to simulator uncertainty. Finally, the analysis highlights that the faster, less accurate, configuration of NAME can, on its own, provide useful information for the problem of predicting average column load over large areas.

Volcanic ash is a significant hazard to aircraft and to human life, reducing
visibility and causing both temporary engine failure and permanent engine
damage

In the event of an eruption, the decision to fly is informed by information
provided by one of the nine Volcanic Ash Advisory Centres (VAACs). The VAACs
issue hazard maps of predicted ash cloud extents based on forecasts from
volcanic ash transport and dispersion simulators (VATDs). After the
large-scale disruption caused by the 2010 Eyjafjallajökull eruption new
guidelines were brought in by EUROCONTROL (the European Organisation for the
Safety of Air Navigation) which require predictions of ash concentration
values as well as ash cloud extents. However, there are large uncertainties
in the VATD ash concentration forecasts. These uncertainties arise from a
number of sources: incomplete or inaccurate knowledge of the specific
volcanic eruption (source uncertainty); uncertainty in the meteorological
conditions and in other parameters and forcing functions; and the
simplification or omission of particular physical processes in any given
simulator (structural uncertainty). Currently, no systematic estimation
of the resulting uncertainty is performed. This is a major limitation of the
operational system and as such there is the danger of making incorrect
decisions due to misjudging the accuracy of the simulator predictions.

There have been many studies investigating the processes that control the
long-range dispersion of volcanic ash. The majority of these studies focus on
a small number of simulator inputs or parameters and change the parameters
one at a time (OAT) to assess their impact on the predictions of volcanic ash
transport. These studies test the difference between the simulator output
from a control or baseline case and the output from the perturbed cases. This
approach is appealing as it always measures the change in simulator output
relative to a well-known baseline. Examples of studies that use this approach
are

Finally, the analysis cannot contribute to a formal overall assessment of uncertainty: uncertainty in the application of computer models has many sources, including parameter uncertainty, measurement uncertainty, and uncertainty about missing or simplified processes. OAT testing does not provide a formal methodology for assessing parameter uncertainty in a way that can be combined with these other sources. The emulation method presented in this paper gives assessments of uncertainty that can be combined easily with other sources (although actually performing such an assessment and combination is beyond the scope of this paper).

Performing sensitivity tests that cover a wide range of parameters and parameter values for a complex simulator, such as a VATD simulator, is expensive in both time and money, because such an analysis requires many simulator evaluations and hence very large computation times. This makes uncertainty quantification impractical, as one can afford only a limited number of simulator runs. Uncertainty and sensitivity analyses, as well as calibration, require a large number of runs. In this study we introduce the use of emulation to understand the sensitivity of an operational VATD simulator to source and internal simulator parameters.

An emulator is a simple statistical approximation of a complicated and
(typically) computationally expensive function, such as a computer simulator,
that can be evaluated almost instantly over the whole parameter space. The
emulator provides a prediction of the simulator output and an associated
uncertainty for this prediction (this can
take the form of a full probability distribution or an expected value and
variance). This enables the quantification of the impact of each simulator
parameter on the prediction of the dispersion of volcanic ash. This approach
has been used successfully in tsunami modelling

The relationship between the simulator output and real-world observations does not have to be considered in order to build an emulator; the “observations” used to build the emulator are observations of simulator output, not real-world measurements.

Emulators have several main uses in analysing computer simulators. They can be used for calibration to determine which parameters lead to simulator output that reasonably matches observed data. They can also be used for forecasting the future behaviour of the system in question. Finally, as in this paper, they can be used as a research tool to better understand the simulator, the role of the parameters and the interactions between them and to help guide future research priorities.

Building emulators becomes more difficult when relatively few simulator evaluations (the “data” that are used to fit emulators) are available. In many cases, however, a faster and more approximate simulator is available. This is true for the Numerical Atmospheric-dispersion Modelling Environment (NAME). A large number of runs of this more approximate simulator can be used to build a reliable emulator (for this simulator), and then a relatively small number of evaluations of the more accurate simulator can be used to refine this into an emulator for the accurate simulator. This approach, called multi-level emulation, is powerful but much less common in the literature. In this paper, the multi-level emulation method is adopted.
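The linking idea can be illustrated with a toy example (the two "simulators" below are hypothetical stand-in functions, not NAME): a multiplier and offset relating paired fast and slow runs are estimated by least squares, and the slow simulator is then predicted through the fast one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two simulator configurations (hypothetical functions,
# not NAME itself): the slow simulator is roughly a scaled version of the fast one.
def fast_sim(x):
    return np.sin(3 * x) + x

def slow_sim(x):
    return 1.2 * fast_sim(x) + 0.1 + 0.01 * rng.standard_normal(x.shape)

# Many cheap fast runs would train a fast emulator; here we only illustrate
# estimating the multiplier rho and offset delta0 from a few paired runs.
x_paired = rng.uniform(0, 1, size=20)
F, S = fast_sim(x_paired), slow_sim(x_paired)
A = np.column_stack([F, np.ones_like(F)])
(rho, delta0), *_ = np.linalg.lstsq(A, S, rcond=None)

# Predict the slow simulator at new parameters via the fast one plus the link.
x_new = rng.uniform(0, 1, size=5)
pred = rho * fast_sim(x_new) + delta0
```

In the paper the residual of this link is itself modelled statistically rather than treated as a constant offset; the sketch above shows only the basic structure.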

The aim of this paper is to demonstrate the potential of the multi-level
emulation approach applied to a VATD simulator. We use NAME, developed at the UK Met
Office

The paper is structured as follows. Section

NAME is the VATD simulator used by the London VAAC. It is a Lagrangian
particle dispersion model originally developed in response to the 1986 Chernobyl
disaster. Particles, each representing a mass of volcanic ash, are
released from a source. These particles are advected by 3-D wind fields
provided by forecasts or analyses from a numerical weather prediction (NWP)
model. The effect of turbulence is represented by stochastic additions to the
particle trajectories based on estimated turbulence levels. NAME also
includes parameterisations of sedimentation, dry deposition and wet
deposition which are required to simulate the dispersion and removal of
volcanic ash. The ash concentrations are calculated by summing the mass of
particles in the model grid boxes and over a specified time period. It is
important to note that some processes affecting the eruption plume are not
represented in NAME or not included in the NAME configurations used in this
study. Missing processes include aggregation of ash particles, near source
plume rise and processes driven by the eruption dynamics
(e.g.

To predict the transport and dispersion of ash, information about the
volcanic eruption is required. These inputs are known as eruption source parameters (ESPs)
and include plume rise height, mass eruption rate (MER), vertical profile of
the plume emissions, particle density and particle size distribution (PSD). ESPs
are required to initialise the NAME simulations. Full details of the NAME
setup used by the London VAAC can be found in

The case study chosen here is that of 14 May 2010. This is during the later
phase of the Eyjafjallajökull eruption (14 April–23 May). Although this
later phase of the eruption did not have as much impact on the aviation
industry, it is very well observed using ground-based instruments
(e.g.

Summary of the parameters, default values and ranges used in this study.

n/a: not applicable.

Six ESPs and 12 internal simulator parameters were selected to represent the main uncertainties affecting the simulation of the dispersion of the volcanic ash in the NAME simulator. A short description of each parameter is given below along with an associated plausible range. The range represents our assessment of uncertainty on the value of each parameter. It is within these ranges that the training runs of the simulator will be performed in order to build the emulators. The uncertainty assessments were found through a small expert elicitation exercise in which information from relevant literature was combined with expert knowledge of NAME and its parameterisation schemes. Table 1 summarises the parameters and their plausible ranges. In this study we do not consider the impact of the meteorological data used to drive NAME. More detailed expert judgements on the relative plausibility of parameter choices are not required to build an emulator, although if available they could be used to adjust the training design.

This section describes in detail the parameters specific to the eruption source and how they are perturbed in the runs used to build the statistical emulator.

Plume height governs the height at which the ash particles are emitted into
the atmosphere. This can have a large impact on the horizontal and vertical
structure of the ash cloud as atmospheric wind speed and direction vary with
height. Therefore to simulate realistic dispersion following an eruption it
is necessary to know this height as accurately as possible. During the
2010 Eyjafjallajökull eruption information about the plume height was available
from the Iceland Meteorological Office's C-band radar based at Keflavík
Airport. However, there are time periods when no radar data were available.
This was due to a variety of factors including the plume being obscured by
meteorological cloud, missing radar scans and the fact that when the plume
height was below 2.5 km it could not be detected due to the orography in the
local area. When no observational plume height is available the last observed
value persists until a new observation is made. In this study we will be
using the data from the Keflavík radar

Currently there is no direct method of measuring how much mass is being
emitted from an erupting volcano. Therefore many VAACs use an empirical
relationship between the observed plume height and the eruption rate. There
are a number of relationships in the literature relating these two quantities
(e.g.
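As an illustration, one commonly quoted fit of this kind is the Mastin-style power law H = 2.00 V^0.241, with plume height H in km and dense-rock-equivalent volume flux V in m³ s⁻¹; the coefficients and density below are indicative values from the literature, not necessarily those used operationally.

```python
def mastin_mer(plume_height_km, dre_density=2500.0):
    """Mass eruption rate (kg/s) from plume height, inverting the empirical
    fit H = 2.00 * V**0.241 (H in km, V in m^3/s dense-rock equivalent).
    The density converting volume flux to mass flux is an indicative value."""
    volume_flux = (plume_height_km / 2.00) ** (1.0 / 0.241)
    return volume_flux * dre_density
```

Because of the strong exponent (1/0.241 ≈ 4.1), modest uncertainty in plume height translates into very large uncertainty in the eruption rate, which is one reason both parameters were varied in this study.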

In the simulations used here, only fine ash is represented with diameters
ranging from 0.1 to 100

The PSDs used in the simulations to build the emulator were formulated as
follows.

By default, the London VAAC modelling procedure assumes that ash particles
are spherical and have a density of 2300 kg m⁻³.

The long-range transport of volcanic ash can be described by two sets of processes. The first set, advection and dispersion, represents the motion of the particles. The second set, loss processes, models how the ash is removed from the atmosphere. This section describes in detail the parameterisations and associated parameters in NAME that represent the two sets of processes.

The default input source PSD used in NAME by the London VAAC.

In NAME particles are advected in three dimensions by winds usually provided by a NWP model, with turbulent dispersion simulated by a random walk technique which represents the turbulent velocity structures in the atmosphere. Particles are advected each time step with the change in position involving contributions from the resolved wind velocity, the turbulence and the unresolved mesoscale motions.
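A minimal sketch of such a particle step (resolved wind plus a diffusive random-walk increment; the unresolved mesoscale term and the details of NAME's actual scheme are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

def advect_step(pos, wind, K, dt):
    """One simplified Lagrangian time step: resolved wind advection plus a
    random-walk increment with diffusivity K (m^2/s) per component.

    pos: (n_particles, 3) positions; wind: (3,) or (n_particles, 3) velocities.
    """
    turb = np.sqrt(2.0 * np.asarray(K) * dt) * rng.standard_normal(pos.shape)
    return pos + wind * dt + turb
```

The variance of the random increment, 2K dt per component, is what connects the stochastic particle displacements to a diffusivity, which is how the free tropospheric turbulence parameters enter the simulator.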

The diffusion due to free tropospheric turbulence is specified by a
diffusivity,

Low-frequency horizontal eddies with scales that lie between the resolved
motions of the input meteorological data and the small three-dimensional
turbulent motions represented in the turbulence parameterisation scheme are
parameterised separately by the unresolved mesoscale motion scheme

This section describes the parameters associated with the processes that remove ash from the atmosphere. The loss processes represented in NAME are wet deposition and dry deposition (including sedimentation). Within NAME these losses are applied on a particle basis (i.e. the mass of each particle is reduced each time step).

Wet deposition is the process of ash depletion by precipitation in the
atmosphere. Two main processes are involved: washout, where material is
“swept out” by falling precipitation, and rainout, where ash is absorbed
directly into cloud droplets as they form by acting as cloud condensation
nuclei. Both of these processes are parameterised in NAME using a bulk
parameterisation. The removal of ash from the atmosphere by wet deposition
processes is based on the depletion equation

In NAME the wet deposition scheme is only used if the precipitation rate is
greater than a threshold value, ppt_crit. This acts as a filter for light
drizzle. The reason for applying this threshold is that historically there
has been an excessive light drizzle issue in the global version of the UK Met
Office NWP model

Dry deposition is the process by which material is removed from the
atmosphere by transport to, and subsequent uptake by, the ground in the
absence of precipitation. Dry deposition in NAME is parameterised through a
deposition velocity,

Sedimentation of ash is represented in NAME using a sedimentation velocity,
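For small spheres the sedimentation velocity is commonly based on the Stokes terminal fall speed; a sketch (neglecting the slip and shape corrections a full scheme would include):

```python
def stokes_velocity(diameter, particle_density=2300.0,
                    air_density=1.2, air_viscosity=1.8e-5, g=9.81):
    """Stokes terminal fall speed (m/s) for a small sphere, valid at low
    Reynolds number: v = (rho_p - rho_a) * g * d^2 / (18 * mu)."""
    return (particle_density - air_density) * g * diameter**2 / (18.0 * air_viscosity)
```

The quadratic dependence on diameter is why the particle size distribution and particle density interact strongly with the loss processes.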

The true PSD of ash particles emitted during an
eruption includes extremely large particles that fall to the ground very
quickly. For forecasting the effects of the eruption on aviation only the
particles that will be transported large distances need to be considered.
These particles form the distal ash cloud. The fraction of the total emitted
ash that remains in this cloud is defined as the DFAF.
DFAF is difficult to determine as it requires accurate measurements
of the PSD and understanding of any aggregation
processes occurring. It is also possible for DFAF to vary over time and in
different parts of the ash cloud. Estimates of DFAF for the 2010 Eyjafjallajökull
eruption range from 0.7 to 18.5 %

In this study attention is focused on the ash cloud on 14 May 2010. The
simulator has been set up to provide ash predictions every hour at a
resolution of 0.375

NAME is not a fast simulator (each run of the simulator for this study took between half an hour and an hour), so it is not possible to evaluate it for very many different parameter sets. The number of NAME runs that were feasible was potentially insufficient to build the statistical models of interest.

Location of geographical regions used for comparison for each hour by longitude and latitude of the region corners.

However, a fast approximation of our standard NAME configuration could be
constructed by reducing the number of particles released in the simulator
from 10 000 per hour to 1000 per hour. This reduction means that “fast”
simulations take between 10 and 20 min to complete. This is a significant
decrease in running time but still not quick enough to apply standard global
sensitivity analysis techniques such as the Morris method (e.g.

For the fast simulator runs, 1500 parameter sets were chosen using a maximin
Latin hypercube design
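A maximin Latin hypercube can be sketched as follows: generate several random Latin hypercubes and keep the one with the largest minimum pairwise distance between design points (a simple search, not necessarily the exact algorithm used for the study design):

```python
import numpy as np

rng = np.random.default_rng(2)

def latin_hypercube(n, d):
    """One random Latin hypercube on [0,1)^d: each parameter is stratified
    into n equal bins, with one point per bin."""
    grid = (np.argsort(rng.random((d, n)), axis=1) + rng.random((d, n))) / n
    return grid.T

def maximin_lhs(n, d, n_candidates=50):
    """Pick, from several random Latin hypercubes, the one that maximises the
    minimum pairwise distance between design points."""
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        design = latin_hypercube(n, d)
        diff = design[:, None, :] - design[None, :, :]
        dist = np.sqrt((diff ** 2).sum(-1))
        np.fill_diagonal(dist, np.inf)
        score = dist.min()
        if score > best_score:
            best, best_score = design, score
    return best
```

Each design point is then rescaled from [0,1) to the elicited plausible range of the corresponding parameter before running the simulator.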

Relationship between the slow simulator and fast simulator output
for

Each of the 75 regions exhibits one of three types of difference between the
two simulators. In some regions, the two simulators gave almost identical
results. In some regions, the two simulators gave very highly correlated
results, but not identical (i.e. one simulator's output is close to simply
being a multiple of the other's). In the remaining regions (typically those
with relatively little ash predicted) the output of the two simulators is
positively correlated, but not nearly so similar. Examples of the first and
third relationships can be seen in Fig.

Before proceeding, some notation should be introduced. A particular parameter
set is denoted by

The slow simulator is denoted by

In this notation, the goal is then to use the evaluations

An emulator is a simple statistical approximation of an expensive function

it evaluates quickly;

it is expressive enough to provide good approximations to the simulator and to allow meaningful prior judgements;

it predicts that

Here,

Building an emulator therefore involves using a collection of simulator runs

identify the basis functions

estimate the

fit the residual function

Computer simulator applications often involve a mixture of observed simulator
runs and expert knowledge, making a Bayesian framework a natural choice to
build emulators. However, specification of a full joint probability
distribution for the problem is difficult and often leads to computational
challenges. With enough simulator evaluations, a successful method for
fitting emulators has been to use a standard (non-Bayesian) least-square
regression (that is, with no prior) to estimate the

In this application, there are enough evaluations to build an emulator for
the

As with a full Bayes analysis, the Bayes linear analysis combines prior
judgements with observations (of the simulator) through simple equations.
Bayes linear analysis does not, however, require expert judgements to be
specified as a full joint prior probability distribution for all variables.
Rather, the experts need only to specify expectations, variances and
covariances for a few relevant quantities. Similarly, rather than a joint
posterior probability distribution, Bayes linear analysis leads to adjusted
expectations, variances and covariances for relevant quantities. Given a
vector of data
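The Bayes linear adjustment equations just described take the following standard form, sketched here as a small generic helper:

```python
import numpy as np

def bayes_linear_adjust(E_x, V_x, Cov_xD, E_D, V_D, D_obs):
    """Bayes linear adjusted expectation and variance of a quantity x given
    data D:
        E_D(x)   = E(x) + Cov(x, D) Var(D)^-1 (D - E(D))
        Var_D(x) = Var(x) - Cov(x, D) Var(D)^-1 Cov(D, x)
    All arguments are numpy arrays of compatible shapes."""
    adj_E = E_x + Cov_xD @ np.linalg.solve(V_D, D_obs - E_D)
    adj_V = V_x - Cov_xD @ np.linalg.solve(V_D, Cov_xD.T)
    return adj_E, adj_V
```

Note that only expectations, variances and covariances are needed as inputs, which is precisely the reduced prior specification that makes the Bayes linear approach attractive here.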

One-dimensional example of an emulator. The points represent the six
evaluations of

If

Then, given the results

As explained in the previous section, this study involves the further
simplification of replacing the Bayes linear determination of some of the
quantities with estimates from a least-square regression. The

Having built an emulator for the fast simulator in
Eq. (

Such a model will typically have some unknown parameters, and the Bayes
linear approach can be used again to learn about these parameters and hence
provide an adjusted expectation and variance for

Further, the coefficients

In this application, it turned out that this could be further simplified to

This model requires prior expectations, variances and covariances for the

With these choices, the Bayes linear adjustment for a new

It is important to check that an emulator is performing well before using it
to make predictions. There are several possible reasons an emulator would be
poor. The form of the mean function (that is, the component

The mean function plays a large role in these emulators. The usual diagnostics from linear models can be valuable in assessing the adequacy of the chosen mean function.

The coefficient of determination,

A simple and effective method of validation is leave-one-out validation. In
this procedure, all but one of the observed simulator runs are used to build
an emulator, and this emulator is used to predict the one run that was left
out. For

Number of outputs for which each parameter was judged active (and hence
included in the emulator for that output). Recall that

If this proportion of successful prediction is much lower than 95 %, this might signal a fundamental problem with the mean function and/or the form of the correlation function, but it can often simply signal a poor choice of correlation length. If the correlation length is too high, then the emulator variance will be too low and hence many observations of the simulator will be judged “too far” from the emulator predictions. In contrast, if the correlation length is too low, then the emulator will not be able to capture many patterns of local variation from the mean function that may be present (specifically, any such patterns that exist over distances much higher than the correlation length). It is often possible to tune the correlation length so that the proportion of successful validations is around 95 %.

We first consider the choice of basis functions

Inactive parameters can be identified by stepwise selection using criteria
such as adjusted
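A backward stepwise elimination of this kind can be sketched on synthetic data (here only the first two of five parameters are truly active, and the tolerance on the change in adjusted R² is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least-squares fit (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

# Toy data: y depends on parameters 0 and 1 only; parameters 2-4 are inactive.
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(200)

active = list(range(5))
while True:
    base = adjusted_r2(np.column_stack([np.ones(200), X[:, active]]), y)
    # Try removing each remaining parameter in turn; drop the one whose
    # removal changes adjusted R^2 least, provided the change is negligible.
    scores = [adjusted_r2(np.column_stack(
        [np.ones(200), X[:, [a for a in active if a != j]]]), y) for j in active]
    best = int(np.argmax(scores))
    if scores[best] < base - 1e-3:
        break
    active.pop(best)
```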

The result was that the chosen

Least-square regression was used to fit the above linear models. Each of
these linear models now gives an estimate for

The next step is to link the fast simulator to the slow simulator and use the
runs

Validation plot for the emulator (for the fast simulator) of the
first output. Emulator expected value for the parameter sets in

The emulator for the fast simulator is linked to that of the slow simulator
through Eq. (

With this model, the adjusted expectation and variance

Validation followed a similar method to that for the fast emulator. In this case, over the 75 regions, the proportion of successful predictions from the validation again ranged from 94.5 to 99 %.

For most emulators, the

The adjusted

Of all the parameters, the plume height drives the output most strongly,
followed by the MER and the precipitation threshold. In all
regions, the

Interactions between the parameters (that is, the terms of the form

NAME ash column loading for parameter choices with the highest and
lowest expected ash column loadings in the first geographical region at
00:00 UTC on 14 May. The contours are as in Fig.

The presence of these interactions makes interpretation of the parameters'
impact more difficult. For example, ignoring the interactions, it would
appear that increasing plume height increases column loading and increasing
the precipitation threshold increases column loading. Typically this is true,
but the presence of the negative interaction term means, firstly, that the
magnitude of these increases will not be as large as one might first expect
and, secondly, that there are some (

It should be pointed out that the above conclusions are not based directly on
(and cannot be made from) the values in Table

Average of the expected values of selected

The emulators provide insight into which areas of the parameter space will
lead to high values of simulated ash column loading and which areas will lead
to low values of ash column loading. As an extreme case, the parameters
giving the lowest and highest predictions of ash column loading can be
identified. This was done for the first hour of 14 May, giving two parameter
sets at which the simulator was evaluated. The results of these simulator
evaluations can be seen in Fig.

This is a function of the current NAME parameterisation of free tropospheric turbulence (i.e. the fact that NAME uses the same parameter everywhere).

Now, only a small region of this parameter space will lead to simulations that resemble the observations on this day. The emulators can be used to identify this region of parameter space. Since emulators can be evaluated very quickly, predictions and their associated uncertainty can be generated for very many candidate parameters, and all predictions that are very far from the observations can be rejected. This procedure, called “history matching”, focuses on the plausible regions of parameter space and allows more accurate emulators to be built within them. This is because in a reduced parameter space, the form of the emulator can be changed to better model the behaviour in that subspace, without being concerned about global behaviour. In such a region, previously inactive parameters may once again become active, and more illuminating insights can be made. Performing this analysis for NAME is beyond the scope of this paper, but will be covered in a second study.
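The rejection rule used in history matching is typically an implausibility measure; a generic sketch (the threshold of 3 is the conventional choice):

```python
import numpy as np

def implausibility(em_mean, em_var, obs, obs_var, model_disc_var=0.0):
    """History-matching implausibility: standardised distance between the
    emulator prediction and the observation, accounting for emulator,
    observation and (optionally) model-discrepancy variance. Candidate
    parameter sets with I > 3 are usually ruled out."""
    return np.abs(em_mean - obs) / np.sqrt(em_var + obs_var + model_disc_var)
```

Because the emulator is cheap to evaluate, this measure can be computed for millions of candidate parameter sets, carving out the not-implausible region of parameter space.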

In this paper it has been shown that a Bayes linear emulation approach can be used to identify source and internal model parameters that contribute most to the uncertainty in the long-range transport of volcanic ash in a complex VATD simulator. The approach presented is applicable to other complex simulators that have long computation times and many parameters contributing to the overall prediction uncertainty. This approach uses Latin hypercube sampling of the plausible parameter ranges determined through expert elicitation. All parameters are varied in each simulator run, and therefore the importance of the parameters and their interactions can be investigated simultaneously. This gives a much more realistic estimate of the uncertainty than using one-at-a-time tests and provides much more useful information to model developers and those planning observational campaigns.

Here 1700 simulator runs have been used to build 75 emulators representing the average ash column loading in regions on 14 May 2010. These simulator evaluations comprised 1500 fast simulator runs and 200 slow simulator runs. The analysis demonstrated the strength of using approximate simulators to determine the general trend of a simulator and provide plausible priors, before using a relatively small number of accurate simulator runs to refine the emulator. Bayes linear methods were used to reduce computational complexity and the need for detailed prior judgements that we may not be certain of.

For this case the most important parameters are plume height, mass eruption
rate, free troposphere turbulence levels and precipitation threshold for wet
deposition. There is also a strong negative relationship between plume height
and free troposphere turbulence and between plume height and precipitation
threshold. This means that, for example, although increasing these parameters
individually typically increases column loading, increasing both parameters
at the same time does not increase column loading as much as would be
expected looking only at the individual parameters. These conclusions should
be tested in other situations to assess how widely they hold. An assessment
of the impact of meteorological uncertainty is also required but this is
beyond the scope of this study. This information can be used to inform future
research priorities (e.g. the addition of a more complex free tropospheric
turbulence scheme which varies spatially and temporally (see

This study has shown the range of ash column loading distributions possible when sampling the parameter space determined by the ranges elicited from simulator experts. Only a small region of this parameter space will lead to simulations that resemble the observations on this day. Emulators can be used to identify this region of parameter space as they can be evaluated very quickly. The resulting predictions and their associated uncertainty can be generated for very many candidate parameters, and all predictions that are very far from the observations can be rejected. This procedure, known as “history matching”, focuses on the plausible regions and allows more accurate emulators to be built within them. This analysis is beyond the scope of this paper. It will form the basis of a future study but could further inform the parameter perturbations used in an operational ensemble. The approach presented here could be easily applied to other case studies, simulators or hazards. Furthermore, an ensemble of emulator evaluations could be used to produce probabilistic hazard forecasts.

The SEVIRI satellite data and NAME simulation output are
available in the University of Reading Research Data Archive at

Recall that for a Bayes linear calculation, one needs prior specifications of
expectations, variances and correlations of all unknown quantities. For the
fast emulator, the

The parameter

Using the linking model in Eq. (

Thus, the link between

Instead, plausible values for

Suppose an emulator

First, we have

The adjusted variance for

The adjustment for the residual

Then, for any

For new

From experience, polynomial terms are often suitable choices. For each of the
75 outputs, linear models were built with (i) first-order (linear) terms
only; (ii) second-order (quadratic) and first-order terms, with interactions;
(iii) third-order (cubic) and lower-order terms, with first-order
interactions. Explicitly, these are the models

The adjusted

The models with only first-order terms were inadequate in many cases,
leading to low

The second-order models were very good (

The third-order models provide no noticeable improvements over second-order models.

The second stage of emulation is the removal of inactive parameters. In the
linear model for any given output quantity, most of the parameters have
little impact. Emulators can be improved by focusing on a few important
parameters and leaving the rest out of the mean trend entirely. This involves
adding a small “nugget” of variance into the emulator, uncorrelated with
everything else. This nugget represents the fact that now the emulator does
not exactly predict the simulator output even at parameters already sampled,
because some parameters have been ignored. For example, if only parameters

A policy of stepwise elimination was followed for each output: at each step,
each parameter was removed in turn, and the change in

In a standard emulation this would conclude the removal of inactive
parameters, but since in this case the fast emulator is to be linked to the
slow emulator, it is important to check that there are no parameters being
removed that are much more important for the slow emulator. For this reason,
the same stepwise selection was performed using the 200 runs of the slow
simulator (ignoring the link with the fast emulator). This procedure selected
the same parameters in most cases, occasionally with one difference. It is
likely this is caused by small quasi-random differences in the

Finally, since parameters

Since 1700 is a large number of runs, it is reasonable to make the
simplification that the least-square estimates for

Predictions of the remaining 200 runs using the emulator built from the
first 1500 were accurate for all the outputs: an example can be seen in
Fig.

This section contains the list of steps performed in the multi-level
emulation approach used in this application and the appropriate reference
within the paper for details.

Generate a large number of fast simulator evaluations, with parameters
chosen by an appropriate space-filling design such as Latin hypercube
sampling (Sect.

Generate a smaller number of slow simulator evaluations. The fast simulator
should be evaluated at these parameters as well (Sect.

Choose basis functions

Determine and remove inactive parameters (Sect.

Estimate the “nugget” created by the removal of inactive parameters
(Appendix

Use least-square regression to estimate

Choose, through judgements and exploration, a suitable correlation function
to use for

Using the fast simulator runs, the estimates from the least-square
regression, and the chosen correlation function, construct a Bayes linear
adjustment for the fast emulator (Sect.

Use diagnostic techniques to assess the validity of the fast emulator and
tune parameters in the correlation function (Sect.

Link the fast and slow emulators (Sect.

Using this link, the fast emulator, and the slow simulator evaluations,
perform a Bayes linear adjustment for the slow emulator
(Sect.

Use diagnostic techniques to assess the validity of the slow emulator and
tune parameters in the correlation function (Sects.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Risk and uncertainty estimation in natural hazards”. It does not belong to a conference.

We thank Andy Hart from Food and Environment Research Agency for useful discussions and helping us to conduct the expert elicitation. Natalie Harvey and Nathan Huntley gratefully acknowledge funding from NERC grant NE/J01721/1 Probability, Uncertainty and Risk in the Environment. We also acknowledge the helpful comments of the editor and reviewers in improving the clarity of the paper. Edited by: Thorsten Wagener Reviewed by: Francesca Pianosi and one anonymous referee