Combining radiative transfer calculations and a neural network for the remote sensing of volcanic ash using MSG/SEVIRI

. After the eruption of volcanoes all over the world the monitoring of the dispersion of ash in the atmosphere is an important task for satellite remote sensing since ash represents a threat to air trafﬁc. In this work we present a novel method that uses thermal observations of the SEVIRI imager aboard the geostationary Meteosat Second Generation satellite to detect ash clouds and determine their mass column concentration and top height during day and night. This approach requires the compilation of an extensive data set of synthetic SEVIRI observations to train an artiﬁcial neural network. This is done by 5 means of the RTSIM tool that combines atmospheric, surface and ash properties and runs automatically a large number of radiative transfer calculations for the entire SEVIRI disk. The resulting


Introduction
Volcanic ash is a threat for air traffic, as it can damage aircraft engines and lead to flame-outs, see e.g. Miller and Casadevall (2000). The eruption of the volcano Eyjafjallajökull in 2010 had a huge impact on the European air traffic causing a massive cancellation of flights (see e.g. Budd et al., 2011;Langmann et al., 2012). As a consequence, the interest in tracking volcanic 20 ash clouds in case of a similar scenario increased. Satellite images are particularly useful in this context because of their large field of regard and often high temporal or spatial resolution. In particular, observations of the SEVIRI radiometer aboard Meteosat Second Generation (MSG) are very well suited due to its 12 spectral channels (four solar, seven thermal and a mixed channel) that scan the Earth from above the Equator every 15 minutes with a spatial sampling distance of 3 km at nadir.
Various methods to detect volcanic ash plumes in satellite images have been developed in the last years (Prata, 1989b;Francis 25 et al., 2012;Mackie and Watson, 2014;Piscini et al., 2014;Guéhenneux et al., 2015, among others). Many of them exploit thermal observations since they provide both daytime and nighttime coverage and because they show a particular signature of many volcanic ash types, the "reverse" absorption feature (Wen and Rose, 1994;Pavolonis et al., 2006). It states that the brightness temperature difference BT D(λ 1 − λ 2 ) between the SEVIRI channel centred at λ 1 = 10.8 µm and the SEVIRI channel centred at λ 2 = 12.0 µm has the opposite sign as the same BTD for ice clouds, thus enabling the identification of 30 volcanic ash contaminated pixels.
Our approach focuses on the MSG/SEVIRI sensor and aims at gaining information about volcanic ash clouds through the application of an artificial neural network (NN). Before a NN using the supervised learning approach can be used to solve this remote sensing problem, training data has to be provided for its learning phase during which the NN learns to approximate the desired volcanic ash properties from the given input data / satellite observations. For volcanic eruptions, there 35 are no documented situations in which the actual three dimensional (in space) or four dimensional (in space and time) mass distribution of the volcanic aerosol is known. In-situ measurements are rare, can probe only a limited part of the ash clouds at selected locations and at particular times, and are difficult to compare with passive satellite observations. Training data have been composed by combining satellite images with models/simulations/retrievals, e.g. Gray and Bennartz (2015, the polar orbiting MODIS reflectometer and HYSPLIT trajectories), Picchiani et al. (2011, MODIS and look-up tables), Piscini et al. 40 (2014, MODIS and other retrievals), Zhu et al. (2020, SEVIRI and CALIOP lidar observations). As an alternative, simulated satellite data can be used. These consist of simulated brightness temperatures (BTs) a satellite instrument viewing the Earth's atmosphere would measure in given situations with and without generic volcanic ash layers. Using these synthetic obsevations, a suitably designed NN can be applied to generalize the relationship between input (simulated satellite observations) and output (volcanic ash properties). After training, the NN should be able to derive the latter for a given set of measured BTs. This 45 principle has been utilised for a retrieval of the mass column density and the cloud top height of volcanic ash. The resulting algorithm, called VADUGS (Volcanic Ash Detection Using Geostationary Satellites), is presented in this manuscript. It has been developed in the aftermath of the impressive eruption of the Icelandic Eyjafjallajökull volcano in 2010 according to the The goal of the VADUGS retrieval algorithm is the derivation of volcanic ash cover (VAC), volcanic ash top height (VATH) and volcanic ash mass column concentration (VAMC).
The main advantages of using simulated satellite observations for the training of a volcanic ash retrieval is that 1) all atmospheric properties, including of course volcanic and liquid/ice cloud properties are exactly known, 2) the data set can 85 cover the entire SEVIRI disk and all possible combinations of meteorological clouds, volcanic ash, time of the year / of the day and viewing geometries, 3) in principle, various ash types might be included. The main drawback is that radiative calculations, although realistic, might still not encompass the full complexity of real observations.

MSG/SEVIRI
Meteosat Second Generation (MSG) is a series of four satellites operated by EUMETSAT that has become operational in 2004 90 and lasts until now. The satellites, MSG1 to MSG4, have been renamed after launch to Meteosat-8 to . Their main instrument is the Spinning Enhanced Visible and Infrared Imager (SEVIRI). This radiometer has 11 spectral channels from the visible to the infrared with a spatial sampling of 3 km at the subsatellite point and a broadband High Resolution Visible (HRV) channel with a spatial sampling distance of 1 km at nadir. The thermal channels are centred at 6.2 µm and 7.3 µm (strong water vapour absorption), 8.7 µm, 10.8 µm and 12.0 µm (window channels), as well as 9.7 µm (ozone absorption) and 13.4 µm 95 (carbon dioxide absorption). The operational service provides full disk Earth data every 15 min and the rapid scan service

VADUGS: an artificial neural network 100
An artificial NN (Rumelhart et al., 1986) consists of a set of neurons that exchange information with each other with the goal to derive a set of output variables given a set of known input quantities. The technical implementation of the NN in this study follows very closely Kox et al. (2014) since the NN developed there proved to be very effective in detecting ice clouds and in the determination of their properties.
The neurons in the NN are structured in three layers: (a) the input layer, (b) one hidden layer, and (c) the output layer.

105
Correspondingly, neurons in these layers are called input neurons, hidden neurons and output neurons. Input neurons transport the information used for the detection of ash and the determination of its properties into the NN, i.e. each observation and ancillary data is assigned a single neuron. Output neurons contain the information about the desired ash properties, while the hidden neurons collect, combine and process data forwarded by the input neurons and fire the results to the output neurons, where the output quantities are produced. The number of hidden neurons is selected in analogy to the ice cloud retrieval COCS 110 (Kox et al., 2014) that proved to yield accurate results for the remote sensing of ice clouds with the same spaceborne sensor MSG2/SEVIRI and with 10 input and two output variables. As in Kox et al. (2014), 600 neurons for the hidden layer have been adopted as a trade-off between accuracy and CPU time consumption. Similarly, VADUGS uses 17 input variables (see Sect. 4.2) and two output variables, VAMC and VATH. A feed-forward NN is implemented where all connections between neurons are in the forward direction (from input layer to output layer through the hidden layer), while connections within a layer or backward connections are forbidden. A numeric tunable weight is assigned to each neuron connection. Every neuron processes the output from all neurons in the preceding layer weighted with the corresponding connection weights through a non-linear activation function, the logistic function in our case. Thus, the NN can learn patterns and approximate functions in a sort of multi-dimensional non-linear fitting by means of a limited number of neurons. Training the NN determines the value of all the connections weights through the backpropagation technique. The input variables are fed into the NN, which computes 120 a vector of output variables using its current weights. The total squared difference error between the estimated output and the corresponding vector of expected output is propagated backwards through the NN to update each weight using gradient descent in order to minimise the total error. The training stops when no reduction of the sum of the quadratic deviations is observed.
With this technique an exhaustive set of training examples containing both input and output variables must be available.

Validation metrics 125
The probability of detection (POD) and the false alarm rate (FAR) are used as validation metrics to assess the accuracy of the detection performance of VADUGS, i.e. of VAC, while the mean percentage error and mean absolute percentage error are used to measure the accuracy of VATH and VAMC. These metrics are used later on in the validation section (Sec. 4.4).
The probability of detection (POD) measures how efficiently VADUGS detects volcanic ash. It is given by

130
where the number of true positives, N T P , are all points correctly classified as ash and the number of false negatives, N F N , all missed ash clouds (see Tab. 1). The denominator, N T P + N F N , is thus the total number of points with ash clouds. The false alarm rate (FAR) measures how large fraction of the ash free points are falsely classified as being ash by VADUGS. It is given 135 where the number of false positives, N F P , are all points falsely classified as ash, i.e. the false alarms and the number of true positives, N T N , all points correctly identified as ash free. The denominator, N F P + N T N , is thus the total number of points with no ash clouds. The corresponding simulation data (Sect. 3.1.7) are used as a reference when calculating the POD and FAR. Table 1 clarifies the quantities used to calculate the POD and FAR.
The mean percentage error (MPE) and mean absolute percentage error (MAPE) are used to measure the accuracy of the 140 retrieved ash quantities retrieval with respect to the truth. The MPE is given by where T is the truth (Sect. 3.1.7, simulated observations), E the estimated value by VADUGS and N the number of simulations used. The MPE gives information about the direction of the deviations, i.e. whether VADUGS tends to overestimate or and gives information about the average magnitude of the errors relative to the expected truth values. A value of 0.0 % means no deviation from the truth and a perfect correlation.
Finally, the standard deviation (RMSE, root mean squared error) of VADUGS with respect to the simulated truth is given by

Simulated satellite observations
In this section the details of the radiative transfer calculations are explained.

Radiative transfer calculations
Accurate and realistic one dimensional (1D) simulations of the satellite observations with realistic and representative atmo-155 spheric conditions are crucial for the successful training of the NN and especially for its successful application to real data.

Scene selection
The MSG/SEVIRI grid consists of more than 13 million pixels, including space. In order to homogeneously cover the full Earth disk with a moderate amount of simulations and in view of the coarser spatial resolution of the atmospheric data from Atmospheric gas absorption properties are computed through the low resolution band models developed for the LOW-TRAN 7 atmospheric transmission code (Pierluissi and Peng, 1985) that uses an exponential sum fit with a resolution of 20 cm −1 . We implemented 15 spectral grid points for each MSG2/SEVIRI thermal channel (Sect. 2.1) under consideration of the corresponding spectral response function. The code was adopted from SBDART (Ricchiazzi and Gautier, 1998).

180
In IFS, a cloud-overlap algorithm computes the relative position of clouds across model levels and considers both cloud cover and cloud water content to realistically represent clouds on the spatial scale of the model (ca. 32 km for a spatial resolution of 0.25 • in latitude and longitude). The maximum-random overlap assumption (Morcrette and Fouquart, 1986) used by ECMWF, particularly relevant for shortwave but also for longwave radiation, states that adjacent layers overlap maximally, while cloud layers separated by cloud-free layers overlap randomly. This procedure has to be translated into a 1D cloud structure (variability 185 only in the vertical direction) since the 1D radiative transfer model can only handle cloud layers that are completely cloudy and because the spatial resolution of the satellite pixels (3 km at nadir, see Sect. 2.1) is a factor of 10 smaller than the model resolution. To this end, the layer-dependent cloud fraction is used to set up a random cloud structure consisting of 0 or more cloud layers, so that for each atmosphere layer a cloud fraction value of 0 or 1 is given. Adjacent cloudy layers are grouped and it is assumed that the cloud fractions overlap as much as possible in each group. The status of each layer is determined 190 by the random position for an intersection through the corresponding group: All layers of the group are completely cloudy, if their cloud cover fraction exceeds the intersection position. Figure 1 shows an example for the determination of cloud layers.
Two adjacent layers with cloud fractions of 37.5% and 50% make up the first cloud group. If the value of the intersection chosen at random is 25%, marked with a bold vertical line, then both layers have a higher cloud cover value and therefore are cloudy, indicated by the bold layer number to the left. For the second group, with layers 5 to 8, three layers are cloudy for an 195 intersection value of 50%, while layer 10, being the only one in the third group, has a value of 18.75% below an intersection value of 75%. Cloud effective radii for each cloud layer are computed using two different parameterizations, one for water clouds and one for ice clouds. The applied water cloud parameterization is described by Bugliaro et al. (2011): For layer i, r eff,L,i is the effective radius, c L,i the liquid water content, N = 150 · 10 6 the droplet density, and ρ H2O,i = 1000 kg m −3 the water density at 4 • C.
T i is the temperature and c I,i the ice water content. Typographyical errors in the formulas of both publications were corrected.
To avoid the implementation of perfect clouds, noise is introduced by the multiplication of the effective radii with uniform random values between 0.9 and 1.1.

210
To determine the optical properties of water clouds the parameterisation by Hu and Stamnes (1993) is applied, with effective radius limit values of 2.5 µm and 60 µm. Optical properties of ice clouds are calculated using the parameterization of Yang et al. (2000); Key et al. (2002), extended by B. Mayer to cover the infrared spectral range (see Emde et al., 2016), for five ice particle habits (solid column, hollow column, aggregate, rosette-6, plate) encompassing the range of 2.85 to 108.10 µm. Ice crystal shapes are selected randomly. The method has been developed over many years; it has been applied to mimic polar stratospheric clouds (Bass, 2003), sea salt aerosols (Irshard et al., 2009), Saharan dust (Peters et al., 2007) and later by Reed et al. (2017Reed et al. ( , 2018 to volcanic ash. The Aerosol Refractive Index Archive containing these spectral refractive indices and other literature values can be found at 230 http://eodg.atm.ox.ac.uk/ARIA/index.html. A short description of this method used to generate the Eyjafjallajökull refractive index used in this paper follows, more detail are contained in the references. The fresh Eyjafjallajökull sample used was collected at the ground by Dr. Evgenia Ilysinkaya in Iceland on 17 April 2010 at 18:20 local time downwind and~6 km from the source. The sample is very fine as the eruption was phreatomagmatic at the time, and should be a good analogue to the long range transported ash at around this time (see also details of the aerosol cell given below).

235
A brief overview of the main experimental components is given in Fig. 2. The ash sample is first dried to remove any excess water and transferred for dispersal in a flow of nitrogen buffer gas. The aerosol is then injected into the 75 l aerosol cell and multi-pass optics are used to measure the optical transmission in the range 526-30,000 cm −1 with a Bruker FTS and detector with a 1.9 cm −1 resolution. A description of the aerosol cell can be found in McPheat et al. (2001), while the dispersal method that lofted the aerosol in the cell (particularly efficient for aerosol sizes smaller than 1 µm) is described in Reed et al. (2017). Aerosol size distributions are measured via different methods, an optical particle counter (radii 0.15-10 µm) and a scanning mobility particle sizer (radii 0.005-0.44 µm). For a detailed experimental description of the experiment, see Reed et al. (2017).
Gas absorption features in the infrared spectra are minimised by using a nitrogen buffer gas but some residual carbon dioxide and water lines still remain. These lines are removed by a line-by-line gas retrieval of the transmission spectra using the Reference Forward Model (RFM Dudhia, 2017), leaving the aerosol transmission signal.

245
In this experiment the forward model represents the aerosol cell transmission T (λ): where β(λ) is the volume extinction coefficient, x the path length through the test cell at the wavelength λ. The extinction cross-section σ ext can be calculated, if we assume a particle scattering model and know the particle size distribution, from 250 where r is the particle radius, m(λ) the complex refractive index and n(r)dr the number of particles per unit volume between radii r and r + dr. A damped harmonic oscillator model is used to represent the complex refractive index m(λ). Information  Deguine et al. (2020), similarly to those used here. With respect to m i , the peak in Fig. 3 is lower than in the data given by Reed et al. (2018) and Deguine et al. (2020).

265
Considering the spectral response functions of MSG2/SEVIRI2, Fig. 3 shows the expected "reverse" absorption feature (Wen and Rose, 1994;Pavolonis et al., 2006) mentioned in the introduction: Higher values of m i in the 10.8 µm channel with respect to the 12.0 µm channel indicate stronger absorption and thus smaller BTs in the first spectral interval than in the second one. Noteworthy is also the fact that absorption is higher at 8.7 µm than at 12.0 µm and that the maximum absorption takes place very close to the peak of the ozone channel centred at 9.7 µm.

270
Optical properties are computed with an early version of MOPSMAP (Gasteiger and Wiegner, 2018) based on the T-matrix code of Mishchenko and Travis (1998) and the improved geometric optics method (IGOM) code by Yang et al. (2007). Since volcanic ash is usually aspherical (e.g. Riley et al., 2003), we assume prolate spheroids with a distribution of aspect ratios from 1.2 to 5 with a median value around 2.1. Furthermore, two logarithmic normal size distributions are assumed with a standard deviation of 2. The distributions extend from 0.08 µm to 12.1 µm and their mode radii equal 0.4 µm and 2 µm, respectively.

275
The size distribution range is thus in good agreement with literature (e.g. Chuan et al., 1981;Hobbs et al., 1981;Prata, 1989a;Turnbull et al., 2012). Mass density amounts to 2600 kg m −3 (e.g. Wilson et al., 2012). Since the IGOM code doesn't support refractive indices smaller than one, the real part of the refractive index had to be limited to the supported range. The effect of this assumption on the mass extinction coefficient is investigated for spherical particles in Appendix A.

Surface properties 280
The relevant surface property needed for realistic radiative transfer simulations in the infrared spectral range is the surface emissivity. According to the time selected for the ECMWF profiles (Sect. 3.1.2) the suitable monthly data from the surface emissivity data set is selected to ensure consistency. For land pixels it is obtained from Seemann et al. (2008) for the year 2010, where monthly surface emissivity maps are available globally at ten wavelengths located at 3.6, 4.3, 5.0, 5.8, 7.6, 8.  The software package libRadtran (Emde et al., 2016;Mayer and Kylling, 2005) is used to realistically simulate satellite radiances and to compute results of satellite measurements in the form of BTs. All radiative transfer simulations make use of a C version of the 1D solver DISORT (Stamnes et al., 1988;Buras et al., 2011) with 16 streams. Input data relevant to the radiative transfer model in the thermal range have been described in the previous sections: satellite zenith angle, surface properties like temperature and surface emissivity, vertical profiles of temperature and gas concentrations, water and ice cloud properties like 305 height, water content, and particle effective radius, as well as aerosol layer properties like height, aerosol optical properties, and mass concentration. Gas absorption is considered through LOWTRAN (see Sect. 3.1.2).

RTSIM
To manage and automate the generation of data sets, the program RTSIM was developed. RTSIM is coded in Python 3 and based on a corresponding module providing classes and functions as an interface to the underlying data base and the radiative transfer 310 model. After start-up, all input data is accessed via its data management class. All environment parameters are prepared as input and suitable configuration commands are passed to the radiative transfer model. RTSIM supports independent processing modules, which can be used simultaneously, as well as communication through client applications during runtime. The radiative transfer solver is called two times (with and without ash profile) when the atmosphere (Sect. 3.1.2) does not contain clouds. When the atmosphere contains cloud layers the radiative transfer solver is called four times (with and without ash profile for the atmosphere with and without clouds). In case ice clouds are present, ice crystal habit is selected randomly but is the same for the two calculations with clouds (with and without ash). Every call to the radiative transfer solver produces 330 seven thermal SEVIRI observations for the 41 viewing geometries.

The VADUGS retrieval algorithm
This section describes the training data set used for the VADUGS implementation and the NN and its training.

The training data set
The training data set is compiled using the RTSIM tool (Sect. 3.1.7). To optimally cover different seasonal conditions we select

345
Although the entire simulation data set provides a realistic and extensive basis for the training of a NN, three additional constraints have been implemented to make sure that a clear relationship between input and output data is provided to the NN.
Thus, the results of the simulations have been filtered according to the following criteria: If one of these conditions is fulfilled, the corresponding VAMC and VATH are set to zero. The first one aims at identifying only pixels that are distinct from ice clouds, whose typical signature in split window channels is a negative BT D(10.8 − 12.0) (e.g. Inoue, 1985), while the second and third condition shall exclude all low level liquid water clouds (BT D(8.7 − 10.8) ≥ 0 K is typical for the ice phase, e.g. Baum et al., 2000).

370
The BTDs containing window channels alone (8.7, 10.8 and 12.0 µm channels) are meant to yield the physics of thin layers in the atmosphere. Of course, other BTDs, like the most used BTD(10.8-12), can be implicitely obtained by the NN through combination of the available ones.
BTDs with the CO 2 channel (centred at 13.4 µm) are supposed to transport direct information about ash layer height, since the vertical weigthing function of this channels has a broad peak in the troposphere (and for this reason is used for CO 2 slicing, 375 see e.g. Menzel et al., 1983). Finally, BTDs with the ozone channel (centred at 9.7 µm) carry information about the ozone yearly cycle and its geographic distribution. Due to large possible variations in stratospheric ozone columns during the course of the year and to the considerable variability in surface temperatures, especially at high latitudes, this channel must be handled with care in order to prevent misdetections, as was for instance the case in (Ewald et al., 2013) with respect to cirrus cloud detection. Although the NN is able to induce relationships between input variables during training, the provision of brightness 380 temperatures differences is supposed to facilitate and speed up the learning process since these quantities explicitly contain the physics of the problem. However, the usage of a NN prevents us from defining a priori thresholds to the BTDs like e.g. in Yu et al. (2002) and Francis et al. (2012). Furthermore, skin temperature (from ECMWF, Sect. 3.1.2) together with a land/sea flag are selected as input in order to describe surface emission properties. Surface emissivity is neglected here since its variability is not very strong and because it is difficult to obtain daily/hourly values of this quantity for a possible near real-time application.

385
Finally, viewing zenith angle helps considering the slant path of radiation through the atmosphere. This makes a total of 17 input variables.
Output variables are, as already mentioned, VAMC and VATH (two output variables), the topology of the VADUGS NN and the backpropagation procedure have been presented in Sect. 2.2. The final training data set described above has been randomly mixed and fed into the NN.

Applying the neural network
For the application of the NN, BTs from all thermal channels are needed according to the previous section, but also a land/sea flag, that we consider fixed in time, and a viewing zenith angle map, that can also be considered as fixed if the spacecraft is located above 0 • E over the equator and fluctuations around this point are neglected. Nevertheless, skin temperature must be provided still. This quantity has been extracted from ECMWF analysis data for the training (Sect. 3.1.2), but it might be in towards Great Britain, with some small patches reaching the continent. On 17 May 2010 16 UTC a relatively large ash cloud is located above the North Sea, where it has been probed by both the UK FAAM aircraft (Marenco et al., 2011) and the German DLR Falcon (Schumann et al., 2011). In general, dark red colours are typical for the ash plume close to the vent indicating high ash concentrations, while the red colours become fainter further away from the volcano, as expected through sedimentation and/or interactions with water clouds. From time to time single ash contaminated pixels are found: in many cases, through the 415 visual evaluation of the full 15 min temporal evolution of the ash, they can be ascribed to thin diluted ash cloud patches, in some cases they seem to be probably false alarms. Thus, the algorithm is able to detect the Eyjafjallajökull ash cloud for which it has been developed in a very plausible way.

Validation
Two approaches are selected for validation of the NN. First, the NN's performance is evaluated against simulated observations 420 to assess how well the retrieval has learnt the relationships between input and output variables contained in the training data set.
The second approach consists in the evaluation of the NN against CALIOP observations of volcanic ash to assess the quality of the retrieval in real situations.

Simulated validation data set
A second data set is simulated along the lines of the training data set (Sect. 4.1). Since most of the input data samples are

Ash detection
The detection of volcanic ash clouds is performed by applying a threshold to the mass load retrieval. Although the values of POD and FAR for a threshold of 0.1 g m −2 are very good, it has to be noticed that the validation data set is not well balanced with respect to ash loaded and ash free samples, since the latter make up a very large fraction of them. In addition, many ash free samples also show BTD(10.8-12.0) values close to 0 K. While negative BTDs are thought to stem mainly from ash clouds, some of them also contain meteorological clouds. Thus, to further simplify the task of the NN 445 we introduce an additional a-priori filter: Pixels satisfying this empirical condition are set to VAMC=0 g m −2 and VATH is undefined. This condition applies to 2,789,235 samples of the validation data set, 2,777,879 of them being already ash free and 11,356 ash loaded, with ash column concentrations ranging from 0 to 5 g m −2 . This ensures that 81.1 % of the ash free samples are already correctly classified before the 450 application of VADUGS, at the cost of 11.3 % of the ash loaded samples being missed. With the help of this additional filtering, the overall POD sinks to 0.84 but the FAR also sinks to 0.05.
Finally, considering the POD as a function of true VAMC in Fig. 9 shows that VADUGS, even if the concentration threshold of 0.1 g m −2 is applied, is able to detect 60 to 70 % of the ash samples with true VAMC≤0.1 g m −2 . POD increases with VAMC and for VAMC = 0.5 g m −2 it already reaches its maximum around 90 %. On the other hand, a POD of 100 % is never reached, even for large VAMC, both because these cases might resemble thick (ash/meteorological) clouds or because they might be affected by the presence of water/ice clouds.

Validation of ash concentration and height against simulated observations
For the validation of VAMC and VATH we consider only ash-laden samples as a function of µ = cos ϑ, the cosine of the satellite viewing zenith angle. Six subsets are defined, with all samples included in subset #0, samples with 0 < µ ≤ 0.2 Results of this validation are summarised in Tab. 2 for three subgroups: where true VAMC is a) smaller than 0.5 g m −2 (only thin ash layers) or b) smaller than 2.0 g m −2 (thin and thick ash layers) or c) smaller than 5.0 g m −2 (all simulated ash layers angles produce a MAPE of 180 % and MPE of 104 %. Thus, VADUGS works best for moderate viewing zenith angles but always struggles with the determination of the correct VAMC, usually leading to an overestimation. Considering the first two subgroups (VAMC < 0.5 g m −2 and VAMC < 2.0 g m −2 ), we see that for thin ash (VAMC < 0.5 g m −2 ) uncertainties (MAPE, MPE) are largest, indicating that the determination of VAMC for thin ash layers is most difficult. In general, MAPE and MPE for VAMC < 2.0 g m −2 and VAMC < 5.0 g m −2 are comparable, with slightly better results for the latter. However, RMSE are 475 (much) lower (between 0.39 and 0.58 g m −2 ) and the correlation coefficients are higher (up to 0.39) for VAMC < 2.0 g m −2 , meaning that the most reliable results are for ash columns larger than 0.5 g m −2 and smaller than 2.0 g m −2 . Again, subsets #3 and #4 are the ones with the lowest MAPE and MPE not only for the full data set but also for the other two VAMC subgroups.
For VATH, validation results are collected in Tab. 3 where the dependency of the VADUGS errors on true VATH for three 480 intervals -true VATH < 8 km, 12 km and 14 km -is described. Again, we start the discussion with the last subgroup (VATH < 14 km) that contains the full validation data set. The general features correspond to those for VAMC. In particular, the large scatter of VADUGS values is evident, although correlation coefficients are higher for VATH (up to 0.54) than for VAMC.  i.e. when going from subset #2 to #5, with lowest underestimations for low ash clouds (VATH < 8 km). In subgroup #2 MPE for VATH < 8 km amounts to -42 %, to -25 % for subset #3, -9 % for subset #4 and +17 % for subset #5. In the three subsets #3-#5 correlation is highest for the low VATH regime (Pearson coefficients between 0.28 and 0.33). Nevertheless, MAPE is 495 always close to 60 % and reaches 70 % for subset #5.
Since thin ash clouds are difficult to detect and the determination of their properties is complex, we investigate the accuracy of VATH when VAMC is larger than 0.3, 0.9 and 2.0 g m −2 and VATH is below 12 km in Tab. 4. In almost all cases, Pearson correlation coefficients increase with increasing VAMC threshold, i.e. it is easier for VADUGS to assess VATH when the ash cloud is thick. For VAMC > 2.0 g m −2 , correlation coefficients larger than 0.7 can be achieved. At the same time, MAPE lies 500 between 46 and 59 % and MPE between -44 and +23 % for the clouds with VAMC > 2.0 g m −2 . Furthermore, underestimation of VATH in subsets #1-#3 turns into an overestimation in subsets #4 and #5. However, for all viewing angle subsets and all VATH subgroups MAPE is always relatively high between 40 and 65 %.
Generally, VADUGS struggles with the determination of the correct mass load and top height when applying it to observations filtered according to Eqs. 13. However, with a more stringent filtering that identifies simulated ash contaminated (that is meant to isolate cases where the ash signal is stronger in the simulated observations) and considering the findings 510 above, VADUGS performs significantly better. Figure 10 shows scatter plots between true and retrieved VAMC for true VAMC in the range 0.0-2.0 g m −2 (see Tab. 2) for this reduced validation data set. Pearson coefficients are clearly higher than with Winker et al., 2009) version 4.10 level 2 aerosol products including data on stratospheric ash layers (Kim et al., 2018) are used.
They are obtained from CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) retrievals. The final spatial resolution  15th, 1830-1840 -59.6 to -40.4 -57.5 to -66.0 300 16th, 1551-1605 -48.8 to -39.3 -24.4 to -27.7 162 16th, 1729-1743 -60.9 to -44.6 -42.7 to -50.7 187 17th, 0300-0313 -40.6 to -62.0 -27.7 to -37.8 251 17th, 1455-1510 -44.4 to -37.3 -12.1 to -14.4 82 18th, 0204-0218 -35.4 to -64.4 -12.2 to -25.9 199 selected subset 1081 (VAMC > 0.05 g m −2 for CALIOP/VADUGS) after processing is 5 km horizontally and 60 m vertically. Samples are discarded if the extinction quality control flag (qc) is neither one (indicating that the lidar ratio is directly retrieved from the data) nor zero (initial lidar ratio leads to stable extinction retrievals). For the remaining samples the optical depth at 532 nm due to volcanic ash is derived from the extinction profile 525 using only the samples classified as ash. For samples with qc=0, a correction is made to take care of the difference between the default lidar ratio of ash (44 sr) used in version 4.10 and that obtained from the direct (qc=1) retrievals (58 sr). The mass loading is estimated from the optical depth using a mass extinction coefficient of 0.69 m 2 g m −1 Winker et al., 2012), whereas the highest ash layer determines the ash cloud height.
For the comparison, the Puyehue-Cordón Caulle eruption in 2011 is considered: six daytime and nighttime orbits of CALIPSO 530 over ash clouds in the southern Atlantic between 15 and 18 June 2011 are collected (Tab. 5). VAMC range between 0 g m −2 and 1.7 g m −2 and VATH between 10 and 15 km. Fig. 12 shows an example with the CALIPSO data (blue) plotted alongside the corresponding VADUGS retrievals (red); VAMC from CALIPSO has an uncertainty of a factor~2 (light blue). Note that this reference data set is rather limited in the number of samples as well as in the variability of the observations, which are only of a single eruption, only above sea, can be assumed to have rather similar atmospheric conditions, and with ash layers only 535 above and around the tropopause. Nevertheless, this eruption represents a good testbed for VADUGS that has been developed for Eyjafjallajökull.
The CALIOP retrievals are compared to VADUGS results for the temporally closest SEVIRI2 image. A parallax correction is applied to the latter such that the top of the ash layer is observed.
The example in Fig. 12 shows that the VAMC retrieval is generally in agreement considering the uncertainty of the CALIOP 540 data; a tendency of VADUGS towards overestimations is visible, as discussed in the previous section. The cloud top derived by VADUGS is lower than the reference values when the ash layer is thin. A reasonable VATH retrieval is performed at one of the thicker parts of the observed ash cloud (at a latitude around -48 • ). This bahaviour basically confirms the analysis in the previous section based on simulated satellite observations. threshold on 0.1 g m −2 used before has been lowered in order to increase the number of ash conatminated pixels). The scatter plot Fig. 13  The histogram in Fig. 14 shows the VATH distribution retrieved from CALIOP and VADUGS. The CALIOP measurements peak around 12 to 13 km. The distribution of VADUGS-retrieved VATH peaks at 0 km and has a flank reaching up to 19 km, with a notable minor peak at about 9 to 12 km. 86 % of the retrieved VATH are < 8 km, and only 14 % are larger. This underlines the results of the example (Fig. 12) that VADUGS is able to retrieve the correct VATH for thick ash clouds, but 555 generally underestimates it. However, one has also to consider here the fact that VATH determination above or close to the tropopause is particularly challenging due to the fact that the atmospheric temperature profile can be constant or increase with height, while below the tropopause, where VADUGS has been mainly trained for, temperature decreases with height.

Implementation at DWD
VADUGS has been implemented at DWD and runs operationally 24/7. Like in the example in Fig. 6 the ice cloud algorithm 560 COCS (Kox et al., 2014) has been first implemented to identify ice cloud pixels that may shade the ash from the satellite's view. This algorithm is currently being replaced by its successor (Strandgren et al., 2017). VADUGS and COCS are driven by a shell script, handling the input and output data and the call of the algorithms. The retrieval results are rasterised and saved as cf conform netcdf. The data transfer rate to the cockpits of aeroplanes is limited. Thus, the raster data is polygonised in order to keep the essential information but to reduce the size of the data to be transferred to the cockpit. The VADUGS 565 results can be also visualised at the meteorological workplace of DWD and are therefore available for the forecasters at the central prediction centre and the regional aviation weather advice centres of DWD. ecFlow is used for the operational 24/7 call and monitoring of the job. Ecflow is a workflow package which has been developed to run a large number of programs in a controlled environment, providing restart and monitoring capabilities (via web page or email). It is used at DWD to run all operational suites on the high performance computer. ecFlow is developed and maintained by ECMWF.

570
VADUGS has been trained with surface temperature from the IFS model operated by ECMWF (Sect.3.1.2). However, for the 24/7 operation at DWD surface temperatures from the DWD weather model ICON (ICOsahedral Non-hydrostatic Zängl et al., 2015) are used. Every model has its own spatial bias characteristics, depending beside other factors on the surface type.
The operational verification at DWD indicates that differences in the surface temperature could be in the order of 1-2 K. Thus, a sensitivity study was performed in order to estimate the effect of uncertainties in surface temperature on the ash concentration 575 and top height derived from VADUGS when surface temperatures increased by +1 K or +2 K. Observations on 10 May 2010 during the Eyjafjallajökull eruption in the triangle enclosed by the Faroe Islands in the South, the Shetland Islands in the West and the northernmost tip of Great Britain were reprocessed for this purpose (Fig. 15, top). This day is characterized by a thick ash plume over the North Atlantic ocean moving from the North-West to the South-East with small concentrations over the land surfaces. The differences in VAMC between the normal run and the run with surface temperatures increased by +1 K and +2 K 580 are about 0.015 g m −2 (<1%) and 0.03 g m −2 (<3%) for a +1 K and +2 K increase in surface temperature (Fig. 15,

Conclusions
In the aftermath of the Eyjafjallajökull eruption in 2010 the VADUGS algorithm for the remote sensing of volcanic ash from thermal MSG/SEVIRI observations (Kox et al., 2013) has been developed at the Deutsches Zentrum für Luft-und Raumfahrt (DLR, German Aerospace Centre) using a machine learning approach leaning on a corresponding algorithm for ice cloud remote sensing (Kox et al., 2014). This paper illustrates the VADUGS development and assesses the performance of the 595 current version that is run operationally at the Deutscher Wetterdienst (DWD, German Weather Service) since 2015.
Unlike other spaceborne ash retrievals, VADUGS is based on a NN trained with simulated thermal satellite observations. In this approach the true ash properties (ash mass column concentration and ash cloud top height) to be retrieved by the algorithm are known exactly and the geographic and temporal distribution of the ash observations can be selected arbitrarily to allow for an application of the method to the entire MSG disk at each time of the day and of the night, and during each month of 600 the year. For the creation of the training data set, the tool RTSIM has been implemented to automatically combine different surface and atmospheric quantities based on historical numerical weather model results and other data sets as inputs for the radiative transfer simulation code libRadtran. The latter then produces realistic thermal observations for various meteorological conditions and with or without liquid and ice water clouds. RTSIM can be easily used and adapted to produce in future new additional training data sets.

605
Ash detection is performed by setting a threshold to the retrieved mass column concentration: Pixels with BTD(10.8-12.0) ≤ −0.6 K (pre-filtering) and retrieved VAMC > 0.1 g m −2 are assumed to be ash contaminated. This results in a POD of 0.84 and a FAR of 0.05. Furthermore, VADUGS can detect 60 to 70 % of all ash loaded samples with true VAMC smaller than 0.1 g m −2 .
However, VAMC from VADUGS scatters considerably and errors (MAPE) between approximately 80 and 300 % are observed.
Furthermore, the best results are obtained for moderate viewing zenith angles, corresponding to volcanic ash clouds in mid-latitudes. For VATH, a large scatter (3-5 km) is produced as well, although MAPE is always smaller than 70% and MPE is even smaller (-10 to -20 % for viewing zenith angles smaller than approximately 50 • ). Usually, low ash clouds with true VATH < 8 km are retrieved with higher correlations and smaller underestimations of -9 % for viewing zenith angles approximately between 40 • and 50 • . However, when VAMC is large (>2.0 g m −2 ), correlation usually increases for all clouds, thus enabling a better distinction between low and high ash clouds. This emerges also from the comparison with the spaceborne CALIOP lidar and RMSE (0.41 g m −2 ) show that VADUGS is able to distinguish between thinner and thicker ash pixels although cloud top height is usually strongly underestimated.
In general, the application of an ice cloud retrieval like in e.g. Kox et al. (2014) or Strandgren et al. (2017) is recommended in order to identify pixels where volcanic ash cannot be observed due to the presence of high clouds.

620
The principles of VADUGS can been applied for the creation of retrievals for other sensors with corresponding thermal channels, e.g. de Laat et al. (2020) adapted VADUGS to Himawari. Similar channels can also be found on the imager aboard GOES-R (Schmit et al., 2005), Fengyun-4A (Yang et al., 2016) or the upcoming Meteosat Third Generation spacecraft (e.g. Durand et al., 2015).
VADUGS is developed specifically for the Eyjafjallajökull 2010 eruption. This motivates the use of corresponding volcanic 625 ash properties; specifically, only the refractive index of Eyjafjallajökull ash is applied. However, satellite retrievals are sensitive to the refractive index (Wen and Rose, 1994;Western et al., 2015;Ishimoto et al., 2021), which can vary significantly between different volcanos (Reed et al., 2018;Deguine et al., 2020). Just recently, methods to derive volcanic ash refractive indices of different ash types have been presented (Prata et al., 2019;Piontek et al., 2021c). Furthermore, ash microphysics (shape and size) also affects scattered and emitted radiation and can vary from one volcanic eruption to the other and also as a function of 630 the distance from the source. Using the data set by Piontek et al. (2021c) constitutes a major improvement of the successor to VADUGS (Piontek et al., 2021b,a).
The operational application of VADUGS at DWD can thus provide airlines and other users with a useful spaceborne volcanic ash product that can be used to monitor volcanic ash evolution.
Appendix A: Refractive index

635
As explained in Sect. 3.1.3, the real part of the refractive index of Eyjafjallajökull ash had to be cut at 1 in the spectral range between approximately 8 and 10 µm for the calculation of the corresponding extinction coefficients. This is due to the fact that the IGOM code does not support m r < 1. In order to estimate the impact of this assumption on the computed extinction coefficients two calculations, one with cut and one without, have been performed for spherical particles. The result in Fig. A1 shows that only in the spectral range between 8 and 9.2 µm differences exist. Thus, only the 8.7 µm channel is affected and in 640 this sensor band the inaccuracy of the mass extinction coefficient amounts to 1 % at most.