Ground motion prediction maps using seismic microzonation 1 data and machine learning

. Past seismic events worldwide demonstrated that damage and death toll depend on both the strong ground motion 9 (i.e., source effects) and the local site effects. The variability of earthquake ground motion distribution is caused by local 10 stratigraphic and/or topographic setting and buried morphologies, that can give rise to amplification and resonances with 11 respect to the ground motion expected at the reference site. Therefore, local site conditions can affect an area with damage 12 related to the full collapse or loss in functionality of facilities, roads, pipelines, and other lifelines. To this concern, the near 13 real time prediction of damage pattern over large areas is a crucial issue to support the rescue and operational interventions. A 14 machine learning approach was adopted to produce ground motion prediction maps considering both stratigraphic and 15 morphological conditions. A set of about 16'000 accelometric data and about 46'000 geological and geophysical data were 16 retrieved from Italian and European databases. The intensity measures of interest were estimated based on 9 input proxies. The 17 adopted machine learning regression model (i.e., Gaussian Process Regression) allows to improve both the precision and the 18 accuracy in the estimation of the intensity measures with respect to the available near real time predictions methods (i.e.,


Introduction
Spatial distributions of ground motion induced by seismic events should be properly estimated to support risk mitigation policies over large areas.Moreover, seismic risk analysis, extended to spatially distributed anthropic systems, presents new challenges in characterising the seismic risk input, regarding the spatial correlation of the ground motion values.The ShakeMaps (Wald et al., 2021), provided by the US Geological Survey, is used globally for post-earthquake emergency management and response, engineering analyses, financial instruments, and other decision-making activities.Moreover, in Italy post-event ShakeMaps are delivered by the National Institute of Geophysics and Volcanology (Michelini et al., 2019, https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.http://shakemap.rm.ingv.it/shake4/).Such ShakeMaps are based on Ground Motion Prediction Equation (GMPE; Bindi et al., 2011, among the others) and data recorded from accelerometric stations when available.
Recently, artificial intelligence-based procedures were proposed to produce near real time ground motion in terms of acceleration time histories (Jozinović et al., 2021, Tamhidi et al., 2021) and Intensity Measure (briefly, IM; Kubo et al., 2020, among the others).In general, ground motion maps were generated using earthquake source parameters (location, magnitude, and the finite fault if available), IM (Peak Ground Acceleration, Peak Ground Velocity, and Spectral acceleration, briefly named PGA, PGV, and Sa, respectively) at the recording accelerometric stations and the mean shear wave velocity in the upper 30 m, VS30, as a proxy to account for site lithostratigraphic amplifications.Bearing in mind that shaking maps become available only when the first location and magnitude estimation are available, Jozinović et al. (2021) propose to use waveforms to predict the ground motion intensity by means of a Machine Learning (briefly, ML) approach (i.e., it utilizes only a training set of earthquake waveforms recorded at a pre-configured network of recording stations).Moreover, ML has been adopted to produce seismic amplification factors maps, as in the Japan case study proposed by Kim et al. (2020), rather than to provide ground motion maps.Finally, Zhou et al. (2020) propose a seismic topographic effect prediction model.

Overall, the above-mentioned works have pointed out what follows:
-hypocentral depth (H), epicentral distance (R), and magnitude (M) are widely used to estimate ground motion over large areas considering the source effect; moreover, H, R, and M are provided few minutes after an earthquake; -VS30, the fundamental frequency of the deposit (f0), and the depth to the engineering bedrock (H800) are the key-parameters which well gauge the effect of local sub-soil conditions on the seismic wave propagation (i.e., lithostratigraphic effect); -elevation (h), topographic gradients (hx and hy, where x and y are two orthogonal directions), and second-order topographic gradients (hxx and hyy) are proxies which allow to describe the morphological effects on the seismic amplification phenomena.
In this view, this work focuses on the improvement of ground motion prediction over large areas by using ML technique.The main task of this work is to suggest a procedure including all the main key-parameters together (i.e., H, R, M, VS30, h, hx, hy, hxx, hyy).
Damage pattern induced by seismic events is related to both geological/geomorphological conditions and vulnerability of structures and infrastructures (Brando et al., 2020;Fayjaloun et al., 2021;Mori et al., 2020bMori et al., , 2019)).The ground motion prediction (i.e., seismic site response) is generally evaluated by means of numerical simulations which are time consuming and require well detailed models capable of properly represents sub-soil and topographic conditions (see for example, Bouckovalas and Papadimitriou, 2005;Falcone et al., 2020aFalcone et al., , 2020bFalcone et al., , 2018;;Gatmiri and Arson, 2008;Gazetas, 1982;Luo et al., 2020;Moscatelli et al., 2020b;Pagliaroli et al., 2014;Pitilakis et al., 1999;Régnier et al., 2016Régnier et al., , 2018)).Hence, ML approach was adopted to: i) implement H, R, and M parameters available few minutes after a seismic event; ii) include both lithostratigraphic (VS30) and morphological effects (h, hx, hy, hxx, hyy); iii) capture the spatial correlation at short distances (hundreds of meters) due to local site effects, which is essential for reliable hazard assessments.The main results of these elaborations are ground motion prediction maps (i.e., PGA, PGV, Sa) with the resolution of 50 x 50 meters, which can reproduce the variability captured by advanced numerical modelling.
Performances, presented in terms of Root Mean Square Error (RMSE) and residuals (i.e., difference between the base-10 logarithms of observed and predicted values of PGA, PGV, and Sa), are compared to the results proposed by other studies (Jozinović et al., 2021;Michelini et al., 2019, Bindi et al., 2011).
For the seismic sequence that hit Central Italy in 2016-2017, ML results and maps are shown in § 3.2 and § 4, respectively.
The seismic events in Central Italy of 2016 and 2017, began in August 2016 with epicentres located between Latium, Marche, and Umbria Regions.The first strong shock occurred on August 24, 2016, at 3:36 a.m. and had a magnitude of 6.0, with its epicentre located along the Tronto River valley, between the small municipalities of Accumoli and Arquata del Tronto.Two powerful replicas took place on October 26, 2016, with epicentres on the Umbria-Marche border, the first shock with magnitude 5.4 and the second with magnitude 5.9.On October 30, 2016, the strongest quake was recorded, with a moment magnitude of 6.5 with its epicentre in Umbria Region.On January 18, 2017, a new sequence of four strong tremors with a magnitude greater than 5 (with a maximum of 5.5) and epicentres located in Abruzzi Region took place.This set of events caused a total of about 41'000 displaced persons, 388 injured, and 303 deaths.
In detail, the paper refers to the 2 mainshocks of August and October, according to the available data (for the mainshock of October 30, 2016, much more accelerometric data are available and it is therefore possible to make more detailed analyses).
Referring to the seismic event occurred in Central Italy on October 30, 2016, a test is proposed in § 3.2 in terms of residuals of the ground motion IMs (i.e., PGA, PGV, and Sa).Ground motion prediction maps, referred to the Central Italy event occurred on August 24, 2016, (i.e., the first destructive event of the Central Italy seismic sequence for which a great amount of studies have been published) are shown in § 4 to enlighten the capability of the proposed ML approach to gauge the ground motion variability at the urban scale.Moreover, with reference to § 4.1, the ground motion profiles, based on the proposed ML approach, are compared with results obtained by means of two completely different methodologies: 2D numerical modelling of seismic site response (Gaudiosi et al., 2021;Giallini et al., 2020;Grelle et al., 2020) and with the mean values predicted by the Italian ShakeMaps (http://shakemap.rm.ingv.it/shake4/).Finally, in the § 4.2, referring to the seismic event occurred on October 30, 2016 (i.e., the strongest of the Central Italy seismic sequence), the spatial correlation structures of the maps are analyzed in terms of sill and range in order to provide the relation between local site effects and spatial resolution of ground motion maps.
In a nutshell, the novelty of this work is the use of the ML approach based on the analysis of a huge database of geological, geophysical, and geotechnical data, built with SM studies for the entire Italian territory.The quality and quantity of this database allow a robust application of ML including the prediction of local site effects (i.e., lithostratigraphic and morphological) on the seismic ground motion.

Input and output data for machine learning training and validation
The input and output data for the training of ML approach, were classified into three categories: seismological, geophysical, and morphological data.The ML approach was based on 15'779 seismological data referred to the log10 geometric mean of the horizontal component (geoH) referred to each IM (i.e., PGA, PGV, and Sa at 0.3 s, 1.0 s, and 3.0 s).Each value recorded by the accelerometric station, named output data in Table 1 (i.e., data to be reproduced by means of ML), represents an observed datum.In addition, Table 1 lists the used 9 predictors, named input data.Figs. 1 and 2 show the input and output data, respectively, adopted for the training phase of the selected ML approach and presented in this section.Furthermore, some data distributions seem to be imbalanced (e.g., magnitude, M, and elevation, h, Fig. 1).An imbalanced training input dataset is characterised by an unequal distribution of values.For instance, focusing on M distribution, it results that: about the 50% of the available data is in the range 4-5, each of the 3-4 and 5-6 ranges are characterised by the 20% of the available data, and only the 10% of the available data are referred to M > 6.Moreover, focusing on elevation distribution, it results that: each of the 0-300 m, 300-600 m, and 600-900 m ranges are characterised by the 30% of the available data, about the 10% of the available data is in the range 900-1'200 m, and only the 5% of the available data are referred to elevation higher than 1'200 m.
Consequently, when the ML algorithm learns the imbalanced data (see for example, Kubo et al., 2020) the learning focus is mainly on the fit of ground motions with magnitude lower than 6 or on the fit of site characterised by elevation lower than 1'200 m.With reference to the selected training input dataset, the imbalance seems to be caused by a sampling bias since no high magnitude ground motions were registered by the available accelerometric stations and since few accelerometric stations have been installed at high elevation where the exposition at seismic event is very low.Hence, it seems a hard task to improve the training dataset.In addition, distributions referred to topographic gradients and VS30 are characterised by few data with respect to steep slopes and high VS30 values.Anyway, how to handle the imbalanced dataset in the regression problem was out of the scope of this work.Consequently, referring to a range of an input datum, it is expected that the lower the amount of training data the higher the uncertainty.To this end, referring to the output data, maps of standard deviation are reported in § 4.1.(Luzi et al., 2016) and ITalian ACcelerometric Archive, herein ITACA, (Luzi et al., 2019).In detail, data referred to the Central Italy earthquake occurred on the 2016 and recorded by temporary network named 3A have been archived only in the ITACA database (http://itaca.mi.ingv.it/ItacaNet_30/#/home).It is worth noting that Greek and Turkish seismic events data were collected to consider earthquake characterised by M value greater than 6.5 and up to 7.6.
Moreover, earthquake characterised by H, R, and log10PGA value greater than 30 km, 400 km, and 2 (cm/s 2 ), respectively, were selected.It should be noted that the ITACA and ESM selected data are referred to the shallow active crustal region (i.e., SACR zone characterised by shallow events, H < 35 km, in agreement to Michelini et al., 2019).The distributions of

Geophysical data
Dynamic site condition was described by means of the time-averaged shear-wave velocity (VS) to a depth of 30 meters, the VS30 parameter.It is worth noting that the VS30 parameter has been successfully adopted to gauge lithostratigraphic effect on seismic wave propagation by Falcone et al. (2021) Finally, a GRASS GIS command r.slope.aspect(https://grass.osgeo.org)was used to generate the other morphological proxies (i.e., hx, hy, hxx, and hyy).Such command generates raster maps of first and second order partial derivatives from a raster map of true elevation values (i.e., AW3D30 data in this study).Fig. 1 shows the distribution of the morphological data, referred to the selected sites.

Method
The "Matlab Regression Learner App" tool (https://it.mathworks.com/help/stats/regression-learner-app.html) was employed to produce ground motion prediction maps using a supervised ML approach.With this application, users can choose the desired models among many different methods to automatically train and validate regression models.After training multiple models, they can be compared to choose the best one.The application includes commonly used regression methods such as linear regression models, decision trees, support vector machines, ensembles of tree models, and Gaussian Process Regression (GPR).
These models have all been tested; among them the best fitting performance in terms of RMSE was provided by the GPR model with exponential kernel (Table 2).GPR is a nonparametric, Bayesian approach to regression, which provides uncertainty measurements on the predictions.A detailed description of GPR method is outside the scope of this work.Suggested references for comprehensive descriptions of the GPR method are Rasmussen and Williams (2006) and chapter 6 of MathWorks (2019).
A k-fold cross-validation (k=5) method as described in chapter 24 of Mathworks ( 2019) was used on the training dataset to make validation sets.The fitting performance (in term of RMSE) on the validation set was considered as an indicator for the generalization ability of model.
After this training and cross-validation phase, described in § 3.1 and comparison in terms of residuals with the performance of the existing methods, an external test is presented in § 3.2.

Training and validation phase
The mean RMSE of the five cross-validation datasets were adopted to select the best ML approach.With reference to the tested ML approaches,  Referring to the best prediction model (i.e., GPR with exponential kernel), Fig. 3 shows the comparison between predicted and observed values.The performance of the GPR model is also presented in terms of mean value and standard deviation of the residuals' distributions (Table 3), where the residual is defined according to the Eq. ( 1) in agreement to what presented by other researchers (Bindi et al., 2011;Jozinović et al., 2021;Michelini et al., 2019).It should be noted that mean and standard deviation of the residuals' distributions referred to ShakeMap and GMPE were retrieved from the work of Jozinović et al.
(2021) to evaluate the performance of the ML approach suggested in this study.It is worth noting that the suggested ML approach provide the best performance with respect to the approaches proposed by the other studies in terms of both accuracy (mean value) and precision (standard deviation).In detail, the standard deviation values are reduced by the 45-60%. (1)

Testing phase
Mean and std values of the residuals' distributions are presented in this section with reference to the seismic event occurred on October 30, 2016 (briefly named test event), because it is the event with the most recordings of the whole dataset (241 accelerometric stations).It is worth noting that this event was not included in the dataset adopted for the training phase of the ML approach.Bearing in mind that 943 seismic events were characterised by M ≤ 6 and 25 earthquakes by M > 6 (see Fig. 1 referred to the to the training dataset), the Central Italy earthquake occurred on October 30, 2016 (M = 6.5) provides a robust test of the adopted ML approach.The GMPE proposed by Bindi et al. (2014) (hereafter also Bindi GMPE) was selected to gauge the IMs at the 241 sites of interest aiming to compare the GMPE and this ML approach performances.It should be noted that the Bindi GMPE provide IMs depending on the VS30 as in this study.Furthermore, the OPENQUAKE software (Pagani et al., 2014) was used to determine the IMs values based on the selected GMPE.Mean and std values referred to the test event (Table 4), are higher than those referred to the training and validation phase (Table 3), as expected, because the GPR model is trained on a few events with high magnitudes as discussed in § 2.
Moreover, mean and std values obtained in this example are lower than those obtained by means of GMPE as shown in Table 4.In detail, the standard deviation values are reduced by the 20-30%.Therefore, the overall performance of the proposed ML approach is satisfactory also at the highest magnitude.

Ground motion prediction map
After having demonstrated the goodness of the proposed method to reproduce IM values, this chapter presents examples of predictive maps produced by means of the exponential GPR model with a 50 x 50 m resolution.In § 4.1 the map for the August 24, 2016 seismic event of Central Italy is produced to compare some significant IM profiles produced with independent advanced numerical simulations and data retrieved from ShakeMaps (http://shakemap.rm.ingv.it/shake4/).In § 4.2 the map of the event of October 30, 2016, already used for the test phase, is analyzed in terms of spatial correlation structure.

Ground motion prediction map for August 24, 2016 seismic event of Central Italy and comparison with numerical modelling
The adopted GPR model was used to produce ground motion prediction maps referring to the earthquake occurred on August 24, 2016 (Fig. 4), with close ups for the Amatrice and Arquata villages (Fig. 5), where profiles of modelled spectral acceleration are already available (Gaudiosi et al., 2021;Giallini et al., 2020;Grelle et al., 2020).
The ground motion prediction map of the Sa0.3 reported in Fig. 4   In addition to the maps, Fig. 6 show the profiles (2 at Amatrice and 1 at Arquata) of Sa at 0.3 s and the comparison with the values of the same shaking parameter, calculated with different methodological approaches: ground motion prediction with ML approach (this study), 2D numerical simulations (modified after Gaudiosi et al., 2021;Giallini et al., 2020;Grelle et al., 2020), and ShakeMap (http://shakemap.rm.ingv.it/shake4/).The three profiles were chosen because they represent three very different geological and geomorphological structures: narrow valley (section AA' in Fig. 6, Arquata del Tronto), plateau of soft ground (section BB' in Fig. 6, Amatrice), morphology of a mountain peak with coverage of soft ground (section CC' in Fig. 6, close to Amatrice).As a matter of fact, the adopted ML approach reproduces the so-called valley effect, as in the case of Arquata del Tronto shallow valley (see the trend for 200 ≤ x ≤ 400 m in AA'), the combined lithostratigraphic and topographic effects, as in the case of Amatrice village (see the trend for 200 ≤ x ≤ 500 m in BB'), and the topographic amplification, as in the case of the AMT accelerometric station (Luzi et al., 2019;

Spatial correlation structure of the predicted maps
In this section we want to preliminarily deal with the spatial correlation of the IM parameters.In fact, the spatial correlation of ground-motion IMs represents a key issue in the seismic risk assessment, particularly in loss analysis (Infantino et al., 2021;Schiappapietra et al., 2020Schiappapietra et al., , 2021)).The geostatistical tool widely adopted to analyse the spatial correlation of geological and geotechnical data (Paolella et al., 2021, Raspa et al., 2008, Salvatore et al., 2019, Spacagna et al., 2018) is the semi-variogram (Chilès and Delfiner, 2012).The spatial structure is evaluated by assessing the dissimilarity of the variables measured at different locations.First, referring to the variable of interest (in this case, one of the selected IMs), the experimental semivariogram ˆ()  h is calculated from data using the method of moments (Chilès and Delfiner, 2012):  h .In this study, to assess the spatial structure of the variables (predicted IMs), the experimental variogram estimated from the predicted maps is fitted with the best fit model (i.e., the exponential model): where the parameters a and C are called respectively range and sill.The range defines the correlation distance, namely, the separation distance at which the data are spatially independent, and the sill represents the variance of the random process, limit value of (h).
For the Central Italy event occurred on October 30, 2016 and for all the predicted IMs maps (i.e., PGA, Sa0.3, Sa1; see Fig. 4 and supplementary materials), the spatial structure was performed with the GSTAT package (Pebesma, 2004) of the R software (R Core Team, 2021).The IMs values were extracted from the predicted maps with a regular punctual grid of 50 x 50 m.The isotropic experimental semi-variograms were computed and fitted with the above-mentioned exponential model.As an example, Fig. 7 shows the semi-variogram of the predicted Sa0.3 map.The spatial structure of all predicted IMs maps was characterized by the nested exponential model.The nested variograms highlight the presence of a double structure at different scales, i.e., a short-scale and a long-scale variability.In this case, two ranges and two sills are obtained for two levels of variability.Table 5 shows the sill and range values for the nested exponential models of all predicted IM maps.The first range, or short-scale structure, captures the first source of variability (first sill) over hundreds of meters and can be referred to lithostratigraphic site conditions and morphological variability.The long-scale structure captures the variability over thousands of meters and could be referred to regional geological units and large-scale morphological features.Furthermore, a significant part of the variance, around 30-40% of the total, are captured at short-scale.
An exhaustive treatment of this topic is beyond the scope of this work.We are now studying the spatial variability of input parameters that contribute to generate the target IM maps, and this will be the subject of a future paper.By the way, the preliminary results enlighten the importance to generate ground motion prediction maps with a spatial resolution in the order of hundreds of meters, to improve their quality in terms of predictivity.Seismic hazard maps should also include these specifications to consider the short-scale effects, even if starting from basic hazard maps with a resolution in the order of 2-5 km. 3) Production of maps with "internal" spatial correlation structures.With reference to the Central Italy earthquake occurred on October 30, 2016, the spatial structure of all predicted intensity measures maps is characterized by a nested exponential model providing a variance, captured at the short scale, in the range 30-40% of the total variability.
In terms of applications, the ground motion maps generated by means of the proposed Machine Learning approach are useful both for urban planning (aimed at reducing seismic risk) and for emergency management (aimed at a near real time estimation of damage to buildings and infrastructures).With reference to the emergency phase, by knowing the position and depth of the hypocentre and the magnitude of the event (in Italy these data are available a few minutes after the event), it is possible to predict the losses in the area struck by the earthquake in near real time.Overall, bearing in mind that the paradigm should be shifted from managing disasters to managing risk, the proposed methodology could represent a key-tool in seismic risk mitigation strategies deployed both pre and post seismic event.
In conclusion, the research on this topic will continue and focus on specific goals, which are listed in the following.
-Improve the method with more input proxies, made available after the seismic microzonation project for the whole national territory.In detail, maps of the depth to the engineering bedrock and of the fundamental frequency of the deposit will be soon available and allow to use such parameters as input data for the Machine Learning approach.
-Improve the method with worldwide seismological dataset.
-Improve the spatial resolution of existing input proxies integrating remote sensing data.
-Improve the spatial correlation analysis.
https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.seismological data of the chosen events are shown in Figs. 1 and 2. The same figures also show the distribution of data described in the next part of this section.

Figure 1 .
Figure 1.Distribution of input data referred to the training dataset.

Figure 2 .
Figure 2. Distribution of output data referred to the training dataset in terms of geoH IMs.
is one of the cartographic results of this study; maps of PGA, PGV, and other spectral ordinates are in the supplementary materials.The theme of the map is the Sa parameter at T=0.3 s for an area around the epicenter of the event occurred on August 24, 2016, showing a maximum value equal to 2.25 g.Furthermore, referring to Fig. 4, macroseismic intensities, I_MCS, retrieved by Galli et al. (2017) are also reported next to the name of the https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.villages.The maps of Fig.5show close ups referred to Arquata del Tronto (top) and Amatrice (bottom).These maps were chosen because the 0.3 s period is the fundamental vibration period of most buildings in the area (i.e., 2-3 storey buildings).Moreover, 0.3 s is compatible with the results of modelling provided byGaudiosi et al. (2021),Giallini et al. (2020),Grelle et al. (2020) for the same areas.The maps of Figs.4 and 5show an output that is in good agreement with the geological and geomorphological characteristics of the territory and, therefore, highlights local site effects.In fact, referring to Fig.4, it can be noted that the highest Sa0.3 values well describe the valleys' trend (i.e., the largest and continuous Tronto River valley) and the two extended areas in the southern part of the map (i.e., near Petrana and Torrita villages), which are characterized by lowest values of VS30(Mori et al., 2020a).

Fig. 5
Fig. 5 shows the mean values of Sa0.3 in the left side and the standard deviation values in the right side.It should be noted that the uncertainty is provided by a combination of the input data values.The uncertainty increases referring to input data values for which the ML is not well trained (Fig. 1 and discussion in § 2).For instance, values around 0.3-0.4 are in the areas of inhabited villages, characterised by input data values widely represented in the training dataset, while values in the range 0.6-0.8 are observed in correspondence with the combination of high slope values and high VS30 values, which are underrepresented in the training dataset.
see the trend for 100 ≤ x ≤ 200 m in CC').It should be noted that the trend of the values of our study reproduces that of the numerical simulations, also getting closer to the recorded values at Osservatorio Sismico delle Strutture (OSS, a network of buildings and bridges monitored in continuum by the Italian Civil Protection Department) site and AMT station (stars in BB' and CC').Moreover, the profiles provided by the ML approach are much more articulated and complex than the constant value (horizontal dashed line) of the ShakeMap (http://shakemap.rm.ingv.it/shake4/),which obviously fails to grasp the local site effects at this scale.https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.

Figure 4 .
Figure 4. Ground motion prediction map of Sa0.3 (resolution 50 x 50 m) referred to the Central Italy earthquake occurred on August 24, 2016.I_MCS values retrieved by Galli et al. (2017) are reported next to the name of the villages.A and B squares are referred to the close ups at Arquata del Tronto and Amatrice, respectively.The surface active faulting, sketched in the figure, has been slightly modified after Galli et al. (2017).
https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.xi) and z(xi + h) are the observed values of the variable z (i.e., one of the selected IMs) at the location xi and xi + h separated by h, and n(h) is the number of pairs at lag h.Under the assumption of second-order stationary, the semi-variogram increases with h up to a constant value of ˆ() https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.

Figure 7 .
Figure 7. Semi-variogram of the predicted Sa0.3 map (Central Italy event occurred on October 30, 2016): experimental variogram based on the adopted ML approach and best-fitting model (nested exponential).

Table 1 . Input and output data for ML training and validation.
Seismological parameters are retrieved from Italian and European databases.With reference to 1'434 recording accelerometric stations, PGA, PGV, spectral accelerations (i.e., Sa referred to 0.3 s, 1 s, and 3 s), H, R, and M were retrieved from European Strong Motion Database, briefly ESM, Mori et al. (2020a)input data in ML approach), determined by means of in situ investigations, are also archived in the ESM and ITACA databases.With reference to ESM and ITACA sites not characterised by in situ surveys, the VS30 values were retrieved fromMori et al. (2020a).Fig.1shows the distribution of the VS30 data, referred to the selected sites.
Caglar et al. (2018)8)20a)ri et al. (2020a), based on SM studies, was here adopted.The SM studies have been carried out for the Italian municipalities through the funds allocated after the 2009 L'Aquila earthquake, in the framework of the Italian program for seismic risk prevention and mitigation(Moscatelli et al., 2020a).Approximately 4'000 SM studies have been already planned, representing about 99.8% of the municipalities eligible for funding (i.e., having 475 years return period PGA ≥ 0.125g).Out of the 4'000 planned SM studies, about 75% have been completed and approved(DPC, 2018;http://www.webms.it/servizi/stats.php).The SM studies permitted to collect, classify, and archive geological, geophysical, and geotechnical data with a uniform approach following national standard criteria(SM Working Group, 2008; TCSM, 2018).The data from in situ tests are organised into a database and georeferenced through an appropriate geographic information system, which is available at http://www.webms.it(DPC,2018).About 35'000 borehole logs and 11'300 VS profiles, related to about 1'700 Down-Hole and 9'600 MASW tests, were extracted from the SM dataset.Starting from the 11'300 VS profiles, VS30 values were calculated.Mori et al. (2020b)derive a large-scale VS30 map for Italy, starting from the global morphological classes afterIwahashi et al. (2018), by integrating the large amount of data from the Italian SM dataset.The VS30 map byMori et al. (2020a)was used here to integrate data where site-specific information was not available.modelisthe digital surface model which represents the canopy top and building roofs' elevations,Caglar et al. (2018)found that AW3D30 is the most accurate DEM among other similar data elevation products freely available.In detail, it was shown that the AW3D30 root mean square error is equal to 1.78 m. https://doi.org/10.5194/nhess-2021-282Preprint.Discussion started: 4 October 2021 c Author(s) 2021.CC BY 4.0 License.