Integrating macroseismic intensity distributions by a probabilistic approach: an application in Italy

The geographic distribution of earthquake effects quantified in terms of macroseismic intensities, the so-called 10 macroseismic field, provides basic information for several scopes including source characterization of pre-instrumental earthquakes and risk analysis. Macroseismic fields of past earthquakes as inferred from historical documentation may present spatial gaps, due to the incompleteness of the available information. We present a probabilistic approach aimed at integrating incomplete intensity distributions by considering the Bayesian combination of estimates provided by Intensity Prediction Equations (IPEs) and data documented at nearby localities, accounting for the relevant uncertainties and the discrete and 15 ordinal nature of intensity data. The performance of the proposed methodology is tested at 28 Italian localities with long and rich seismic histories, and for two well-known strong earthquakes (i.e., 1980 Southern Italy and 2009 Central Italy events). A possible application of the approach is also illustrated relative to a sixteenth century earthquake in Northern Apennines.

The proposed procedure is described at first, then it is tested on a set of localities and macroseismic fields included in the Italian Macroseismic Database DBMI15 (Locati et al., 2019).

65
In the frame of a coherent Bayesian formalization, the proposed procedure combines intensities estimated at a site with an IPE with observed intensities at neighboring localities for the same earthquake, taking into account the inherent uncertainty. To this purpose, considering any l-th event, the discrete probability density distribution pl (Is|Iv) is computed, to associate to each possible intensity value Is at the site s a probability value conditioned by the occurrence of effects of intensity Iv at any other (1) Here pl(Is) represents the "prior" probability density which is deduced from an IPE by using the epicentral parameters (location, epicentral intensity or magnitude, etc.) of the l-th event. In general, the most common IPEs (in their probabilistic formulation) have the form (2) 75 (Albarello and D'Amico, 2004) where the average µ is a function of the epicentral distance D and the intensity at the epicenter Ie (or, eventually, the estimated magnitude). Both average µ and standard deviation σ are determined from the statistical analysis of the available information (e.g., Pasolini et al., 2008 for Italy). To account for the uncertainty affecting epicentral intensity, the marginal probability can be computed as where g is the probability distribution which expresses the uncertainty affecting the epicentral intensity and Imax is the upper bound of the adopted macroseismic scale (e.g., 12 for the MCS scale). The probability density pl(Is) can be computed in the form It is worth noting that Eq. (1) can be iteratively applied when an increasing number of neighboring sites is considered to constrain intensity. This can be simply performed by substituting the "prior" distribution with the output of the preceding estimate. The key element of Eq. (1) is the conditional probability density q(Iv|Is), which expresses the correlation between intensity values at neighboring localities. In other terms, such a probability density represents the "smoothness" of the macroseismic field and plays the role of covariance in classical geostatistics. More specifically, q(Iv|Is) expresses the constraining power of Is on Iv. According to Albarello et al. (2007), q(Iv|Is) can be estimated empirically by considering observed intensity distributions. To this purpose, data provided by the Italian Macroseismic Database DBMI15 (Locati et al., 2019) were considered as a case study.
3 Assessing the spatial variability of macroseismic data in Italy

The Italian macroseismic database DBMI15
The long tradition of historical macroseismic investigation in Italy has produced a wealth of studies and data on the seismic 95 history of the country and neighboring areas. All such studies are collected and organized in the Italian Archive of Historical Earthquake Data -ASMI (https://emidius.mi.ingv.it/ASMI/index_en.htm, Rovida et al., 2017), which grants access to the information on more than 6200 earthquakes occurred in the Italian area from 461 B.C. to 2019. The data gathered in ASMI are of several typologies and formats, and provide a large number of intensity data from different sources, such as macroseismic bulletins, online databases, and many scientific papers and reports. The different information collected in ASMI for each 100 earthquake requires a careful comparison in order to identify the reference study among those available for the compilation of the Italian Macroseismic Database DBMI. The current release of the latter (DBMI15, https://emidius.mi.ingv.it/CPTI15-DBMI15/, here considered in its version 2.0, Locati et al., 2019) is the result of the specific selection of these data according to the content and quality of each study and to the number and spatial distribution of intensity data. DBMI15 makes available 123756 IDPs related to 3219 Italian earthquakes in the time-window 1000-2017, and referred to 20000 localities, of which 105 15332 are in Italy. DBMI15 results from 189 different studies, and the intensity data they provide are not homogenous as regards the geographic coordinates and the standards used for assessing macroseismic intensities. For this purpose, a series of operations were performed in order to obtain an homogenous set of intensity data: i) a unique gazetteer, covering the whole national territory was adopted in order to match the position of a locality with the macroseismic observation, and ii) a standard to express the macroseismic intensity (e.g., 6, 6-7, 7 MCS) was defined and non-conventional descriptive codes (e.g., "D" for 110 damage, or "F" for felt) were adopted when the available information is not sufficient for assessing an intensity value. DBMI15 allows direct access to seismic histories of Italian localities and provides data upon which the macroseismic parameters of the Although the guidelines of the European Macroseismic Scale EMS98 (Grünthal, 1998) recommend the users to preserve the 115 integer character of the intensity scale and avoid forms such as "6.5" or "6½" or "6+", in many studies intensity data are listed as intermediate values in order to express uncertainty affecting the intensity estimate. This is also the solution adopted in DBMI15. Following D'Amico and , intensity data can be classified in two categories: "certain data" (one single intensity value I', e.g., 6) and "uncertain data" (pair of values I'-I'', e.g., 6-7). In case intensity Iv is uncertain between two contiguous values Iv' and Iv'' (e.g., Iv = 6-7), Eq. (1) becomes: where an equal probability is assigned to the hypotheses Iv = I' and Iv = I''. This is the way "uncertain" intensity values contained in DBMI15 are treated in the following analysis.

Results
To estimate the probability q(Iv|Is) in Eq.
(1) one can consider the relative frequencies of the differences between intensity values at pair of sites affected by the same event. Such probability is expected to monotonically decrease with the distance between the sites, and above any distance threshold q(Iv|Is)≈ q(Iv), i.e., Is becomes not informative about Iv. The closer the sites considered are, the higher the informative power of Is on Iv is expected to be, because closer sites possibly share also the same seismostratigraphical and geomorphological conditions. On the other hand, since the selected sites correspond to urbanized areas (settlements, villages, towns, etc.), each conventionally represented with the geographical coordinates of a single point,

130
there is a lower limit to the distances between considered sites, which depends on the size and density of urbanized areas.
Moreover, distances in the estimate of q(Iv|Is) should be large enough to include at least two sites.
In order to evaluate the optimal distance threshold to characterize q(Iv|Is), the geographic distribution of the 15332 Italian localities in DBMI15 has been investigated. In particular, for each locality, the number of localities within a set of possible distance thresholds has been computed. Figure 1 shows that there are significant parts of the Italian territory (mainly in southern 135 areas) where the mutual distance of localities is larger than 10 km. On the other hand, for all the sites (except for 9 localities on small islands), there is at least another locality within 20 km. The latter was thus selected as the reference distance threshold for the characterization of q(Iv|Is).

140
To this purpose, a dataset derived from DBMI15 has been defined by selecting 546 earthquakes with at least 10 IDPs with intensity greater than or equal to 5 MCS. These earthquakes occurred in the period 1117-2017 CE and are well distributed over the whole Italian territory. From the intensity distributions of these earthquakes, we discarded: i) non-numerical macroseismic observations (e.g., "Damage" or "Felt"); ii) data related to unidentified localities or large areas (see Locati et al. (2019) for details);

145
iii) macroseismic observations related to earthquakes with epicenter inside the active volcanic areas (i.e., Mt. Etna and Campanian volcanoes), due to the faster attenuation observed in these zones with respect to the rest of Italy (e.g., Carletti and Gasperini, 2003).
We obtained 58062 IDPs with intensity ranging from 1-2 to 11 MCS (Fig.2), referred to more than 12500 Italian localities.  The conditional probability q(Iv|Is) in Eq. (1) was estimated from the relative frequencies of the differences between Iv and Is computed for each of the 546 selected earthquakes. In this analysis Iv and Is represent any pair of intensity values observed at neighboring localities (i.e., within a distance of 20 km). If the intensity values Is and Iv are both "uncertain" (e.g., 6-7), the two adjacent integer degrees (i.e., 6 and 7) are considered as equiprobable and the differences between the four intensity values are 155 computed; if both Is and Iv are "certain" values (e.g., 7), the difference is counted four times (Albarello et al., 2007). In general, one has where ΔI = Iv -Is and the dependence of both Iv and Is is due to the lack of a defined metrics for intensity degrees. As a preliminary step, we assume that which corresponds to the assumption of a linear metrics for intensity values. This hypothesis will be tested in the following.
Two different analyses were performed in order to estimate the frequency distribution q(ΔI) from the residuals (Iv -Is) considering: i) only the nearest IDP within 20 km, and ii) all the IDPs within 20 km. Table 1 reports the values of q(ΔI) expressed as the relative frequency of cases where, for each earthquake, site intensity Is differs by ΔI from the intensity Iv 165 observed at the nearest locality (q(ΔI)near) and at all localities (q(ΔI)all) within 20 km. The evident differences between the two analyses is that for q(ΔI)all the probability distribution results less peaked at ΔI = 0 and thus broader than q(ΔI)near. Figure 3 https://doi.org/10.5194/nhess-2021-113 Preprint. shows the effects of releasing the assumption in Eq. (7). To evaluate this aspect, we tested the dependence of q(ΔI) on Is. The results show that the probability of having the same intensity in the nearest locality is slightly higher for Is equal to 7 and 8 while this probability tends to decrease for Is equal to 6, 4, 9 and 5, respectively. The outcomes of this analysis seem to be 170 almost independent from the intensity Is. As a consequence, we consider the approximation in Eq. (7) reliable.
In both analyses, we then verified the impact on the results of the distance among localities. Figure 4 shows the effect of distance on q(ΔI). This effect is quite weak and only concerns the probability of observing the same intensity at the two sites (ΔI = 0). Considering only the nearest locality within 20 km (Fig. 4a), the frequency of ΔI = 0 decreases from around 52% for the distance range 0-5 km to 42% for the range 15-20 km. When all localities within 20 km are considered (Fig. 4b), this relative frequency decreases from 48% to 36% for the range 0-5 km and 15-20 km, respectively.

Testing procedure
To test the effectiveness of this procedure, the probability distributions in Eq. (1) were computed for a set of Italian localities 195 and then compared with available observations. The estimated probabilities pl(Is|Iv), derived at each j-th site for each l-th earthquake, were used to calculate the predicted number of occurrences for each intensity degree Is (Npred) over the total of the M sites and N earthquakes considered: The predicted values (Npred) can be compared with the observed number of occurrences (Nobs) for the same intensity degree

200
Is. If the observed intensity value is "certain" (e.g., 6), it can be expressed as: In case the Is value is "uncertain" (e.g., 6-7), an equal probability (0.5) was assigned to the two adjacent integer degrees. The Central Limit Theorem was used to check the consistency between predicted and observed values. The statistical test Z follows the standardized Gauss distribution: The standard deviation (σpred) associated to the predicted values was estimated as described in Albarello and D'Amico (2005): Equation (10) can be used to evaluate the statistical significance of the discrepancy between predicted (Npred) and observed (Nobs) values. According to Albarello and D'Amico (2005), when |Z| is greater than 2, the resulting discrepancy can be 210 considered statistically significant at the 5% confidence level.

Application
The above approach was applied to 28 localities with at least 40 intensity data in DBMI15 homogeneously distributed over the Italian territory. For each locality and for each earthquake, we computed the probability pl(Is|Iv) with Eq. (1) using the probability distribution q(ΔI) in Eq. (6) derived from all near localities within 20 km (Table 1), by excluding the intensity 215 observed at the site of concern. We then estimated the predicted number of occurrences for each intensity degree Is for all sites and earthquakes through Eq. (8) and compared it with the observed occurrences (Eq. 10).
The probability pl(Is|Iv) has been computed with three different analyses for the prior distribution pl(Is) and for q(ΔI): a) using a uniform distribution over the intensity range 2-11 for pl(Is) and the intensity observed at the nearest locality within 20 km (i.e., probability q(ΔI)near in Table 1); 220 b) using a uniform distribution over the intensity range 2-11 for pl(Is) and iteratively considering in Eq. (1) the intensities observed at all the localities within 20 km (i.e., probability q(ΔI)all in Table 1); c) using as pl(Is) the probability computed through an IPE and the intensities observed at all the localities within 20 km (i.e., probability q(ΔI)all in Table 1); the IPE defined for Italy by Pasolini et al. (2008) and recalibrated by Lolli et al. Using Eq. (10) the number of observed occurrences (Nobs) for a given intensity value Is was compared with the predicted number (Npred) derived for the three possible choices (a, b, c) of the prior distribution pl(Is) and q(ΔI) for the 28 selected localities ( Table 2). The differences in percentage between predicted and observed values were computed, and expressed as [(1-Npred/Nobs) * 100]. Z represents the standardized Gaussian statistics: if |Z| > 2, the resulting discrepancy can be considered 230 statistically significant (probability 0.05).

Is
Nobs   Table 2 shows that, for analysis c, the differences between observed and predicted values are less than 10% for intensity 3, 6, 7, 8. For analysis a and b, this is verified only for intensity 3, 7 and 4, 6 respectively. Results for intensity 2 and 11 cannot be considered significant due to the strong data incompleteness for the former (Fig. 5) and to the lack of data (only 1 observation available) for the latter. Furthermore, for intensity 5 there is a significant underestimation of the observed intensities in all the analyses, whereas for intensity 4 analysis b and c tend to overestimate the observed values (see Fig. 5 and Z values in Tab. 2).

240
These outcomes indicate that the number of predicted values (Npred) is consistent with the number of observed occurrences (Nobs) at the 28 test localities. Among the three analyses, analysis c is more effective than the others, although the discrepancies expressed with the Z test are statistically significant for intensity 4 and 5. However, this may depend on the selected dataset, because DBMI15 contains only earthquakes with maximum intensity greater than or equal to 5 (Locati et al., 2019).
To verify the impact of using this procedure rather than the IPE alone to predict intensity values, a comparison test was carried For the 2009 L'Aquila earthquake, Figure 6a shows that for 33 out of 315 sites (11%) the values predicted with the IPE alone are equal to the observed ones, and for 206 sites (65%) the predicted intensities differ by more than 1 intensity degree from the observations. The results obtained with our procedure (Fig. 6b) show a higher predictive performance because 218 sites (69%) present the same predicted intensity as the observed value and, for 288 sites (91%), the differences are within 1 intensity 255 degree. For the 1980 Irpinia earthquake, Figure 7 shows that the intensity values predicted with the IPE alone are equal to the observed ones for 652 out of the 1202 considered sites (54%), whereas using the proposed methodology these sites become 822 (68%). A difference of 1 intensity degree between the predicted values and the observed ones is shown at 478 sites (40%) with the IPE alone (Fig. 7a), whereas at 350 sites (29%) with our procedure (Fig. 7b).
This test demonstrates that the intensity values obtained by means of the proposed procedure better reproduce the observed 260 intensities than using the IPE alone. In fact, more than the 90% of differences between predicted and observed intensity values are within 1 intensity degree, that is the uncertainty associated to any macroseismic intensity assessment.

5 Case study
How this procedure may serve the purpose of reconstructing the macroseismic fields of past earthquakes, especially those with scattered IDPs, is shown by means of the case study of an earthquake occurred on 13 June 1542 in the Mugello area (Northern Apennines), with Mw 6 and epicentral intensity equal to 9 according to CPTI15. In DBMI15 there are 45 IDPs with maximum intensity equal to 9, as assessed by Guidoboni et al. (2007). As reported in Figure 8, the effects of this earthquake are primarily 275 known in the epicentral area with 31 localities with intensity greater than or equal to 8, whereas the macroseismic information at the localities far from the epicenter is extremely scattered.
With the aim of integrating the intensity distribution of this earthquake, 968 localities of DBMI15 within a radius of 20 km from each of the 45 IDPs were considered. Figure 9 shows the modal values of the probability distribution pl(Is|Iv) computed at each of the 968 localities assuming as prior distribution the probability derived through the IPE (Pasolini et al., 2008;Lolli et al., 2019) and using the intensities observed at all the localities within 20 km (analysis c). Such values are compared with the intensities (expressed as modal values) predicted by the IPE alone. Figure 9 shows that the intensity values estimated by the two approaches are quite different, particularly in the epicentral area. For example, focusing on the area where the IPE alone predicts intensity 7, the intensities computed by the proposed procedure are equal to 6, 7 and 8. On the contrary, moving away from the epicentral area, the two approaches provide similar results for intensity 5.
285 Figure 10 displays the geographical distribution of the predicted intensities at the 968 localities represented as the probability to be greater than or equal to intensity 6 and 8, computed through i) the IPE alone, and ii) the proposed procedure. As shown in Fig. 10a and 10b, the probability of intensity greater or equal to 6 is more than 90% for 278 localities using the IPE alone, while the same probability extends to 488 localities using the second procedure. The differences between the two approaches become more evident in case of localities where the probability of intensity greater than or equal to 8 is higher than 50%, that 290 is 96 localities using the IPE alone (Fig. 10c) and 212 using our procedure (Fig. 10d).
The results shown in Fig. 9 and Fig. 10 consent to appreciate the impact of the proposed methodology in reproducing the pattern of observed intensities with respect to the simple isotropic decay of intensity with distance predicted by IPEs.

Conclusions
The procedure proposed in this article estimates the probability distribution for a given intensity value at the considered site through a Bayesian approach. The procedure takes into account (i) region-dependent empirical relations to model macroseismic and ordinal nature of macroseismic intensity and its uncertainties. This procedure allows improving the macroseismic intensity distributions of historical earthquakes constraining the intensity values calculated at a site through an IPE with intensities observed at neighboring localities for the same earthquake.

310
The results obtained in the application part (see section 4.2) emphasize that this method well reproduces the observed values for intensity greater than or equal to 6 and equal to 3. On the other hand, the outcomes for intensity 4 and 5 show respectively an overestimation and underestimation that could be linked to both: (i) the incompleteness of the analyzed dataset due to the input threshold of DBMI15 (intensity ≥ 5) and (ii) the incompleteness of historical documentation for lower intensity degrees.
These outcomes demonstrate that the intensities predicted with the proposed procedure match the observed values better than 315 using the IPE alone.
This procedure is thought to integrate incomplete and scattered intensity distributions while avoiding the isotropic decay of intensity with distance resulting from existing IPEs. Through a more realistic modelling of the pattern of predicted intensities, this procedure takes into account the spatial distribution and variability of observed intensity data to constrain the results. Not unexpectedly, the obtained results are dependent on the spatial distribution of the data observed for the selected earthquake 320 and on the number of intensity values available in nearby localities.
The proposed procedure aims at the integration and enrichment of both the intensity distributions of individual earthquakes and the seismic history of single localities. Together with suggestions to further document the spatial distribution and severity of effects in the framework of historical seismological investigation, the outcomes provided by this procedure can be used for local seismic hazard assessment, as well as planning activities aimed at risk mitigation.