Uncertainty in PSHA related to the parametrization of historical intensity data

In recent times a great deal of research aimed to reduce of uncertainties in probabilistic seismic hazard analysis (PSHA). Most attention was paid to the role of ground motion prediction equations (GMPEs; see, e.g., Strasser et al., 2009), while no studies were devoted to a possible larger source of uncertainties: the historical catalogues of earthquakes. In areas where historical catalogues provide a record many centuries long and surface geology does not permit us to obtain complete catalogues of seismogenic faults at the moment, their use is unavoidable for estimating seismicity rates required for PSHA. Their use is also gaining popularity as an independent tool for the estimation of PSHA (D’Amico and Albarello, 2008) and for validation purposes (Stirling and Petersen, 2006; Mucciarelli et al., 2008). After proposing an alternative parametrization of historical macroseismic intensity, this paper is devoted to discussing what the real impact of starting uncertainties in intensities on the final uncertainties in PSHA is. 1 State of the art and a new proposal For each known historical earthquake, the first step of research is to assign a macroseismic intensity to any point of the area affected by what is recognized to be a single event. Then a parametrization is made in order to obtain epicentral data, such as location and epicentral intensity. In several instances, the historical seismologist is not able to unambiguously assign a single intensity value whose description matches the observations exactly. It is then common practice to assign values such as VII–VIII, which are subsequently treated as 7.5 in further calculations. This is not the correct way to proceed, since intensity scales do not provide for the use of half-degrees. This is advice also included in the second version of the EMS (European Macroseismic Scale) (Grünthal, 1998): It will often be the case that no single intensity degree can be decided upon with any confidence. In such cases, it is necessary to decide whether some approximate assessment of intensity can be made, or whether the data are so contradictory that it is better to leave the matter unresolved. ... It is recommended that the user preserves the integer character of the scale, and not uses forms such as “6.5” or “6 2” or “6+”. . . In such cases the intensity should be written as 6–7, meaning either 6 or 7; it does not imply some intermediate value. [sic] The use of forms such as 6.5 is a remnant of the punched card era, when earthquake catalogues needed to have all the information compressed in the 80 characters available for each line/event. Since the 1990s, there have been proposals for a more thorough statistical treatment of uncertainty in intensity data, either using Bayesian (García-Fernández and Egozcue, 1989; Rotondi et al., 1993) or nonparametric methods (Magri et al., 1994). The vectorial intensity was subsequently proposed for site hazard estimates by Albarello and Mucciarelli (2002). This is commonly done when using probabilistic attenuation relationships but could be a great opportunity for the revision of catalogues, allowing the historical seismologist to express his feeling that an event is “almost certainly” of an intensity VIII (e.g. 0.9 probability) while it cannot be completely ruled out that the effects Published by Copernicus Publications on behalf of the European Geosciences Union. 2762 M. Mucciarelli: Uncertainty in PSHA Table 1.Proposal of conversion rule of expert judgment into probability for the statement “the intensity is equal to or greater than a given class”. Absolutely true 1 Very likely 0.9 Uncertain 0.5 Unlikely 0.1 Absolutely false 0 reached IX (e.g. 0.1 probability). For 12 ◦ intensity scales (such as Modified Mercalli Intensity (MMI), Mercalli Cancani Sieberg (MCS), Medvedev Sponheuer Karnik (MSK), European Macroseimic Scale (EMS)), this will result in a 12-term probability vector, with a value for each intensity class expressing the probability that the intensity is equal to or larger than that class: [1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.9 0.1 0.0 0.0 0.0]. This formulation could be used both for single-site intensity data and for the epicentral parametrization of the earthquake. Of course, this could apply also to data that now are taken as integer intensity values but could raise some doubts for historical seismologists. The use of probability to express the degree of belief should be subjected to simple conversion rules such those listed in the EMS-98 scale for the definition of quantities such as “many”, “few”, or “most”. A possible conversion rule is given in Table 1. 2 The effect of intensity uncertainty on PSHA Mucciarelli and Albarello (2012) discussed a rough approximation of the effect of intensity uncertainty on PSHA (probabilistic seismic hazard analysis): if one is not sure that a quake had an intensity of, say, 5 because some elements suggest that it could be of an intensity of 6, then, from the point of view of relative error, this means that we are compiling our input catalog with data that are affected by a (6−5)/5 = 20 % starting error. The relative error gets smaller for higher intensities, but since the PSHA is calculated using the whole catalogue and since the theory of error propagation states that relative errors cannot decrease throughout the procedure from input data to final outcomes, the minimum effect on PSHA will be at least an uncertainty of 20 %. To explore the effect that uncertainty on intensity assignment has on mean recurrence times, the simplest estimate of PSHA at a given site, I have chosen sites in the area affected by the recent 2012 earthquake in Emilia, Italy. The Po Plain is an ideal study area because many cities have a complete seismic history dating back to Middle Ages. When considering the site earthquake catalogue for Ferrara and Modena, the two closest cities to the 2012 earthquake sequence, we Figure 1. Empirical cumulative distribution functions of the MRT in Ferrara (above) and Modena (below), produced by 5000 simulation runs for some of the intensity classes. note many damaging earthquake as well as an abundance of uncertain intensities (Table 2). There are two possible ways to proceed. They have in common the Monte Carlo simulation of hundreds of runs to estimate MRTs (mean return times). Once the classes of a seismic scale are defined as a domain for the Monte Carlo simulation, the inputs are generated according to one of these two ways: 1. fix a level of intensity (i.e. VII–VIII), modify a site history by associating the value p (or 1− p) with the intensity class VII (or VIII) for each of the earthquakes of that intensity, p being drawn from a uniform distribution on (0;1); 2. fix a level of intensity (i.e. VII–VIII), modify a site history by associating each of the earthquakes with the VII (or VIII) intensity class if the random number p, drawn from a uniform distribution on (0;1), is less (or larger) than 0.5 Nat. Hazards Earth Syst. Sci., 14, 2761– 2765, 2014 www.nat-hazards-earth-syst-sci.net/14/2761/2014/ M. Mucciarelli: Uncertainty in PSHA 2763 Table 2. Site seismic history for Modena and Ferrara, Italy, from the Italian Database of Macroseismic Intensities, available on the INGV website (http://emidius.mi.ingv.it/DBMI11/ ).


State of the art and a new proposal
For each known historical earthquake, the first step of research is to assign a macroseismic intensity to any point of the area affected by what is recognized to be a single event.Then a parametrization is made in order to obtain epicentral data, such as location and epicentral intensity.
In several instances, the historical seismologist is not able to unambiguously assign a single intensity value whose description matches the observations exactly.It is then common practice to assign values such as VII-VIII, which are subsequently treated as 7.5 in further calculations.This is not the correct way to proceed, since intensity scales do not provide for the use of half-degrees.This is advice also included in the second version of the EMS (European Macroseismic Scale) (Grünthal, 1998): It will often be the case that no single intensity degree can be decided upon with any confidence.In such cases, it is necessary to decide whether some approximate assessment of intensity can be made, or whether the data are so contradictory that it is better to leave the matter unresolved.... It is recommended that the user preserves the integer character of the scale, and not uses forms such as "6.5" or "6 1 2 " or "6+". . .In such cases the intensity should be written as 6-7, meaning either 6 or 7; it does not imply some intermediate value.[sic] The use of forms such as 6.5 is a remnant of the punched card era, when earthquake catalogues needed to have all the information compressed in the 80 characters available for each line/event.Since the 1990s, there have been proposals for a more thorough statistical treatment of uncertainty in intensity data, either using Bayesian (García-Fernández and Egozcue, 1989;Rotondi et al., 1993) or nonparametric methods (Magri et al., 1994).The vectorial intensity was subsequently proposed for site hazard estimates by Albarello and Mucciarelli (2002).This is commonly done when using probabilistic attenuation relationships but could be a great opportunity for the revision of catalogues, allowing the historical seismologist to express his feeling that an event is "almost certainly" of an intensity VIII (e.g.0.9 probability) while it cannot be completely ruled out that the effects Published by Copernicus Publications on behalf of the European Geosciences Union.[1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.9 0.1 0.0 0.0 0.0].
This formulation could be used both for single-site intensity data and for the epicentral parametrization of the earthquake.Of course, this could apply also to data that now are taken as integer intensity values but could raise some doubts for historical seismologists.
The use of probability to the degree of belief should be subjected to simple conversion rules such those listed in the EMS-98 scale for the definition of quantities such as "many", "few", or "most".A possible conversion rule is given in Table 1.
2 The effect of intensity uncertainty on PSHA Mucciarelli and Albarello (2012) discussed a rough approximation of the effect of intensity uncertainty on PSHA (probabilistic seismic hazard analysis): if one is not sure that a quake had an intensity of, say, 5 because some elements suggest that it could be of an intensity of 6, then, from the point of view of relative error, this means that we are compiling our input catalog with data that are affected by a (6−5)/5 = 20 % starting error.The relative error gets smaller for higher intensities, but since the PSHA is calculated using the whole catalogue and since the theory of error propagation states that relative errors cannot decrease throughout the procedure from input data to final outcomes, the minimum effect on PSHA will be at least an uncertainty of 20 %.
To explore the effect that uncertainty on intensity assignment has on mean recurrence times, the simplest estimate of PSHA at a given site, I have chosen sites in the area affected by the recent 2012 earthquake in Emilia, Italy.The Po Plain is an ideal study area because many cities have a complete seismic history dating back to Middle Ages.When considering the site earthquake catalogue for Ferrara and Modena, the two closest cities to the 2012 earthquake sequence, we

note many damaging earthquake as well as an abundance of uncertain intensities (Table 2).
There are two possible ways to proceed.They have in common the Monte Carlo simulation of hundreds of runs to estimate MRTs (mean return times).Once the classes of a seismic scale are defined as a domain for the Monte Carlo simulation, the inputs are generated according to one of these two ways: 1. fix a level of intensity (i.e.VII-VIII), modify a site history by associating the value p (or 1 − p) with the intensity class VII (or VIII) for each of the earthquakes of that intensity, p being drawn from a uniform distribution on (0;1); 2. fix a level of intensity (i.e.VII-VIII), modify a site history by associating each of the earthquakes with the VII (or VIII) intensity class if the random number p, drawn from a uniform distribution on (0;1), is less (or larger) than 0.5   For the site intensities, I modified the catalogue following the first approach, while the second was used for the sensitivity analysis of seismogenic zones.The result of 5000 runs on catalogues resulting from the simulation for Modena and Ferrara is shown in Fig. 1, with the MRT estimated separately for each intensity class as the time between the first and last shock divided by the sum of probabilities of events obtained counting the number (or the total weight) of the events in the various classes of intensity (VI;VII;VIII;IX) (see Magri et al., 1994).
It is possible to note that, as expected, the dispersion increases with the increase in intensity.On the other hand, the MRTs are smaller for lower intensities, so the relative error on estimates remains similar.This can be seen better by plotting histograms instead of empirical cumulative distribution functions (ECDFs) (see Fig. 2).
It is worth noting that the distributions are slightly asymmetrical for lower intensities (an effect of the Gutenberg-Richter-like distribution as a function of intensity) with fewer events coming from above than from below.The highest class is highly asymmetrical due to the fact that no contribution may come from above.Following the nonparametric style adopted for this exercise, the uncertainty can be estimated as the ratio between the median and interquartile range, with results in the range of 25-30 % for all intensity thresholds.
The last step of this study considers a recurrence model using epicentral intensities data, as given by the Catlogo Parametrico dei Terremoti Italiani, parametric catalogue of Italian earthquakes (CPTI) catalogue of Istituto Nazionale di Geofisica e Vulcanologia, national institute of geophysics and volcanology (INGV) (http://emidius.mi.ingv.it/CPTI11/).The area selected is shown in Fig. 3 and comprises the four seismogenic zones covering most of the Po Valley, according to the zoning available from INGV on the web page http://zonesismiche.mi.ingv.it/documenti/App2.pdf.
The parametric catalogue of this area (CPTI11) reports 184 events with an intensity greater or equal to a V-degree MCS, and 76 are listed with half-integer intensity values (see Fig. 4).
To estimate the effect of this uncertainty on seismic rates, the four zones were considered together.Hundreds of simulations were run to estimate seismic rates (again, as described in Magri et al., 1994), and at every run each half-degree was assigned either to the lower or to the higher degree using a random selection with uniform probability.
Figure 5 reports the result of 2000 runs.As expected, the variability increases with increasing intensity in a way similar to that observed for site intensities.In this case, a further analysis was performed: at each random generation, it was possible to save the distribution of interevent times of each modified catalogue; This was done in order to study the distribution of ratio average vs. variance (Fig. 6) compared to a Poisson distribution for which the ratio should be equal to 1.
The result is puzzling.Moving from the lowermost intensities to the highest, the behaviour seems to change from clustered to slightly periodic.This may be explained in several ways: 1.No single relationships exist for the frequency distribution of events (Gutenberg-Richter-like).  2. Notwithstanding the fact that CPTI11 is declared as a declustered catalogue, clustering is still present for lower intensities, maybe due to medium-range correlation of seismicity between the borders of the Po Plain, as suggested by Bragato (2014).
3. The move toward periodic behaviour (α < 1) at higher intensities could be due to the problem of undersampling because not all the modified catalogues allow for enough events of intensity IX to be present.For a discussion of the role of undersampling in assessing the time distribution of large events correctly, see Ellsworth et al. (1999) or Mucciarelli (2007).

Conclusions
The uncertainty in intensity assignment, both at a given site and in parametric historical catalogues, propagates to PSH (probabilistic seismic hazard) estimates, affecting final result with a relative error of around 25-30 %.To avoid this uncertainty, existing catalogues should be revised avoiding false half-degrees in favour of a probabilistic translation of expert judgment.The intensity could then be given for each event as a 12-values vector that can provide alternative input to standard PSHA codes or be easily incorporated into existing PSHA software designed for the exploitation of intensity data, such as SASHA (D'Amico and Albarello, 2008).

Figure 1 .
Figure 1.Empirical cumulative distribution functions of the MRT in Ferrara (above) and Modena (below), produced by 5000 simulation runs for some of the intensity classes.

Figure 2 .
Figure 2. Histograms of the MRT in Ferrara, produced by 5000 runs for some of the intensity classes.

Figure 3 .
Figure 3.The seismogenic zones whose earthquakes were used for simulations.

Figure 4 .
Figure 4.The earthquakes used for simulations, plotted by year and intensity.Note the large amount of half-degree intensities.

Figure 5 .
Figure 5. ECDF of the activity rates for four intensity classes.

Figure 6 .
Figure 6.Aperiodicity and clustering for 2000 runs of modified historical catalogues of the Po Plain.

Table 1 .
Proposal of conversion rule of expert judgment into probability for the statement "the intensity is equal to or greater than a given class".