Natural Hazards and Earth System Sciences A mixture of exponentials distribution for a simple and precise assessment of the volcanic hazard

The assessment of volcanic hazard is the first step for disaster mitigation. The distribution of repose periods between eruptions provides important information about the probability of new eruptions occurring within given time intervals. The quality of the probability estimate, i.e., of the hazard assessment, depends on the capacity of the chosen statistical model to describe the actual distribution of the repose times. In this work, we use a mixture of exponentials distribution, namely the sum of exponential distributions characterized by the different eruption occurrence rates that may be recognized inspecting the cumulative number of eruptions with time in specific VEI (Volcanic Explosivity Index) categories. The most striking property of an exponential mixture density is that the shape of the density function is flexible in a way similar to the frequently used Weibull distribution, matching long-tailed distributions and allowing clustering and time dependence of the eruption sequence, with distribution parameters that can be readily obtained from the observed occurrence rates. Thus, the mixture of exponentials turns out to be more precise and much easier to apply than the Weibull distribution. We recommended the use of a mixture of exponentials distribution when regimes with well-defined eruption rates can be identified in the cumulative series of events. As an example, we apply the mixture of exponential distributions to the repose-time sequences between explosive eruptions of the Colima and Popocat épetl volcanoes, México, and compare the results obtained with the Weibull and other distributions. Correspondence to: A. T. Mendoza-Rosas (ateresa@geofisica.unam.mx)


Introduction
Hazardous processes associated with volcanic eruptions may represent a serious threat to the lives, the property, and the economy of people dwelling near volcanoes.This threat may be quantified in terms of the volcano risk, which is the product of the hazard, the probability of occurrence of a potentially destructive eruption, and the vulnerability, the probability of damage to the exposed population and property associated with such an eruption.The knowledge of the hazard may help to reduce the vulnerability through the preparation, the series of measures specifically designed for the most probable eruption scenarios (De la Cruz-Reyna and Tilling, 2008).Assessing the hazard is thus an essential step of risk reduction that requires a broad understanding of the eruption occurrence patterns.
Most volcanoes have complex and irregular patterns of activity.This complexity is derived from the extent of the interaction between concurrent geophysical, geological and geochemical processes involved in the buildup of the mass and energy to be erupted, introducing a random behavior in the volcanic eruption time series.
Studies of volcanic eruption time series have been carried out by many authors using several statistical techniques on specific volcanoes.Some of the earliest, (Wickman, 1965(Wickman, , 1976;;Reyment, 1969;Klein, 1982) employed stochastic principles to analyze eruption patterns.Further studies included transition probabilities of Markov chains (Carta et al., 1981;Aspinall et al., 2006;Bebbington, 2007), Bayesian analysis of volcanic activity (Ho, 1990;Solow, 2001;Newhall and Hoblitt, 2002;Ho et al., 2006;Marzocchi et al., 2008), homogeneous and non-homogeneous Poisson process applied to volcanic series (De la Cruz-Reyna, 1991;Ho, 1991), a Weibull renewal model (Bebbington and Lai, A. T. Mendoza-Rosas and S. De la Cruz-Reyna: A simple and precise assessment of the volcanic hazard 1996a, b), geostatistical hazard-estimation methods (Jaquet et al., 2000;Jaquet and Carniel, 2006), a mixture of Weibull distributions (Turner et al., 2008) and non-homogeneous statistics to link geological and historical eruption time series (Mendoza-Rosas and De la Cruz-Reyna, 2008).An exhaustive list of the available literature on this subject is beyond the scope of this paper, and the above references only attempt to illustrate the diversity of methods that have been applied to the volcanic eruption sequences.
Different parameters have been used as random variables to characterize the eruptive time series, among them, the time of onset of eruptions, the interval between eruptions, the volume or mass released, and the intensity of eruptions (mass ejection rate).The probabilities of occurrence of future eruptions, i.e. the volcanic hazard, may be estimated analyzing the sequence of past eruptions at a volcano, characterizing the eruptions by a measure of their size that reflects their destructive potential, and assuming that the impact and effects of an eruption are proportional to both, the total mass or energy release (magnitude) and the rate of mass or energy release (intensity).The Volcanic Explosivity Index VEI is the semi-quantitative ranking of the eruption "size" based on those parameters (Newhall and Self, 1982).
The random variable considered in this study is the repose period, i.e., the length of the time intervals between successive eruption onsets according to specific VEI categories.The volcanic eruption sequences of polygenetic volcanoes are thus considered here as point processes developing along the time axis, and the distribution of the repose times between them is analyzed for each magnitude category.For our purpose, we shall consider here only significant explosive eruptions (i.e., larger VEI's), which usually are short duration events when compared with the time between eruptions.In this sort of point processes, volcanic activity regimes characterized by well-defined eruption rates may be readily identified.We thus propose a mixture of exponentials distribution (MOED) to study the distribution of repose times between successive eruptions for estimating the volcanic hazard of future explosive eruptions using VEI -characterized sequences of records.
The MOED is a sum of exponential distributions that may be quickly evaluated and interpreted.This method is particularly useful when the eruptive time series is non-stationary i.e. the distribution of the number of eruptions varies upon translation in a fixed interval (Cox and Lewis, 1966) and develops as a succession of eruptive regimes, each having a characteristic eruption rate.In such case, a single exponential distribution may not fit the observed distribution of repose times.Although a Weibull distribution (Johnson and Kotz, 1953;Ho, 1995;Bebbington and Lai, 1996b) may indeed fit such a distribution, the MOED permits an equally good fitting using a priori calculated distribution parameters, directly obtained from the identified occurrence rates of the regimes, whereas the Weibull distribution requires more complex calculations.In the present paper, we apply a MOED to the eruptive time series for Colima and Popocatépetl volcanoes, two of the most active polygenetic volcanoes in Mexico.
The hazard estimates are discussed and compared with the results obtained from the Weibull distribution and from other methods.

Mixtures of distributions
Mixtures of distributions occur frequently in the theory and applications of probability and statistics.Generally, an arbitrary distribution can be defined as a weighted sum of components distribution where t is the time between eruptions (or failure of any component in the context of quality control), and the parameters =(w 1 ,. . .w m , λ 1 ,. . .λ m ) are such that w i >0 for (j =1,. . ., m), and m i=1 w i = 1.w j is a weighting factor, and f j a component density function parameterized by λ j .Generally, a mixture distribution can be composed of m component distributions f j , each of a different type.Significant simplification can be achieved if all f j are of the same type, and only the parameters differ.Further simplification is attained if the chosen distributions depend on a single parameter.Since there is no restriction on the form of the distributions f j , a mixture of exponentials distribution was chosen for its simplicity, and because, considering that sequences of explosive eruptions follow Poissonian patterns (De la Cruz-Reyna, 1991, 1996), it is the distribution that describes the repose times of a Poisson process.On the other hand, Feldmann and Whitt (1998) showed that any monotone probability distribution function can be approximated by a finite mixture of exponentials.They also showed that a MOED is especially useful in modeling long-tailed data without some of the mathematical complications of other distributions such as the Pareto (Johnson and Kotz, 1953) and Weibull probability distributions.Therefore, we choose a mixture of m exponential distributions, also called a hyperexponential distribution, to describe the repose time distribution of explosive eruptions in Poissonian volcanoes with non-stationary behavior.The hyperexponential cumulative distribution function has the form: with a survival distribution of probability given by: The probability density function is: where λ j , w j >0 for (j =1,. . ., m), and  The parameters λ j s are the rates of the single exponential distributions; namely the number of occurring events per duration of each regime j .The weighting factors w j s are calculated as the normalized complement of the corresponding proportions of the duration of regimes, considering that regimes of shorter duration regimes tend to have higher eruption rates.
This model is used here to estimate the likelihood of at least one eruption in a given VEI category at a specified time in the future, i.e., the volcanic hazard.Supposing that the most recent event occurred s years ago, and assuming a duration t in years, the probability of no event in the next t years is P (T >s+t|T >s).We then obtain the probability of at least one eruption occurring within the next t years as

Applications
To illustrate the method, we apply a MOED to Colima and Popocatépetl volcanoes, intending to obtain a simply calculated estimation of the volcanic hazard.Both these volcanoes represent a significant threat to large populations dwelling around them.Colima volcano (19.51  Cruz-Reyna, 2008).This component appears as a succession of regimes, each one having a characteristic eruption rate, as can be recognized in Figs. 1 and 3. High and low eruption rate regimes alternate about a mean rate in such a way that the clustering of high regimes may not be attributed to chance, according to running mean tests (Klein, 1982;De la Cruz-Reyna, 1996;Mendoza-Rosas and De la Cruz-Reyna, 2008).
Figure 1 shows the four regimes that can be recognized from the historical activity of Colima volcano, based on reports from 1560 to the present for events with eruptive magnitudes greater than VEI 2. The rates of the eruption regimes are listed in Table 1.The time sequence of Colima is then fitted to a MOED, using the sum of four exponential distributions, each one having the corresponding observed eruption rate λ j .We use Eq. ( 2) with m=4.The weighting factors are calculated as the normalized complement of the duration in years of each regime, where D t is the duration of the sampled interval (449 years for Colima, and 497 years for Popocatépetl), and D i is the duration of the identified regime.Tables 1 and 3 summarize the MOED parameters for both volcanoes.
To compare the results obtained for Colima volcano using the MOED with the results from a Weibull distribution, we calculated the shape and scale parameters of the Weibull distribution by the graphical method described by Bebbington and Lai (1996b).The resulting parameters were 0.7767 and 19.2699, respectively.The comparison between the MOED and the Weibull distribution is shown in Fig. 2. Both the MOED and the Weibull distributions show good fits with the repose time data.
To obtain an objective measure of the quality of the fits, we performed a Kolmgorov-Smirnov (K-S) test as described in texts on nonparametric statistics (e.g., Gibbons, 1976), calculating the maximum absolute differences between both cumulative distributions, and the observed cumulative frequencies, obtaining 0.13182 and 0.15861, respectively and thus indicating a better fit of the MOED.These difference values are smaller than the critical K-S differences for the 0.01 level of significance.The MOED and Weibull probabilities of occurrence of at least one significant eruption at Colima volcano in the next t years, calculated from Eq. ( 5), are listed in Table 2.We applied the same procedure to the historical time series records of Popocatépetl volcano for eruptions with VEI≥2. Figure 3 shows a cumulative distribution of repose periods, in which two regimes may be recognized.Table 3 lists the 2-term MOED parameters for these regimes.The shape and scale parameters of the corresponding Weibull distribution were calculated with the same method used with Colima volcano as 0.6207 and 16.3137, respectively.and K-S tested the corresponding survival distributions.To cancel the advantage of the MOED over the Weibull distribution, the chosen change point should be moved at least 2 and 8 positions away from the graphically evident slopechange point for Popocatépetl and Colima volcanoes, respectively, indicating a good stability of the method.To deal with the case of smoothly changing regimes we suggest a simple procedure to decompose the MOED using the method of moments (Rider, 1961;Everitt and Hand, 1981;Sum and Oommen, 1995).Assuming that one can tell the number m x n i k , and from them the regime changes may be identified.

Discussion and conclusions
The simple MOED method discussed in this paper allows a simple assessment of the volcanic hazard from a straightforward analysis of the eruption time series.Apart from its simplicity, the flexibility of its shape allows good fits, even for long-tail distributions, and for non-stationary processes.Perhaps the most important feature of the MOED is that, unlike the Weibull distribution, the MOED parameters contain direct information of the physical process involved www.nat-hazards-earth-syst-sci.net/9/425/2009/Nat.Hazards Earth Syst.Sci., 9, 425-431, 2009 in the eruption time series, namely the eruption rates characterizing the successive regimes of a non-stationary process, and their durations.In the stationary case, defined by a single eruption rate invariant under translations along the time axis, both the MOED and the Weibull distribution reduce to the simple exponential distribution.This tends to confirm the Poissionian character of the eruption sequences, which is not lost when the eruption series shows a succession of eruptive episodes.An underlying assumption involved in this argument is that the eruption sequence may be perceived as a time-dependent process with a known interval distribution, namely the succession of high and low regimes alternating in such a way to maintain a steady long-term mean.This requires that the high-regimes are in the average systematically shorter (or longer) than the low-regimes.For example, in the case of Colima volcano the duration of the low regime (246 yr) is about five times the mean (54 yr) of the high regimes.Assuming the regimes maintain an approximately uniform distribution such that keeps the long-term eruption rate constant (Fig. 1), and noting that the current regime, initiated after the eruption of 1913, has produced only one VEI 3 eruption in the year 2005, we are inclined to believe that this eruption is a normal part of a low-regime, rather than signaling the onset of a high-regime.To emphasize the effects of the time dependence of the eruptive process characterized by a succession of low and high regimes, we compare the probabilities of at least one eruption occurring in given times intervals using the MOED, with the probabilities obtained from the Weibull distribution and from the stationary Exponential, Binomial and Poisson cumulative distribution functions characterized by the overall eruption rate of the whole sequence.The result of this comparison is summarized in Fig. 5. There, we observe that the Poisson, Bino-mial and Exponential distributions render the same results, as expected from a stationary eruption sequence having the mean eruption rate of the whole series, concealing in this way the effects of the non-stationary behavior.In contrast, the MOED and the Weibull distributions properly describe the increased probability of an eruption occurring in shorter periods (up to about 20 years), derived from the contribution of the high regimes.However, at longer repose periods, the Weibull probabilities approach the stationary exponential results, "saturating" at periods of about 120 years, and revealing some inability of those distributions to deal with the long-tailed part of the observed distribution.On the other hand, the MOED allows estimation of the probabilities of eruptions occurring after long repose periods, accounting for the low regimes.
One important point that must be emphasized is that this method should be performed on a portion of the time series that satisfies a criterion of completeness, i.e. a portion in which no significant eruption data are missing, which in most cases is the historical eruption data.A single missing event may importantly distort the observed distribution of repose times.We therefore use only the VEI categories that made possible to consider the eruptive series as complete over a period with reliable reporting, i.e., the last five centuries.A chief issue derived from this is that the relatively short historical series may not include low-rate major eruptions, that the geologic records indicate the volcano is capable to produce.How the estimates of volcanic hazard may be adjusted for such cases is a problem that has been addressed elsewhere (Mendoza-Rosas and De la Cruz-Reyna, 2008).The MOED method is thus recommended to estimate the volcanic hazard of volcanoes showing high rates of eruptive activity and significant evidence of a distribution of high and low eruptive regimes.

Fig. 1 .
Fig. 1.Cumulative number of eruptions reported in the period 1560-present in the magnitude class VEI>2 for Colima volcano and the four regimes that can be recognized (separated by solid vertical lines).The slopes of the dashed lines represent the eruption rate λ of each regime and the thick line slope represents the eruption rate of whole series (λ global =0.040089).
Figure 4 shows the fits of the survival distributions obtained from the MOED and the Weibull distributions with the observed data of the eruptive time series of Popocatépetl volcano.The Kolmogorov-Smirnov test at the significance level 0.01 yields 0.15355 and 0.16687 for the MOED and the Weibull distribution, respectively, indicating again a better fit of the MOED.The MOED and Weibull probabilities of occurrence of at least one eruption in different time periods to Popocatépetl volcano are shown in Table4.

Fig. 2 .
Fig. 2. Distribution of observed repose intervals with duration greater than t years (steps) for eruptions at Colima volcano with VEI>2 in the period 1560 to the present.The dashed line is the survival Weibull distribution and the solid line is the MOED.

Fig. 3 .
Fig. 3. Cumulative number of eruptions reported in the period 1512present in the magnitude class VEI≥2 for Popocatépetl volcano, and the two regimes that can be recognized (separated by the vertical lines).The dashed lines represent the eruption rate λ of each regime, and the thick line slope represents the eruption rate of whole series (λ global =0.034205).

Fig. 4 .
Fig. 4. Distribution of observed repose intervals with duration greater than t years (steps) for eruptions at Popocatépetl volcano with VEI ≥2 in the period 1512 to the present.The dashed line is the survival Weibull distribution and the solid line is the MOED.

Fig. 5 .
Fig. 5. Probabilities of occurrence of at least one eruption in a period t (years) calculated with a MOED (solid line), a Weibull, (dashed line) an exponential (triangles), a Binomial (diamonds), and a Poisson (circles) cumulative distribution functions for the eruptive time series of Colima volcano with VEI>2 in the period 1560 to the present.

Table 1 .
Observed eruptive regimes and calculated parameters of the MOED exponential distributions for Colima volcano (VEI>2).
• N, 103.62 • W) is the active volcano in México with the highest eruption rate, and a historical record of 41 eruptive events of varied magnitudes in the past 500 years (De la Cruz-Reyna, 1993; Mendoza-Rosas and De la Cruz-Reyna, 2008).Popocatépetl volcano (19.02 • N, 98.62 • W) is located within a densely populated region, about 70 km southeast of downtown Mexico City and 40 km west of the city of Puebla, which with other nearby cities add up to over 20 million people vulnerable to direct hazards associated with a major explosive eruption (De la Cruz-Reyna and Tilling, 2008).To calculate the eruption rates and the weighting factors, we use here the compilation of historical eruptive time series obtained by Mendoza-Rosas and De la Cruz-Reyna (2008) for those volcanoes.Colima and Popocatépetl volcanoes show a weak nonstationary component in the behavior of their eruption time sequences (De la Cruz-Reyna 1993; Mendoza-Rosas and De la

Table 2 .
Eruption hazards of Colima volcano expressed as probabilities of occurrence of at least one eruption with VEI>2 over different time periods.

Table 3 .
Observed eruptive regimes and calculated parameters of the MOED exponential distributions for Popocatépetl volcano (VEI≥2).Regime Time period Number of Duration of regime Annual rate λ Weighting factor w

Table 4 .
Eruption hazards of Popocatépetl volcano expressed as probabilities of occurrence of at least one eruption with VEI≥2 over different time periods.