Seismic zones for Azores based on statistical criteria

The objective of this paper is to define seismic zones in the Azores based on statistical criteria. These seismic zones will likely be used in seismic simulations of occurrences in the Azores Archipelago. The data used in this work cover the time period from 1915 to 2011. The Azores region was divided into 1° × 1° area units, for which the seismicity and the maximum magnitudes of events were calculated. The seismicity, the largest earthquakes recorded and the geological characteristics of the region were used to group these area units, because seismic zones must delineate areas with homogeneous seismic characteristics. We have identified seven seismic zones. To verify that the defined areas differ statistically, we considered the following dissimilarity measures (variables): time, size and seismic conditions (the number of seismic events with specific characteristics). Statistical tests, particularly goodness-of-fit tests, allowed us to conclude that, considering these three variables, the seven earthquake zones defined here are statistically distinct.


Introduction
The Azores Archipelago is located at the triple junction of the Mid-Atlantic Rift, where the Eurasian, Nubian, and American Plates meet.
The intense seismic activity in the region has been studied by many authors (e.g., Bezzeghoud et al., 2008; Borges et al., 2008).
As shown in Fig. 1a, the Azores consists of nine islands distributed among three different groups: the islands of Flores and Corvo, constituting the Western Group; the islands of Terceira, Graciosa, São Jorge, Faial and Pico, which are part of the Central Group; and the islands of São Miguel and Santa Maria in the Eastern Group.
Figure 1b shows epicenters in the Azores between 1915 and 2011, and Fig. 1c shows a zoomed-in map of epicenters of the islands.
The aim of this study is to define the seismic zones of the Azores, which will later be used for seismic simulations of the region.
We define several zones that express differences in seismicity, while allowing for a model that is not overly complex. Seismic zones are defined by polygons that delineate areas of homogeneous seismicity characteristics. They are also based on differences in geology and tectonics, but seismicity is the main characteristic in defining them (e.g., Reiter, 1991; Kagan et al., 2010).
In this work, the main criterion to define the zones is the recorded seismicity, as different zones should exhibit different statistical characteristics. The number of events is the most important variable used in this study; magnitude is also used, although large events may be infrequent. Nunes et al. (2000) delineated a 28-seismic-zone model based on the distribution of epicenters and on the tectonics of the Azores region. Due to the lack of seismic data, the model was simplified for use in hazard assessment to include nine main zones in order to allow a reliable statistical characterization of the model (Carvalho et al., 2001).
With the upgrade of the seismological network in the Azores in recent decades, seismic data have become more reliable and complete for magnitudes greater than M L = 3. This allows for more robust statistical analyses than were possible in the past.
The seismic zones cover geophysical units where data are available. For each unit, we computed the following:
- The number of events.
- The maximum magnitude recorded.

Published by Copernicus Publications on behalf of the European Geosciences Union.
We grouped the 28 areas of Nunes et al. (2000) into seven zones that exhibit different characteristics. We used several statistical tests (parametric and nonparametric) to confirm whether these seven zones were significantly different.

Data
The data used in this work were gathered from two sources.
For the period 1915-1998, we used the catalog of Nunes et al. (2004), and for the period 1999-2011, data were directly obtained from Instituto de Meteorologia (2011).
The earlier period covers the region encompassed by 11.50° W-42.86° W and 10.80° N-47.54° N. A total of 9214 earthquake records are available, of which 5456 have information on Richter-scale magnitudes.
The catalog for the later period covers the area within 21.31° W-35.42° W and 34.30° N-45.57° N, and contains 9608 earthquakes, all of which include magnitude information.
Table 1 summarizes the main characteristics of the data used.
The data were analyzed as a whole, including foreshocks and aftershocks. Fig. 2 shows a Gutenberg-Richter plot, which indicates that the dataset is not complete. Many small-magnitude events occur in the sea, far from the seismic network, and thus are not recorded. According to the Gutenberg-Richter law, a linear trend should exist between Log N and m:
$$\log N(m) = a - b\,m, \qquad (1)$$
where N is the number of events of magnitude greater than m, and a and b are constants fitted to the data. Removing earthquakes smaller than magnitude 2, a least-squares approximation leads to
$$\log N(m) = 5.77611 - 0.79\,m, \qquad (2)$$
with a correlation coefficient R = −0.996, which indicates a significant linear correlation and that the catalog is complete for earthquakes with magnitude larger than 2.
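The least-squares fit of the Gutenberg-Richter relation can be illustrated with a short sketch. It is not the authors' code: the function names `cumulative_counts` and `fit_gutenberg_richter` are hypothetical, and base-10 logarithms are assumed for Log N.

```python
import math

def cumulative_counts(magnitudes, thresholds):
    """N(m): number of events with magnitude greater than m, for each threshold m."""
    return [sum(1 for s in magnitudes if s > m) for m in thresholds]

def fit_gutenberg_richter(thresholds, counts):
    """Ordinary least-squares fit of log10 N(m) = a - b*m.

    Returns (a, b, r), where r is the correlation coefficient between
    m and log10 N(m). Thresholds with N(m) = 0 are skipped because the
    logarithm is undefined there (an assumption of this sketch).
    """
    pts = [(m, math.log10(n)) for m, n in zip(thresholds, counts) if n > 0]
    k = len(pts)
    mean_m = sum(m for m, _ in pts) / k
    mean_y = sum(y for _, y in pts) / k
    sxx = sum((m - mean_m) ** 2 for m, _ in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in pts)
    slope = sxy / sxx
    a = mean_y - slope * mean_m
    r = sxy / math.sqrt(sxx * syy)
    return a, -slope, r
```

With real catalog data, the thresholds would start at the assumed completeness magnitude (2 in the text), and a correlation coefficient close to −1 would support the linear trend.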
If we considered only events with magnitudes greater than 2, much of the dataset would be lost (the value 2 corresponds approximately to the 0.40 quantile of magnitude; see Table 4), and the aim of this paper is not to estimate the constants a and b of the Gutenberg-Richter law. Therefore, we consider all earthquakes with a catalog magnitude greater than 0; a catalog value of 0 corresponds to events for which the magnitude has not been determined.
[Figure: No. of seismic events versus Year.]

Exploratory data analysis
We used the R® software (e.g., Dalgaard, 2008; Venables et al., 2011) to perform the statistical analysis in this study. For some calculations, we also used the Turbo Pascal® software.

Annual seismicity
The seismic records contain information about the year, month, day, hour, minute and second of each event. For straightforward computation, time was converted into decimal years.
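One possible conversion to decimal years is sketched below. The paper does not say how leap years were handled, so the choice here (dividing by the actual length of each year) is an assumption, and `decimal_year` is a hypothetical helper name.

```python
from datetime import datetime

def decimal_year(t):
    """Convert a datetime into a decimal year.

    The fraction is the elapsed part of the calendar year, so leap
    years are divided by 366 days and other years by 365 days.
    """
    start = datetime(t.year, 1, 1)
    end = datetime(t.year + 1, 1, 1)
    return t.year + (t - start).total_seconds() / (end - start).total_seconds()
```

For example, `decimal_year(datetime(2001, 7, 2, 12))` falls exactly halfway through a 365-day year.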
Consider the variable annual seismicity (AS), which represents the number of earthquakes that occurred in one year.
Figure 3 displays the AS for the period from 1915 to 2011. The AS is very heterogeneous throughout the study period, and it appears to increase in 1960. This increase reflects the expansion of the seismic network in the Azores Archipelago.
Table 2 shows the main statistical properties of the AS for the period of 1915-2011.
Figure 4 shows a histogram of the AS, which is highly variable, ranging from fewer than 200 earthquakes in one year to more than 1000. The heterogeneity of the data suggests that we should use records from 1960 onwards, as this produces a dataset that best reflects the actual seismicity.

Statistical study of some characteristics of seismic events
Each earthquake can be characterized by three variables: time, size and space. The variable time (Dt) is characterized by the time intervals between consecutive earthquakes, the variable size (S) is the Richter magnitude associated with an earthquake and the space variable (Sp) gives the number of the zone corresponding to the epicenter of the earthquake. However, at this stage, Sp is characterized by the latitude and longitude of each earthquake.
Figure 5 shows a schematic representation of the seismic process of occurrences, where S i represents the size of earthquake i, Dt i the time interval between this event and the preceding one (i − 1) and Sp i the location of event i.

Study of the time variable
As previously stated, Dt represents the time intervals between consecutive earthquakes, expressed in years, during the time period of 1915-2011. Let Dt 60 be the variable that represents the time intervals between consecutive earthquakes between 1960 and 2011.
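The construction of Dt and Dt 60 from event times in decimal years can be sketched as follows. The helper name `inter_event_times` and its `start_year` parameter are illustrative assumptions, not part of the original analysis.

```python
def inter_event_times(times, start_year=None):
    """Intervals (in years) between consecutive events.

    times: event times in decimal years, in any order.
    start_year: if given, only events at or after this time are used
    (e.g. 1960 to obtain a Dt 60-like sample).
    """
    selected = sorted(t for t in times if start_year is None or t >= start_year)
    return [later - earlier for earlier, later in zip(selected, selected[1:])]
```

Calling the function without `start_year` corresponds to Dt, and calling it with `start_year=1960` corresponds to Dt 60.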
Table 3 shows the statistics of the variables Dt and Dt 60 , with the largest difference observed in their maximum values. While the maximum value of Dt is approximately 5 yr, the maximum value of Dt 60 is only 0.78 yr, which is less than nine months. The mean Dt is approximately half of the mean Dt 60 . Comparing the quantiles of these two variables, there is no significant difference below the 0.90 quantile, indicating that the major difference is in the maximum values of the variables.
Figure 6a and b present the histograms of Dt and Dt 60 , respectively, which show a clear difference between the two random variables.

Study of the size variable
As described in Sect. 3.2.1, S represents the size of each earthquake between 1915 and 2011.
As shown in Table 1, 3757 seismic records do not include magnitude; the magnitudes are null values in the catalog. If these records are not removed, they will influence the statistics of S.
In addition, if the earthquakes with null magnitudes were ignored, the time intervals between consecutive events would increase. Let S w0 represent the magnitudes of the earthquakes between 1915 and 2011, excluding the zero values.
Table 4 summarizes the statistics calculated for S and S w0 .
As expected, S w0 has a larger mean than S, and the standard deviation of S w0 is smaller than that of S. The quantiles of S w0 are greater than the corresponding quantiles of S, except for the 1.0 quantile (maximum).
Figure 7a presents a histogram of the absolute frequencies of S. The large number of zero magnitudes is due to earthquakes with unknown magnitudes.
The histogram displayed in Fig. 7b shows the asymmetry of the probability density function of S w0 , with a significant tail for large values of S w0 and a positive skewness coefficient.

Definition of seismic zones
As previously described, the main goal of this study is to identify regions with significant differences in seismicity. We use the number of events and the maximum magnitude recorded to identify these differences. The region included in the dataset was divided into 1° × 1° area units, and the number of earthquakes recorded in the period from 1915 to 2011 was computed for each area unit. Let Sq represent the number of events between 1915 and 2011 in each area unit.
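The per-unit counts and maximum magnitudes can be sketched as a simple binning step. The sketch assumes signed decimal degrees (west longitudes negative) and area units aligned to whole degrees; the paper does not state these conventions, and `summarize_area_units` is a hypothetical name.

```python
import math
from collections import defaultdict

def summarize_area_units(events):
    """Group events into 1-degree by 1-degree area units.

    events: iterable of (latitude, longitude, magnitude) tuples.
    Returns {(lat0, lon0): (count, max_magnitude)}, where (lat0, lon0)
    is the south-west corner of the unit. A magnitude of 0 (undetermined
    in the catalog) still counts as an event.
    """
    summary = defaultdict(lambda: [0, 0.0])
    for lat, lon, mag in events:
        cell = (math.floor(lat), math.floor(lon))
        summary[cell][0] += 1
        summary[cell][1] = max(summary[cell][1], mag)
    return {cell: (count, max_mag) for cell, (count, max_mag) in summary.items()}
```

The count per unit plays the role of Sq, and the second value is the maximum magnitude recorded in that unit.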
Figure 8a shows the values of Sq for the region bounded by 40 There is a band of increased seismicity (values above the 0.8 quantile of AS) with an approximately WNW-ESE orientation, which covers the Eastern and Central groups of the Azores Archipelago, as well as the NW Faial region, the trench west of Graciosa, the D. João de Castro Bank and the Hirondelle Trench.
However, for roughly half of this band, there is a slight decrease in the AS of the region bounded by 36° N-39° N and 27° W-28° W.
A region with high values of AS, although lower than for the WNW-ESE band, is oriented approximately SSW-NNE, and includes the islands of the Western Group and the northern Mid-Atlantic Ridge.
East of the WNW-ESE band, there is a region with nearly E-W orientation, in which the AS is also elevated.
The maximum magnitude recorded was also computed for each area unit during the study period.
Figure 8b shows that the WNW-ESE and SSW-NNE bands of seismicity also have higher maximum recorded magnitudes, with the largest magnitude, 8.2, recorded in the E-W band.
In the WNW-ESE band, two centers of high magnitudes are highlighted, one of which covers the Central Group of the Archipelago, particularly the western region of Faial Island, and the other covers the Eastern Group of the Archipelago, with an emphasis on São Miguel Island.
The seismic zones were defined by aggregating area units according to the aforementioned patterns, with an emphasis on the seismicity and the maximum magnitude recorded. Differences in geomorphology were also taken into account.
In the region within 11.50° W-42.54° W, 10.80° N-47.54° N, the following seven seismic zones were defined (Table 5, Fig. 9a and b):
Zone 1 comprises the Western Group of the Azores Archipelago and is situated NW of the Mid-Atlantic Ridge. It presents low values of seismicity, and the maximum magnitude recorded is 6.2. The islands of Flores and Corvo are in this zone.
Zone 2 is a maritime zone corresponding to the Mid-Atlantic Ridge and its transform faults to the north. This zone also comprises the North Azores Fracture Zone. It has high levels of seismicity and a maximum magnitude of 6.0.
Zone 3 is a maritime zone with very low seismicity, located NE of the Central and Eastern groups of the Archipelago and east of the Mid-Atlantic Ridge. The maximum magnitude recorded is 4.7, the lowest maximum magnitude for all zones.
Zone 4 encompasses the Central Group of the Archipelago, west of Capelinhos and the Terceira Rift central sector. It features very high seismicity and a maximum magnitude of 6.0. Compared to the maximum magnitudes recorded in other zones, this magnitude is not very high, indicating that the main characteristic of this zone is the high seismicity and not its maximum magnitude. This zone contains five islands: Faial, Pico, São Jorge, Terceira and Graciosa.
Zone 5 comprises the Eastern Group of the Archipelago, the Hirondelle Trench and the D. João de Castro Bank. It has the highest seismicity of all seven zones, and the maximum magnitude recorded is 7.0. This zone is characterized not only by its high seismicity but also by its high maximum magnitude recorded. This zone contains two islands: São Miguel and Santa Maria.
Zone 6 is a maritime zone and includes the Gloria Fault. The seismicity is moderate, but this zone has the highest magnitude of all zones: 8.2. It is characterized by a moderate number of earthquakes, which can be of relatively high magnitude.
Zone 7 is a maritime zone and is the furthest south of all seismic zones. It has the lowest seismicity, and the maximum magnitude recorded is 6.1.
Zones 1, 3 and 7 include small numbers of events compared to the other seismic zones. Therefore, they are considered to be background zones.
The statistical study focuses primarily on zones 2, 4, 5 and 6, although all zones were examined initially.
We calculated the number of earthquakes recorded between 1915 and 2011 for each seismic zone. Table 6 and Fig. 10 summarize the results.

Statistical study of the time and size variables for each seismic zone
In the statistical study of the time variable, characterized by the time intervals between consecutive events, only the period 1960-2011 was considered.
For the size variable, data from all time periods were considered, but the null values were not taken into consideration.

Time
Consider Dt 60,i , i ∈ {1, 2, 3, 4, 5, 6, 7}, the variable that represents the time interval between an event and its previous event, both in zone i, in 1960 and later.

Size
Let S w0,i represent the nonzero magnitudes in the zone i, i ∈ {1, 2, 3, 4, 5, 6, 7}.

Methodology for the dissimilation of seismic zones
For the region covered by the data, area units were aggregated by their identical characteristics, resulting in the seven distinct zones.
In the following tests, the aim was to quantitatively show whether the variables corresponding to these areas were significantly different.
If the variables time, size and seismic conditions, which will be explained later, differ significantly for each defined area, then statistical tests must indicate that these samples come from different populations.
As the seismic zones 1, 3 and 7 are markedly different from other areas based on their reduced seismicity, they are considered to be background zones. It was unnecessary to carry out statistical tests for these zones, and our statistical study focuses on zones 2, 4, 5 and 6.

Statistical tests
Zones 2, 4, 5 and 6 were first studied together. We used a chi-square test for r independent samples to investigate whether the r populations from which the r samples were extracted were the same; that is, we tested the null hypothesis that the variables corresponding to the different zones were taken from the same population.
If the test led to a clear rejection of the null hypothesis, it would not be necessary to use additional tests for r samples; otherwise, we would need to use, for example, the Kruskal-Wallis test (see Siegel and Castellan, 1988).
If a nonparametric test for r samples leads to the rejection of the null hypothesis, the variables cannot come from the same population, but it remains unclear as to whether all come from distinctly different populations. To investigate whether there are samples with the same distribution, we can compare any pair of the r samples using a nonparametric test for pairs of samples.
In this case, we can use the chi-square test for two independent samples or the Kolmogorov-Smirnov two-sample test (e.g., Conover, 1999), with the latter preferable because it is more powerful; see Appendix A2 for a detailed description of these methods.

Testing differences in time
The chi-square test for r independent samples was used to verify whether the samples formed by Dt 60,j , j ∈ {2, 4, 5, 6} can be extracted from the same population. The data may be grouped into classes. Ten classes bounded by the deciles of Dt 60 have been adopted (see Table 3).
The meanings of O ij , E ij , C k and n r are explained in Appendix A1. The results obtained in the chi-square test (Table 9) reveal significant differences between the observed and expected values, leading to the rejection of the null hypothesis.
The test statistic computed by Eq. (A1), T = 441 301, compared with a 0.95 quantile of $\chi^2_{27}$ of 40.11 and a 0.99 quantile of 46.96, indicates that we should reject the null hypothesis. As expected, we can conclude that the samples do not have the same distribution.
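The 27 degrees of freedom quoted above are consistent with the usual rule for an r × k contingency table, assuming that rule was the one applied: with r = 4 zones and k = 10 decile classes,

```latex
\nu = (r - 1)(k - 1) = (4 - 1)(10 - 1) = 27 .
```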
Given the large difference between the critical values and the test statistic, it was not necessary to carry out more tests using multiple samples.
The rejection of the null hypothesis only means that the samples do not have the same distribution; it does not determine whether the samples have distinctly different distributions.
In cases such as this, Siegel and Castellan (1988) recommend investigating whether there are any samples with the same distribution. For this purpose, it is appropriate to use the Kolmogorov-Smirnov two-sample test, in which we compare $\binom{4}{2} = 6$ pairs of samples.
Test statistics were computed using Eq. (A3). Table 10 summarizes the obtained results.
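Eq. (A3) is not reproduced in this text, so the following is only a sketch of the standard two-sided Kolmogorov-Smirnov two-sample statistic, which it is assumed to correspond to. The function names are hypothetical, and the coefficients 1.36 (5 %) and 1.63 (1 %) are the usual large-sample approximations rather than values taken from the paper.

```python
from bisect import bisect_right
import math

def ks_two_sample(x, y):
    """D = max |F_x(v) - F_y(v)|, the largest gap between the two
    empirical distribution functions over all observed values v."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d

def ks_critical_value(n, m, coefficient=1.36):
    """Large-sample critical value: coefficient * sqrt((n + m) / (n * m)).
    Use coefficient=1.36 for the 5 % level and 1.63 for the 1 % level."""
    return coefficient * math.sqrt((n + m) / (n * m))
```

The null hypothesis of a common distribution is rejected when D exceeds the critical value for the chosen significance level.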
In all comparisons, the null hypothesis was rejected; that is, the statistical distributions of the variables Dt 60,i , i ∈ {2, 4, 5, 6} are different.
However, for the comparison of zones 4 and 5, the test statistic is equal to the critical value for a significance level of 1 %.This means that although the empirical distributions of these two populations differ significantly, the difference is smaller than that obtained for other pairs of samples.
To dispel any doubt concerning the possible (but unlikely) similarity between the distributions of Dt 60,4 and Dt 60,5 , a parametric test using the means of these two variables was conducted. The t test for two populations (variances unknown and unequal) (see Kanji, 1993) tests whether the means of the two variables can be considered equal. For details on t tests, see Appendix A3.
The test can be applied because the size of the samples is large.
Let µ 4 and µ 5 be the means of the variables Dt 60,4 and Dt 60,5 .

Null hypothesis H0: µ 4 = µ 5 ; alternative hypothesis H1: µ 4 ≠ µ 5 .
The test statistic t has a Student's t distribution with v degrees of freedom. Applying Eqs. (A6) and (A7), one obtains, respectively, t = 3.356 and v = 9436. The Student's t variable with n degrees of freedom approaches the standard normal distribution as n approaches infinity. Let Z 1−α/2 be the 1 − α/2 quantile of the standard normal distribution: Z 0.975 = 1.96 and Z 0.995 = 2.58.
As t is much greater than the critical value, the null hypothesis can be rejected for the significance levels of 5 % and 1 %.
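Eqs. (A6) and (A7) are not reproduced in this text. Assuming they are the standard forms for two samples with unknown and unequal variances (Welch's statistic with the Welch-Satterthwaite degrees of freedom), the computation can be sketched as follows; `welch_t` is a hypothetical helper name.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and approximate degrees of freedom for comparing two
    means when the variances are unknown and not assumed equal.

    var1 and var2 are sample variances; n1 and n2 are sample sizes.
    """
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of the means
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    v = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, v
```

For large v, |t| can be compared directly with the normal quantiles 1.96 and 2.58 quoted in the text.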

Testing differences in size
As was performed for Dt 60,j , j ∈ {2, 4, 5, 6}, the variables S w0,i , i ∈ {2, 4, 5, 6} were compared as a whole using the chi-square test for independent samples, and pairs were later compared.
Data can be grouped into classes. Ten classes bounded by the deciles of S w0 were adopted (see Table 4), but classes 1 and 2 were joined because their expected frequencies are small. Table 11 summarizes the results obtained for the chi-square test for independent samples.
Computing the test statistic using Eq. (A1), we obtain T = 1781.1, with a 0.95 quantile of $\chi^2_{24}$ of 36.42 and a 0.99 quantile of 42.98; we reject the null hypothesis. Therefore, S w0,i , i ∈ {2, 4, 5, 6} do not come from the same population.
To investigate whether the samples arise from the same population, they were compared in pairs using the Kolmogorov-Smirnov two-sample test.
Table 12 summarizes the results obtained for the Kolmogorov-Smirnov test.
In all comparisons, the null hypothesis was rejected; that is, the statistical distributions of the variables S w0,i , i ∈ {2, 4, 5, 6} are different, demonstrating that for the size variable, seismic zones differ significantly. In this case, performing additional tests is unnecessary.

Testing seismic conditions dissimilarity
For each seismic zone, all earthquakes belong to one of four seismic conditions:
1. A recent event (i.e., Dt 60,i ≤ 0.50 quantile of Dt 60 ) with a magnitude that is not large (i.e., S w0,i ≤ 0.80 quantile of S w0 );
2. Not a recent event (i.e., Dt 60,i > 0.50 quantile of Dt 60 ) and with a large magnitude (i.e., S w0,i > 0.80 quantile of S w0 );
3. A recent event (i.e., Dt 60,i ≤ 0.50 quantile of Dt 60 ) with a large magnitude (i.e., S w0,i > 0.80 quantile of S w0 );
4. Not a recent event (i.e., Dt 60,i > 0.50 quantile of Dt 60 ) and with a magnitude that is not large (i.e., S w0,i ≤ 0.80 quantile of S w0 ).
Let cd i , i ∈ {2, 4, 5, 6} represent the seismic condition of each earthquake that occurred in zone i. This variable can assume only values of 1, 2, 3 and 4, corresponding to the four seismic conditions.
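The four seismic conditions amount to a two-way classification, sketched below. The function name and its arguments are illustrative: `dt_median` stands for the 0.50 quantile of Dt 60 and `s_q80` for the 0.80 quantile of S w0 , both of which would be taken from Tables 3 and 4.

```python
def seismic_condition(dt, s, dt_median, s_q80):
    """Return the seismic condition (1 to 4) of one earthquake.

    dt: time since the previous event in the same zone (years).
    s: nonzero magnitude of the event.
    dt_median: 0.50 quantile of Dt 60 (threshold for "recent").
    s_q80: 0.80 quantile of S w0 (threshold for "large").
    """
    recent = dt <= dt_median
    large = s > s_q80
    if recent and not large:
        return 1
    if not recent and large:
        return 2
    if recent and large:
        return 3
    return 4  # not recent, magnitude not large
```

Events lying exactly on a threshold are treated as recent and as not large, matching the inequalities in the list above.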
To verify that the samples formed by cd i , i ∈ {2, 4, 5, 6} can be extracted from the same population, a chi-square test for independent samples was used.
Figure 11 strongly suggests that the test leads to the rejection of the null hypothesis. Indeed, there is only some similarity in the distribution of cd i in zones 4 and 5. Hypothesis H0: cd i , i ∈ {2, 4, 5, 6} have the same distribution; H1: cd i , i ∈ {2, 4, 5, 6} do not have the same distribution. Table 13 summarizes the results obtained for the chi-square test.
Calculating the test statistic using Eq. (A1), we obtain T = 1810.4, with a 0.95 quantile of $\chi^2_{9}$ of 16.92 and a 0.99 quantile of 21.67. Therefore, we reject the null hypothesis and conclude that the samples do not have the same distribution. This means that the distributions of seismic conditions are not the same in zones 2, 4, 5 and 6.
Table 14 summarizes the results obtained in the Kolmogorov-Smirnov test.
In all comparisons, the null hypothesis was rejected; that is, the statistical distributions of cd i , i ∈ {2, 4, 5, 6} are different, demonstrating that the seismic conditions of the seismic zones differ significantly.
We also tested the dissimilarity of seismic conditions using a similar procedure that differs only in using the Dt 60 quantile of 0.80 instead of 0.50.This provided similar results.

Conclusions
In this study, we defined seismic zones for the Azores region. We first divided the area into 1° × 1° area units. For each area unit, the seismicity and maximum magnitude recorded were computed.
These two variables were used with the geological characteristics of the region to group area units with similar characteristics; we identified seven seismic zones.
Statistical tests, particularly goodness-of-fit tests, were used, allowing us to conclude that the variables time, size and seismic conditions describing the seven seismic zones differ significantly.
The results of this study will likely be used in future seismic modeling of occurrences in the region.

Appendix A Statistical tests

A1 Chi-square test for r independent samples
The data consist of r independent random samples of sizes n 1 , n 2 , . . .n r .
Let F 1 (x), F 2 (x), . . ., F r (x) represent their respective distribution functions. Each observation can be classified as exactly one of the k categories or classes.
Let O ij represent the number of observations in cell (i, j ). The total number of observations is denoted by N. Therefore, N = n 1 + n 2 + . . . + n r .
Let C j be the total number of observations in the j th class (j = 1, 2, . . ., k), such that C j = O 1j + O 2j + . . . + O rj , j = 1, 2, . . ., k. The test statistic is
$$T = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad (A1)$$
where
$$E_{ij} = \frac{n_i C_j}{N}.$$
The term E ij represents the expected number of observations in cell (i, j ) if H0 is true. That is, if H0 is true, the number of observations in cell (i, j ) should be close to the ith sample size n i multiplied by the proportion C j /N.
Let α be the level of significance, i.e., the maximum probability of rejecting a true null hypothesis.
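The quantities defined in this appendix can be combined in a short sketch. It assumes the standard Pearson form of the statistic, with expected counts equal to n i multiplied by C j /N as described above; `chi_square_r_samples` is a hypothetical name.

```python
def chi_square_r_samples(observed):
    """Chi-square statistic for r independent samples in k classes.

    observed[i][j] is O_ij, the number of observations of sample i in
    class j. Returns (T, degrees_of_freedom). Every class must contain
    at least one observation, otherwise an expected count is zero.
    """
    r, k = len(observed), len(observed[0])
    n = [sum(row) for row in observed]                            # sample sizes n_i
    c = [sum(observed[i][j] for i in range(r)) for j in range(k)]  # class totals C_j
    total = sum(n)                                                # N
    t = 0.0
    for i in range(r):
        for j in range(k):
            expected = n[i] * c[j] / total                        # E_ij
            t += (observed[i][j] - expected) ** 2 / expected
    return t, (r - 1) * (k - 1)
```

The null hypothesis is rejected at level α when T exceeds the 1 − α quantile of the chi-square distribution with the returned degrees of freedom.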

Figure 1. (a) The Azores Archipelago; (b) epicentral map for 1915-2011; (c) zoom of epicenters for the islands.

Figure 2. Gutenberg-Richter plot for the Azores region showing all catalog seismicity.

Figure 4. Histogram of the number of seismic events per year.

Figure 5. Schematic representation of the seismic process of occurrences.

Figure 6. (a) Histogram of Dt; (b) histogram of Dt 60 .

Figure 7. (a) Histogram of absolute frequencies of S; (b) histogram of S w0 .

Figure 9. (a) Schematic representation of the defined seismic zones; (b) morphological features of the study area.

Figure 10. A plot showing the number of seismic events in each seismic zone.

Table 2. Statistics of the AS.

Table 3. Statistics of Dt and Dt 60 .

Table 4. Statistics of S and S w0 .

Oliveira: Seismic zones for Azores based on statistical criteria 2343

Table 6. The number of seismic events in each seismic zone.

Table 8. Statistics of S w0,i .

Table 9. Summary of results obtained in the chi-square test for the time variable.

Table 10. Summary of results obtained in the Kolmogorov-Smirnov test for time.

Table 11. Summary of results applying the chi-square test for size.

Table 12. Summary of results obtained in the Kolmogorov-Smirnov test for size.

Table 13. Summary of results applying the chi-square test for the seismic condition.

Table 14. Summary of results obtained in the Kolmogorov-Smirnov test for the seismic condition.

Table A1. Notation used in the chi-square test for r independent samples.