In this paper we introduce a method for fault network reconstruction based on the 3D spatial distribution of seismicity. One of the major drawbacks of statistical earthquake models is their inability to account for the highly anisotropic distribution of seismicity. Fault reconstruction has been proposed as a pattern recognition method aiming to extract this structural information from seismicity catalogs. Current methods start from simple large-scale models and gradually increase the complexity in an attempt to explain the small-scale features. In contrast, the method introduced here uses a bottom-up approach that relies on an initial sampling of the small-scale features and the reduction of this complexity by optimal local merging of substructures.

First, we describe the implementation of the method through illustrative synthetic examples. We then apply the method to the probabilistic absolute hypocenter catalog KaKiOS-16, which contains three decades of southern Californian seismicity. To reduce the data size and increase computational efficiency, the new approach builds upon the previously introduced catalog condensation method, which exploits the heterogeneity of the hypocenter uncertainties. We validate the obtained fault network through a pseudo-prospective spatial forecast test and discuss possible improvements for future studies. The performance of the presented methodology attests to the importance of the non-linear techniques used to quantify location uncertainty information, which is a crucial input for the large-scale application of the method. We envision that the results of this study can be used to construct improved models for the spatiotemporal evolution of seismicity.

Owing to the continuing advances in instrumentation and improvement of
seismic network coverage, earthquake detection magnitude thresholds have
been decreasing while the number of recorded events is increasing. As
governed by the Gutenberg–Richter law, the number of earthquakes above a
given magnitude increases exponentially as the magnitude is decreased
(Gutenberg and Richter, 1954; Ishimoto and Iida,
1939). Recent studies suggest that the Gutenberg–Richter law might hold down
to very small magnitudes corresponding to interatomic-scale dislocations
(Boettcher et
al., 2009; Kwiatek et al., 2010). This implies that there is practically no
upper limit on the amount of seismicity we can expect to record as our
instrumentation capabilities continue to improve. Although considerable
funding and research efforts are being channeled into recording seismicity,
when we look at the uses of the end product (i.e., seismic catalogs) we often
see that the vast majority of the data (i.e., events with small magnitudes)
are not used in the analyses. For instance, probabilistic seismic hazard
studies rely on catalogs containing events detected over long time periods, which
raises the minimum magnitude that can be considered because of the higher
completeness magnitude levels in the past. Similarly, earthquake forecasting
models are commonly based on the complete part of the catalogs. For
instance, in their forecasting model, Helmstetter et
al. (2007) use only

In this context, fault network reconstruction can be regarded as an effort to tap into this seemingly neglected but vast data source and extract information in the form of parametric spatial seismicity patterns. We are motivated by the ubiquitous observations that large earthquakes are followed by aftershocks that sample the main rupturing faults, and conversely that these faults become the focal structures of subsequent large earthquakes. In other words, there is a relentless cycle as earthquakes occur on faults that themselves grow by accumulating earthquakes. By using each earthquake, no matter how big or small, as a spark in the dark, we aim to illuminate and reconstruct the underlying fault network. If the emerging structure is coherent, it should allow us to better forecast the spatial distribution of future seismicity and also to investigate possible interactions between its constituent segments.

The paper is structured as follows. First, we give an overview of recent developments in the field of fault network reconstruction and spatial modeling of seismicity. In Sect. 2, we describe our new clustering method and demonstrate its performance using a synthetic example. In Sect. 3, we apply the method to the recently relocated southern California catalog KaKiOS-16 (Kamer et al., 2017) and discuss the obtained fault network. In Sect. 4, we perform a pseudo-prospective forecasting test using 4 years of seismicity that was recorded during 2011–2015 and was not included in the KaKiOS-16 catalog. In the final section, we conclude with an outlook on future developments.

In the context of the work presented here, we use the term “fault” to denote a three-dimensional geometric shape or kernel optimized to fit observed earthquake hypocenters. Fault network reconstruction based on seismicity catalogs was introduced by Ouillon et al. (2008). The authors presented a dynamical clustering method based on fitting the hypocenter distribution with a plane, which is then iteratively split into an increasing number of subplanes to provide better fits by accounting for smaller-scale structural details. The method uses the overall location uncertainty as a lower bound on the fit residuals to avoid over-fitting. Wang et al. (2013) made further improvements by accounting for the individual location uncertainties of the events and introducing quality evaluation criteria (based, for instance, on the agreement of the planes' orientations with the events' focal mechanisms). Ouillon and Sornette (2011) proposed an alternative method based on probabilistic mixture modeling (Bishop, 2007) using 3D Gaussian kernels. This method introduced notable improvements, such as the use of an independent validation set to constrain the optimal number of kernels needed to explain the data (i.e., the model complexity) and diagnostics based on nearest-neighbor tetrahedra volumes to eliminate singular clusters that cause the mixture likelihood to diverge. While our method is inspired by these studies and in several aspects builds upon their findings, we also note an inherent drawback of the iterative splitting approach that is common to all the previously mentioned methods. This drawback can be observed when an additional plane (or kernel), introduced by splitting, fails to converge to the local clusters and is instead attracted to regions of high horizontal variance (see Fig. 1 for an illustration in the case of the Landers seismicity).

Iterative splits on the 1992 Landers aftershock data.
Points with different colors represent seismicity associated with each
plane. Black dots show the center points of the planes resulting from the
next split. Notice how in steps

This deficiency has motivated us to pursue a different concept. Instead of starting with the simplest model (i.e., a single plane or kernel) and increasing the complexity progressively by iterative splits, we propose just the opposite: start at the highest possible complexity level (as many kernels as possible) and gradually converge to a simpler structure by iterative merging of the individual substructures. In this respect, the new approach can be regarded as a “bottom-up” approach, while the previous ones are “top-down” approaches.

The method shares the basic principles of agglomerative clustering (Rokach and Maimon, 2005) with additional improvements to suit the specifics of seismic data, such as the strong anisotropy of the underlying fault segments. We illustrate the method by applying it to a synthetic dataset obtained by sampling hypocenters on a set of five plane segments, and potentially adding uncorrelated background points which are uniformly distributed in the volume (see Fig. 2). The implementation follows the successive steps described below:
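
To make the synthetic setup concrete, the following MATLAB sketch generates a dataset in the same spirit: hypocenters sampled on a few randomly oriented plane segments with a small off-plane scatter, plus uniformly distributed background points. All parameter values (segment sizes, scatter, point counts, volume) are illustrative assumptions and do not reproduce the exact dataset of Fig. 2.

% Illustrative sketch (not the exact dataset of Fig. 2): hypocenters sampled
% on randomly oriented plane segments plus uniform background points.
rng(1);                                    % reproducibility
nPlanes   = 5;                             % number of fault segments
nPerPlane = 400;                           % events sampled per segment
nNoise    = 500;                           % uniform background events
sigmaOff  = 0.2;                           % off-plane scatter (km)
boxMin = [0 0 0];  boxMax = [50 50 20];    % volume of interest (km)

XYZ = [];
for k = 1:nPlanes
    c      = boxMin + rand(1,3).*(boxMax - boxMin);             % segment centre
    strike = 2*pi*rand;  dip = pi/2*rand;                       % random orientation
    u = [cos(strike) sin(strike) 0];                            % along-strike unit vector
    v = [-sin(strike)*cos(dip) cos(strike)*cos(dip) -sin(dip)]; % down-dip unit vector
    n = cross(u, v);                                            % unit normal to the plane
    L = 5 + 10*rand;  W = 2 + 5*rand;                           % segment length and width (km)
    a = (rand(nPerPlane,1) - 0.5)*L;                            % along-strike offsets
    b = (rand(nPerPlane,1) - 0.5)*W;                            % down-dip offsets
    e = sigmaOff*randn(nPerPlane,1);                            % off-plane scatter
    XYZ = [XYZ; c + a.*u + b.*v + e.*n];                        %#ok<AGROW>
end
XYZ = [XYZ; boxMin + rand(nNoise,3).*(boxMax - boxMin)];        % background points
scatter3(XYZ(:,1), XYZ(:,2), XYZ(:,3), 3, '.'); axis equal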

For a given dataset featuring

In this equation,

Since our goal is to obtain a fault network where segments are modeled
by Gaussian kernels, we begin by estimating how many such kernels can be
constructed with the clusters featured in the AHC tree. At its most detailed
level (

Once we determine the holding capacity, all points that are not
associated with any Gaussian kernel are assigned to a uniform background
kernel that encloses the whole dataset. The boundaries of this kernel are
defined as the minimum bounding box of its points. The uniform spatial
density of this background kernel is defined as the number of points divided by
the volume (see Fig. 3). The Gaussian kernels
together with the uniform background kernel represent a mixture model where
each kernel has a contributing weight proportional to the number of points
that are associated with it (Bishop, 2007). This representation
facilitates the calculation of an overall likelihood and allows us to
compare models with different complexities using the Bayesian information
criterion (BIC) (Schwarz, 1978), given by

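As a concrete illustration, the MATLAB sketch below evaluates the mixture log-likelihood and a BIC score assuming the standard form BIC = k ln N - 2 ln L (Schwarz, 1978); the parameter count (9 per full-covariance 3D Gaussian kernel plus the free mixture weights) and the helper names mixtureBIC and gauss3 are assumptions and not necessarily the exact form of Eq. (2).

function bic = mixtureBIC(XYZ, mu, Sigma, w, wBg, boxMin, boxMax)
    % Sketch: BIC of a Gaussian-plus-uniform mixture, assuming the standard
    % form BIC = k*log(N) - 2*log(Lhat) (Schwarz, 1978).
    % XYZ   : N-by-3 hypocenter coordinates
    % mu    : K-by-3 kernel means;  Sigma : 3-by-3-by-K covariances
    % w     : K-by-1 kernel weights; wBg  : background weight (sum(w)+wBg = 1)
    [N, ~] = size(XYZ);  K = size(mu, 1);
    dens = zeros(N, K + 1);
    for k = 1:K
        dens(:, k) = w(k) * gauss3(XYZ, mu(k,:), Sigma(:,:,k));
    end
    dens(:, K+1) = wBg / prod(boxMax - boxMin);   % uniform background kernel
    logL = sum(log(sum(dens, 2)));                % overall log-likelihood
    nPar = 9*K + K;                               % means + covariances + free weights
    bic  = nPar*log(N) - 2*logL;
end

function p = gauss3(X, mu, Sigma)
    % 3D Gaussian density evaluated at the rows of X
    d = X - mu;
    q = sum((d / Sigma) .* d, 2);                 % squared Mahalanobis distances
    p = exp(-0.5*q) / sqrt((2*pi)^3 * det(Sigma));
end
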
At the holding capacity, the representation with the large number of
kernels is likely to constitute an overfitting model for the dataset.
Therefore, we iteratively merge pairs of the Gaussian kernels until an
optimal balance between fitness and model complexity is reached. We use the
measure of information gain in terms of BIC to select which pair of kernels
to merge. For any given pair of Gaussian kernels, the BIC gain resulting
from their merger is calculated using Eq. (3),
where

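For illustration, the sketch below merges a pair of Gaussian kernels by moment matching (preserving the first two moments of the pooled points) and evaluates a BIC gain over only those points, in the spirit of the local criterion. The exact form of Eq. (3) is not reproduced here, so the parameter counting is an assumption; mu1 and mu2 are 1-by-3 means, S1 and S2 are 3-by-3 covariances, n1 and n2 are the associated point counts, X12 holds the pooled points, and gauss3 is the helper from the previous sketch.

% Moment-matched merger of two weighted Gaussian kernels
n12  = n1 + n2;
a1   = n1/n12;   a2 = n2/n12;
mu12 = a1*mu1 + a2*mu2;
d1   = mu1 - mu12;   d2 = mu2 - mu12;
S12  = a1*(S1 + d1'*d1) + a2*(S2 + d2'*d2);

% Local BIC gain (assumed form): two-kernel model versus the merged kernel,
% evaluated only on the n12 points of the two clusters
logL2 = sum(log(a1*gauss3(X12, mu1, S1) + a2*gauss3(X12, mu2, S2)));
logL1 = sum(log(gauss3(X12, mu12, S12)));
bic2  = (2*9 + 1)*log(n12) - 2*logL2;   % two kernels plus one free weight
bic1  = 9*log(n12) - 2*logL1;           % single merged kernel
bicGain = bic2 - bic1;                  % positive values favour the merger
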
Notice that each merging of a pair of kernels decreases

Using this formulation, we calculate a matrix where the value at the
intersection of the

The computational demand of the BIC gain matrix increases quadratically with
the number of data points. To make our approach feasible for large seismic
datasets, we introduce a preliminary check that considers clusters as
candidates for merging only if they are overlapping within a confidence
interval of

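A simple way to implement such a pre-check, sketched below under assumed conventions, is to test whether the confidence ellipsoids of two kernels overlap along the line joining their centres; the scale factor r (for example r of about 2.8 for a roughly 95 % confidence region in 3D) is an assumption and not necessarily the threshold used in this study.

function tf = ellipsoidsOverlap(mu1, S1, mu2, S2, r)
    % Sketch: do the r-sigma confidence ellipsoids of two 3D Gaussian
    % kernels overlap along the line joining their centres?
    u = mu2 - mu1;
    L = norm(u);
    if L == 0, tf = true; return; end
    u  = u / L;                    % unit vector joining the centres
    e1 = r*sqrt(u*S1*u');          % extent of kernel 1 along that direction
    e2 = r*sqrt(u*S2*u');          % extent of kernel 2 along that direction
    tf = (e1 + e2) >= L;           % overlap if the extents bridge the gap
end
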
During all steps of the merging procedure, the data points are in the state
of

The final models obtained using the local

For this synthetic dataset, we observe that both the local and global criteria converge to a similar final structure. The global criterion yields a model with the same number of clusters as the input synthetic, while the local criterion introduces four additional clusters in the under-sampled part of one of the faults. For most pattern recognition applications that deal with a robust definition of noise and signal, the global criterion may be the preferred choice since it is able to recover the true complexity level. However, since this method is intended for natural seismicity, we also see a potential in the local criterion. For instance, consider the case where two fault segments close to each other are weakly active and thus have a low spatial density of hypocenters compared to other distant faults that are much more active. In that case, the global criterion may choose to merge the low-activity faults, while the local criterion may preserve them as separate.

In order to gain insight into the sensitivity and the robustness of the
proposed method, we conduct a more elaborate synthetic test. We generate a
set of 20 randomly oriented planes with their attributes varying in the
following ranges: strike angle

Clustering similarities between ground truth synthetic
dataset and method results quantified by the Rand index.

These synthetics indicate that the method is robust in the presence of uniform background noise and that it is able to recover structures that are sufficiently sampled. In the presented case, the performance saturates around 0.5 points per square kilometer; however, this value can change based on the particular setting. For instance, if faults are very closely spaced and intersecting, higher sampling may be needed. On the other hand, if the structures are isolated, similar performance can be achieved at lower sampling. The MATLAB code used for generating the synthetics and evaluating the reconstruction's Rand index is provided. Users may prefer to create synthetic cases that are informed by the properties of the actual data they are working on (such as numbers of points, spatial extent, etc.).
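
For reference, the Rand index used above can be computed from two hard-clustering label vectors as in the short sketch below (labels are assumed to be positive integers); this is an illustrative implementation, not the code distributed with the paper.

function ri = randIndex(a, b)
    % Rand index between two clusterings given as positive-integer label
    % vectors of equal length: fraction of point pairs on which they agree.
    a = a(:);  b = b(:);  N = numel(a);
    nPairs   = N*(N-1)/2;
    cm       = accumarray([a b], 1);            % contingency table
    sumPairs = @(x) sum(x(:).*(x(:)-1))/2;      % number of pairs within groups
    nij = sumPairs(cm);                         % pairs together in both clusterings
    ni  = sumPairs(sum(cm, 2));                 % pairs together in clustering a
    nj  = sumPairs(sum(cm, 1));                 % pairs together in clustering b
    ri  = (nPairs + 2*nij - ni - nj) / nPairs;  % agreements over all pairs
end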

In this section, we apply our method to observed seismicity data. For this purpose, we use the KaKiOS-16 catalog (Kamer et al., 2017), which was obtained by the probabilistic absolute location of approximately 479 000 southern Californian events spanning the period 1981–2011. We consider all events, regardless of magnitude, as each event samples some part of the fault network. Before tackling this vast dataset, however, we first consider the 1992 Landers sequence as a smaller dataset to assess the overall performance and computational demands.

We use the same dataset as Wang et al. (2013), which consists of 3360 aftershocks of the 1992 Landers earthquake. The initial atomization step produces a total of 394 proto-clusters that are iteratively merged using the two different criteria (local and global). The resulting fault networks are given in Fig. 6 together with the fault traces available in the community fault model of southern California (Plesch et al., 2007). Comparing the two fault networks, we observe that the local criterion provides a much more detailed structure that is consistent with the large-scale features of the global one. We also observe that, at the southern end, the global criterion produces thick clusters by lumping together small features with seemingly different orientations. These small-scale features have relatively few points and thus contribute little to the overall likelihood. The global criterion favors these mergers to reduce the complexity penalty in Eq. (2), which scales with the total number of points. In the local case, however, because each merger is evaluated considering only the points assigned to the merging clusters, the likelihood gain of these small-scale features can overcome the penalty reduction and they remain unmerged. It is also possible to employ metrics based on the consistency of focal mechanism solutions to evaluate the reconstructed faults; for a detailed application of such metrics, the reader is referred to Wang et al. (2013). In this study, since we do not have focal mechanism solutions for our target catalog, we focus on information criteria metrics and out-of-sample forecast tests.

Our second observation is that the background kernel attains a higher weight of 11 % using the local criterion, compared to only 5 % with the global one. Keeping in mind that both criteria are applied to the same initial set of proto-clusters, and that there are no mergers with the background kernel, we argue that the difference between the background weights is due to density differences in the tails of the kernels. We investigate this in Fig. 7 for the simple 1D case, considering mergers between two boxcar functions (analogous to planes in 3D) approximated with Gaussian functions. We observe that the merged Gaussian has higher densities in its tails compared to its constituents. The effect is amplified when the distance between the merging clusters is increased (Fig. 7b). Hence, in the local case, the peripheral points are more likely to be associated with the background kernel due to the lower densities at the tails of the small, unmerged clusters.

Two uniform distributions (dotted gray lines), their Gaussian approximations (solid gray lines) and the Gaussian resulting from their merger (solid black line). Notice that the joint Gaussian has higher densities at the tails compared to its constituents.
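
The effect can be reproduced numerically with the short sketch below: two boxcar densities are approximated by moment-matched Gaussians, and their equal-weight merger (again moment matched) has visibly heavier tails. The widths and centre separation are illustrative assumptions.

% 1D illustration of the tail inflation caused by merging (cf. Fig. 7)
w  = 1;                          % boxcar width
c1 = 0;  c2 = 2;                 % boxcar centres
s2 = w^2/12;                     % variance of a boxcar of width w
m  = (c1 + c2)/2;                % mean of the equal-weight merger
S  = s2 + ((c1 - m)^2 + (c2 - m)^2)/2;        % variance of the merger
x  = linspace(c1 - 3, c2 + 3, 1000);
g  = @(x, mu, v) exp(-0.5*(x - mu).^2/v) / sqrt(2*pi*v);
pConstituents = 0.5*g(x, c1, s2) + 0.5*g(x, c2, s2);
pMerged       = g(x, m, S);
plot(x, pConstituents, 'Color', [0.6 0.6 0.6]); hold on
plot(x, pMerged, 'k');           % heavier tails than the constituents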

Another important insight from this sample case concerned the
feasibility of a large-scale application. As pointed out here and in
previous studies (Ouillon and Sornette, 2011; Wang
et al., 2013), the computational demand for such pattern recognition methods
increases rapidly with the number of data points. The Landers case with
3360 points took

The condensation method reduces the effective catalog length by first
ranking the events according to their location uncertainty and then
successively condensing poorly located events onto better-located ones (for
a detailed explanation, see Kamer et al., 2015). The initial
formulation of the method was developed considering the state-of-the-art
catalogs of the time. Location uncertainties in these catalogs are assumed
to be normally distributed and hence expressed either in terms of a
horizontal and vertical standard deviation, or with a diagonal

Idealized schematic representations of three events with one, two and three Gaussian kernels each.

The KaKiOS-16 catalog contains 479 056 events whose location PDFs are represented by a total of 1 346 010 Gaussian components (i.e., kernels). Condensation reduces this number to 600 463 as weights from events with high variance are transferred to better-located ones. Nevertheless, in Fig. 9 we see that nearly half of these components amount to only 10 % of the total event weight. The computation time scales with the number of components, while the information content is proportional to the number of events. Hence, the large number of components representing a relatively small number of events would make the computation inefficient. A quick solution could be to take the components with the largest weights constituting 90 % or 95 % of the total mass, mimicking a confidence interval. Such a “solution” would depend on the arbitrary cut-off choice and would have the potential to discard data that may be of value for our application.

Cumulative weights of the 600 463 condensed KaKiOS-16 components representing a total of 479 056 events. The components are ranked according to increasing weights.

We can avoid such an arbitrary cut-off by employing the fact that the condensed catalog is essentially a Gaussian mixture model (GMM) representing the spatial PDF of earthquake occurrence in southern California. We can then, in the same vein as the hard clustering described previously, assign each event to its most likely GMM component (i.e., kernel). If we consider each event individually, the most likely kernel would be the one with the highest responsibility. However, for a globally optimal representation we need to find the best representative kernel for each event among all other kernels. To do this, we sample the original (uncondensed) PDF of each event with 1000 points and then calculate the likelihood of each sample point with respect to all the condensed kernels. The event is assigned to the kernel that provides the maximum likelihood for the highest number of sample points (see Fig. 8c, d). As a result of this procedure, the 479 056 events are assigned to 93 149 distinct kernels. The spatial distribution of all the initial condensed kernels is given in Fig. 10a, while the kernels assigned with at least one event after the hard clustering are shown in Fig. 10b. Essentially, this procedure can be viewed as using the condensed catalog as a prior for the individual event locations. The use of accumulated seismicity as a prior for focusing and relocation has been proposed by Jones and Stewart (1997) and investigated in detail by Li et al. (2016). We can see the effect of this strategy more clearly in Fig. 8, where starting from three different events in the catalog (Fig. 8a), we finally converge to only two different final locations (Fig. 8d).
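
The assignment step can be sketched as follows in MATLAB, under simplifying assumptions: each event's original location PDF is treated as a single 3D Gaussian (events in KaKiOS-16 may in fact be represented by up to three components), the variable names are hypothetical, gauss3 is the helper defined earlier, and no spatial pre-selection of candidate kernels is performed, which would be indispensable in practice for 600 463 kernels.

nSamp   = 1000;                              % samples per event PDF
nEvents = size(eventMu, 1);                  % event means (nEvents-by-3)
nKern   = size(kernMu, 1);                   % condensed kernel means (nKern-by-3)
assignedKernel = zeros(nEvents, 1);
for i = 1:nEvents
    R = chol(eventSigma(:,:,i), 'lower');    % factor of the event covariance
    P = eventMu(i,:) + (R*randn(3, nSamp))'; % samples of the event location PDF
    dens = zeros(nSamp, nKern);
    for k = 1:nKern
        dens(:, k) = kernW(k) * gauss3(P, kernMu(k,:), kernSigma(:,:,k));
    end
    [~, best] = max(dens, [], 2);            % winning kernel for each sample point
    assignedKernel(i) = mode(best);          % kernel winning the most sample points
end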

In previous works, we concluded that the spatial distribution of southern California seismicity is multifractal, i.e., it is an inhomogeneous collection of singularities (Kamer et al., 2015, 2017). The spatial features in Fig. 10 can be seen as expressions of these singularities. Since we are interested in the general form of the fault network rather than in second-order features (e.g., inhomogeneous seismicity rates along the same fault), we consider the centers of all 93 149 kernels as individual points, effectively disregarding their weights. Considering the weight of each kernel would result in a more complex structure with singularities that can be associated with the fractal slip distribution of large events (Mai and Beroza, 2002), modulated by the non-uniform detection capabilities of the network. Thus, by disregarding the kernel weights we are considering only the potential loci of earthquakes, not their activity rates.

Another important aspect, in the case of such a large-scale application, is the uniform background kernel. The assumption of a single background kernel defined as the minimum bounding box of the entire dataset seems to be suitable for the case of the Landers aftershocks; however, it becomes evident that for the whole of southern California such a minimum bounding box would overestimate the data extent (covering aseismic offshore areas) and would thus lead to an underestimated density. In addition, one can also expect the background density to vary regionally in such large domains. We thus extend our approach by allowing for multiple uniform background kernels. For this purpose, we make use of the AHC tree that is already calculated for the atomization of the whole dataset. We then cut the tree at a level corresponding to only a few clusters (5 or 30 in the following application), which allows the original catalog to be divided into smaller subcatalogs represented by each cluster. Each of these subsets is then atomized individually, yielding its own background kernel. The atomized subsets are then brought together to be progressively merged. Naturally, we have no objective way of knowing how many background kernels a dataset may feature. However, in various synthetic tests involving cuboid backgrounds with known densities, we observe that inflating this number has no effect on the recovered densities, whereas a value that is too low causes underestimation. Apart from this justification, we are motivated to divide this large dataset into subsets for purely computational reasons, as this allows for improved parallelization and computational efficiency.
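
A minimal sketch of this subdivision step is given below; it assumes the Statistics and Machine Learning Toolbox functions linkage and cluster, uses a single-linkage tree and five subsets purely for illustration, and omits the memory-saving measures that a catalog of this size would require.

nSubsets = 5;                                   % 5 or 30 in the application above
Z        = linkage(XYZ, 'single');              % AHC tree on the kernel centres
subsetId = cluster(Z, 'maxclust', nSubsets);    % cut the tree into nSubsets clusters
for s = 1:nSubsets
    sub    = XYZ(subsetId == s, :);
    bgMin  = min(sub, [], 1);   bgMax = max(sub, [], 1);   % minimum bounding box
    bgDens = size(sub, 1) / prod(bgMax - bgMin);           % uniform background density
    % ... atomize the subset "sub" and attach its own background kernel
end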

Fault network reconstructions for the KaKiOS-16 catalog.
Panels

Figure 11 shows the fault networks obtained for
two different initial settings: using 5 and 30 subsets. For each choice, we
show the results of the local and global criterion; the background cuboids
are not plotted to avoid clutter. Our immediate observation is related to
the events associated with the 1986 Oceanside sequence
(Wesson and Nicholson, 1988) located at coordinates
(

At this point, it is natural to ask which of these fault networks is the better model. The answer to this question depends on the application. If one is interested in the correspondence between the reconstructed faults and focal mechanisms, or high-resolution fault traces, which are expressions of local stress and strain conditions, then the ideal choice would be the local criterion. However, if the application of interest is an earthquake forecast covering the whole catalog domain, then one should consider the global criterion because it yields a lower BIC value, since it is formulated with respect to the overall likelihood. We leave the statistical investigation of the fault network parameters (e.g., fault length, dip, thickness distributions) as a subject for a separate study and instead focus on an immediate application of the obtained fault networks.

Several methods can be proposed for the validation of a reconstructed fault network. One way could be to project the faults onto the surface and check their correspondence with the mapped fault traces. This would be a tedious task, since it would involve a case-by-case qualitative analysis. Furthermore, many of the faults illuminated by recent seismicity might not have been mapped, or they may simply have no surface expressions. In the case of the 2014 Napa earthquake, for example, there was a significant disparity between the spatial distribution of aftershocks and the observed surface trace (Brocher et al., 2015). Another option would be to compare the agreement between the reconstructed faults and the focal mechanisms of the events associated with them. With many of the relevant metrics already developed (Wang et al., 2013), this would allow for a systematic evaluation. However, the current focal mechanism catalog for southern California is based on the HYS-12 catalog (Hauksson et al., 2012; Yang et al., 2012) obtained by relative double-difference techniques. As demonstrated in our previous studies (Kamer et al., 2015, 2017), this catalog exhibits artificial clustering effects at different scales. Hence, any focal mechanism based on hypocenters from this relative location catalog would be inconsistent with the absolute locations of the KaKiOS-16 catalog.

Therefore, we are left with the option of validation by spatial
forecasting. For this purpose, we will use the global criterion model
obtained from 30 subsets because it has the lowest BIC value of the four
reconstructions presented above. Our fault reconstruction uses all events in
the KaKiOS-16 catalog, regardless of their magnitude. The last event in this
catalog occurred on 30 June 2011. For target events, we consider all events
routinely located by the Southern California Earthquake Data Center
between 1 July 2011 and 1 July 2015 with magnitudes larger
than M2.5. We arbitrarily limit our volume of interest to the region bounded
by latitudes [32.5, 36.0], longitudes [

Average negative log likelihood for the target dataset limited to events above M2.5 (light gray), M3.0 (dark gray) and M3.5 (black). Performance of the TripleS models is evaluated as a function of the isotropic kernel bandwidth (dotted lines). The fault network performance is plotted with constant level solid lines. The performance of a single uniformly dense cuboid is plotted with a dashed line.
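
The scoring itself reduces to an average negative log-likelihood over the target events, as in the sketch below; the TripleS-style and cuboid densities are illustrative stand-ins built from assumed variables (targets, past, h, boxMin, boxMax) and the gauss3 helper, not the exact implementations compared in the figure.

% Average negative log-likelihood of the M-by-3 target hypocenters "targets"
% under three spatial density models (lower scores indicate better models).

% (1) single uniform cuboid over the volume of interest
pCuboid = repmat(1/prod(boxMax - boxMin), size(targets,1), 1);

% (2) TripleS-style smoothed seismicity: isotropic Gaussian kernels of
%     bandwidth h centred on the learning-period hypocenters "past"
pTripleS = zeros(size(targets,1), 1);
for k = 1:size(past,1)
    pTripleS = pTripleS + gauss3(targets, past(k,:), h^2*eye(3));
end
pTripleS = pTripleS / size(past,1);

% (3) reconstructed fault network: Gaussian-plus-uniform mixture density,
%     evaluated as in the mixtureBIC sketch (without the BIC step)

nllCuboid  = -mean(log(pCuboid));
nllTripleS = -mean(log(pTripleS));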

The superiority of our model with respect to TripleS can be understood in
terms of model parameterization, i.e., model complexity. There is a general
misconception regarding the meaning of “model complexity” in the
earthquake forecasting community. The term is often used to express the
degree of conceptual convolution employed while deriving the model. For
instance, in their 2010 paper, Zechar and Jordan refer to the TripleS model
as “a simple model” compared to models employing anisotropic or adaptive
kernels (Helmstetter et al., 2007; Kagan and
Jackson, 1994). As a result, one might be inclined to believe that the model
obtained by fault reconstruction presented in this study is far more complex
than TripleS. However, it is important to note that the complexity of a
model is independent of the algorithmic procedures undertaken to obtain it.
What matters is the number of parameters that are needed to communicate it,
or in other words its minimum description length
(Rissanen, 1978; Schwarz, 1978). TripleS is essentially a
GMM expressed by the 3D locations of its components and a constant
kernel bandwidth. Hence it has a total of (

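Under these (assumed) parameterizations, the comparison of description lengths can be written out explicitly; the counts below treat each full-covariance 3D Gaussian as 9 parameters, add the free mixture weights and the background bounding box, and are illustrative rather than the exact figures of this study.

% Assumed parameter counts for the description-length comparison
nTripleS  = 3*nLearningEvents + 1;   % one 3D location per learning event + bandwidth
nFaultNet = 9*K + K + 6;             % K Gaussians (mean + covariance), K free
                                     % weights, background bounding box corners
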
Another factor contributing to the performance of the fault network is the utilization of the location uncertainty information that underpins the condensation. This has two consequences: (1) it decreases the overall spatial entropy, thus providing a clearer picture of the fault network, and (2) it reduces the effect of repeated events occurring on each segment, thus providing a more even prior on all segments.

We presented an agglomerative clustering method for seismicity-based fault
network reconstruction. The method provides the following advantages: (1) a
bottom-up approach that explores all possible merger options at each step
and moves coherently towards a global optimum; (2) an optimized atomization
scheme to isolate the background (i.e., uncorrelated) points; (3) improved
computational performance due to geometrical merging constraints. We were able
to analyze a very large dataset consisting of 30 years of southern Californian
seismicity by utilizing the non-linear location uncertainties of the events
and condensing the catalog to

Notwithstanding these encouraging results, there are several aspects in
which the proposed methodology can be further improved and extended. In the
current formulation, the distinct background kernels are represented by the
minimum bounding box of each subset, so that they tend to overlap and bias
the overall background density. This can be improved by employing convex
hulls, alpha shapes (Edelsbrunner and Mücke, 1994) or a
Voronoi tessellation (Voronoi, 1908) optimized to match the
subset borders. The shape of the background kernel could also be adapted to
the specific application; for induced seismicity catalogs, it can be a
minimum bounding sphere or an isotropic Gaussian since the pressure field
diffuses more or less radially from the injection point
(Király-Proag et al., 2016). Different types
of proto-clusters such as Student

The reconstructed faults can facilitate other fault-related research by providing a systematic way to obtain planar structures from observed seismicity. For instance, analysis of static stress transfer can be aided by employing the reconstructed fault network to resolve the focal plane ambiguity (Nandan et al., 2016; Navas-Portella et al., 2020). Similarly, the orientation of each individual kernel can be used as a local prior to improve the performance of real-time rupture detectors (Böse et al., 2017). Studies relying on mapped fault traces to model rupture dynamics can be also extended using reconstructed fault networks that represent observed seismicity including its uncertainty (Wollherr et al., 2019).

An important implication of the reconstructed fault network is its potential in modeling the temporal evolution of seismicity. The epidemic-type aftershock sequence (ETAS) model can be simplified significantly in the presence of optimally defined Gaussian fault kernels. Rather than expressing the whole catalog sequence as the weighted combination of all previous events, we can instead coarse-grain the problem at the fault segment scale and have multiple sequences corresponding to each fault kernel, each of them being a combination of the activity on the other fault kernels. Such a formulation would eliminate the need for the commonly used isotropic distance kernels in ETAS, as such a kernel, with its single degree of freedom, induces essentially the same deficiencies discussed in the case of the TripleS model. Thus, we can expect such a fault-network-based ETAS model to have significantly better forecasting performance compared to its isotropic variants.

The MATLAB implementation of the agglomerative fault reconstruction method and the synthetic tests can be downloaded from

All authors conceived and designed the research. YK wrote the paper with major contributions from GO and DS. YK developed the computer codes.

The authors declare that they have no conflict of interest.

We would like to thank our two reviewers Leandro C. Gallo and Nadav Wetzler for their valuable comments and suggestions, which improved this paper considerably.

This paper was edited by Filippos Vallianatos and reviewed by Nadav Wetzler and Leandro C. Gallo.