Ground-motion correlation models play a crucial role in regional seismic risk modeling of spatially distributed built infrastructure. Such models predict the correlation between ground-motion amplitudes at pairs of sites, typically as a function of their spatial proximity. Data from physics-based simulators and event-to-event variability in empirically derived model parameters suggest that spatial correlation is additionally affected by path and site effects. Yet, identifying these effects has been difficult due to scarce data and a lack of modeling and assessment approaches to consider more complex correlation predictions. To address this gap, we propose a novel correlation model that accounts for path and site effects via a modified functional form. To quantify the estimation uncertainty, we perform Bayesian inference for model parameter estimation. The derived model outperforms traditional isotropic models in terms of the predictive accuracy for training and testing data sets. We show that the previously found event-to-event variability in model parameters may be explained by the lack of accounting for path and site effects. Finally, we examine implications of the newly proposed model for regional seismic risk simulations.

Earthquakes can cause widespread damage to the built environment, exposing its users to severe and potentially long-lasting societal stress. Analyzing earthquake-induced consequences is key to enhancing efficient and targeted seismic risk management strategies. Empirical ground-motion models (GMMs) are widely used for the prediction of earthquake-induced ground-motion intensity measures (IMs) at individual sites. The assessment of consequences to spatially distributed systems, such as the residential building stock of an urban area or its road network, additionally requires spatial correlation models to characterize the dependency among IMs at different sites

A predictive spatial correlation model consists of a

To alleviate the scarcity of data, some researchers pooled data from multiple earthquakes and assumed that the same correlation model parameters apply to different events

Identifying explanatory factors by estimating correlation models from data of individual earthquakes is challenging. First, the comparison of a single model parameter estimate per event with a single metric describing a certain aspect of the region the event was recorded in (such as the heterogeneity of soil conditions) suffers from scarcity of data. Estimation of event-specific correlation model parameters requires data from particularly well-recorded events, of which there are only a few. Second, the use of an isotropic model and the condensation to a single parameter estimate per event may hide path and site effects that are present within the event data. Third, the estimated model parameters are subject to varying degrees of estimation uncertainty because the underlying data sets stem from earthquakes that were recorded by a different number and layout of seismic network stations.

This study explores novel correlation models that, in addition to spatial proximity, also account for path and site effects. In contrast to previous studies, we do so by modifying and extending the functional form and the dependent variables of the correlation models. The increased complexity of these models calls for a consistent quantification of the inherent estimation uncertainty, thus complicating the use of conventional geo-statistical curve-fitting techniques. To address this, we use Bayesian inference to estimate the model parameters. While Bayesian inference has been proposed for GMMs in the past

We present the proposed correlation model in Sect.

This study on spatial correlation models builds on empirically derived GMMs that predict a ground-motion IM at site

The joint distribution of the same ground-motion IM at

The GMMs and correlation models considered in the present study are ergodic. As such their predictions do not depend on the absolute rupture and site locations but only on their relative positioning (e.g., via a certain source-to-site distance). It is noted that the parameters of some GMMs vary between broadly defined regions (e.g., California and Japan). This is also true for the GMM of

A natural first assumption is that the correlation between sites decreases as the Euclidean distance between them increases. If the (projected) Cartesian coordinates of two sites

Besides the Euclidean distance between two sites, their correlation may also depend on their position relative to the earthquake rupture (due to arriving waves potentially traveling similar propagation paths). In this study we use the epicentral azimuth

Path effects in the context of correlation models for ergodic GMMs imply stronger correlation between sites that share a similar wave propagation path. This is different from non-ergodic models where one tries to identify systematic and repeatable path effects in areas where multiple events have been recorded by the same seismic network. In the latter context, correlation functions are used to establish probabilistic links between stations in the seismic network in order to estimate the systematic path effects. While these functions are typically defined for Euclidean distance and have a similar functional form as Eq. (

Illustration of the proposed spatial correlation models: distance and dissimilarity metrics (dependent variables)

We account for site effects via measuring dissimilarities in local soil conditions following the premise that sites with similar soil conditions have stronger correlations. We use

For model EAS, we explored several combinations of the individual models E, A, and S. Compared to the model defined in Eq. (

We note that the herein proposed models focus on correlation of within-event residuals for a single IM at multiple, spatially distributed sites. In future studies, the models may be extended to the case of multiple IMs, for example, through the use of the linear model of co-regionalization as shown in

We follow a Bayesian approach to estimate the parameters of the correlation models

We consider ground-motion IM data from the NGA-West2 database

The data set obtained by pooling data from all

For the pooled training data set for Sa(1 s): number of station pairs in joint bins of Euclidean and angular distance in

For event

We chose weakly informative prior distributions for the parameters based on guidance provided in

Prior distributions for the parameters

The posterior distribution,

In the following we discuss the estimated parameters by first focusing on isotropic models E that are derived from data

Event-specific correlation models have parameters estimated from data of an individual event. Figure

The results shown in Fig.

Isotropic model E estimated separately for Sa(1 s) and for three events with increasing number of records:

Pooled models are derived by combining data from multiple individual earthquakes. In contrast to event-specific models, we use the same parameters to describe correlations of data from all events.

Table

For models EAS and E, Fig.

Parameters for the different correlation models estimated from the pooled training data set for Sa(1 s). Stated quantities are the mean and the 5 % and 95 % quantiles from the posterior samples.

Posterior correlation models EAS and E for Sa(1 s) as a function of Euclidean distance and soil dissimilarity plotted at three angular distances:

Table

We next evaluate the three models in terms of their predictive accuracy on test data from either an individual event or multiple events. Given the posterior parameters of model M inferred from training data,

As stated in the Introduction, previous studies found that the parameters of an isotropic correlation model estimated from data of different events vary from earthquake to earthquake. To examine this event-to-event variability, we compute for each event

To illustrate the aforementioned LPPD metrics, we first use 125 records from the Hector Mine earthquake in Fig.

We see that the event-specific model E has the largest variance in log-likelihood values, as the model parameters are more uncertain due to limited data, and some sampled parameter values give low probability of observing the data (i.e., low log-likelihoods). For most realizations, however, the event-specific model E outperforms the pooled model E for the given event, as would be expected. By computing the LPPD metric using Eq. (

Log posterior predictive density (LPPD) for data from the Hector Mine earthquake event of models E and EAS estimated from the pooled data set, as well as model E with parameters estimated only from data of that event. The histograms show the log-likelihood of the data conditional on samples from the posterior parameters,

Figure

Figure

On the other hand, Fig.

Relative difference in log posterior predictive density (LPPD) of pooled models E

Whereas the previous results considered within-event residuals of Sa(1 s), this section expands the discussion to different periods

Figure

For within-event residuals of Sa

We use recorded ground-motion data from the 2019 Ridgecrest, California, earthquake sequence

The out-of-sample performance is first assessed via the LPPD of the test set

In- and out-of-sample performance in terms of relative difference (Eq.

For the three events in the second test data set,

For the three earthquakes in the second test set from the 2019 Ridgecrest sequence: the spatial distribution of scaled within-event residuals

Figure

Next, the four subregions are gridded into individual sites with a spacing of 3 arcsec (approximately 90 m). The quantity of interest is the proportion of sites,

The rupture scenario is taken from the UCERF2 earthquake rupture forecast

Map of the case study area used for regional risk assessment and the considered M6.25 rupture on the Hayward fault:

Note that the above process requires realizations of

Exceedance probability curves for the proportion of sites

We used the posterior mean parameters of model E and model EAS to compute exceedance probability curves for the proportion of sites within the different subregions where Sa(1 s) jointly exceeds the 10 % exceedance probability thresholds. Figure

In subregions with differing epicentral azimuth values and heterogeneous soil conditions (top row), the isotropic model E predicts stronger correlations and thus heavier tailed distributions (i.e., higher probabilities of jointly exceeding the threshold value at a high proportion of sites). This is especially apparent if subregions one and two are combined (Fig.

Exceedance probability curves for the proportion of sites

Figure

This study explored the role of spatial proximity, local site effects, and path effects on spatial correlations of recorded ground-motion intensity measures. The motivation for this work came from the substantial event-to-event variability found in the correlation model parameters estimated in previous studies, as well as questions as to whether such variability was due to event-specific characteristics or due to model and estimation uncertainty. Site and path effects are qualitative contributors to spatial correlations but were not captured by the isotropic correlation models used in previous studies. We therefore focused on whether path and site effects can explain the observed variability in model parameters.

We proposed a novel correlation model, EAS, that accounts for path and site effects in addition to spatial proximity. The EAS model assigns decreasing correlation coefficients to pairs of sites with increasing Euclidean distance, increasing angular distance, and increasing soil dissimilarity. These three model components reflect the role of spatial proximity, path effects, and site effects, respectively, on spatial ground-motion correlations. Compared to an isotropic model, the proposed model has increased complexity and more parameters (five instead of one or two for the isotropic model). To account for this increase in model complexity, we employed Bayesian inference to estimate the parameters, and we assumed that the same parameters describe the correlation for all events in the considered ground-motion database (i.e., a pooled model).
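The qualitative behavior of such a model can be illustrated with a short sketch. The functional form below (an exponential decay in each metric, combined through a weight) and all parameter names and values are assumptions chosen only to reproduce the monotone decay in the three metrics with five parameters; they are not the equation or the estimates of this study.

```python
import numpy as np

def rho_eas(d_e, d_a, d_s, l_e=20.0, gamma_e=1.0, l_a=30.0, l_s=200.0, w=0.5):
    """Illustrative correlation that decays in all three metrics.

    d_e: Euclidean distance (km), d_a: angular distance (degrees),
    d_s: soil dissimilarity (e.g., absolute Vs30 difference in m/s).
    All five parameters are hypothetical placeholders.
    """
    rho_e = np.exp(-(d_e / l_e) ** gamma_e)  # spatial proximity
    rho_a = np.exp(-d_a / l_a)               # path effects
    rho_s = np.exp(-d_s / l_s)               # site effects
    # The weight w balances the path and site components.
    return rho_e * (w * rho_a + (1.0 - w) * rho_s)

# Two sites 10 km apart: similar path and soil versus dissimilar path and soil.
similar = rho_eas(10.0, 5.0, 20.0)
dissimilar = rho_eas(10.0, 60.0, 400.0)
print(similar, dissimilar)
```

With this form, two sites at the same Euclidean distance receive a lower correlation when their epicentral azimuths or soil conditions differ, which an isotropic model cannot represent.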

For each event in the NGA-West2 training data set, we then computed the predictive accuracy of the proposed EAS model, as well as of two isotropic models E: the parameters of one were estimated from the pooled data set, and those of the other were estimated exclusively from data of that specific event. For most events, we found that the event-specific models E have higher predictive accuracy than the pooled model E, thus confirming the presence of some event-to-event variability in correlation model parameters. However, the pooled model EAS outperforms the event-specific models E for the majority of events and, especially, for the well-recorded events. This indicates that the event-to-event variability in estimated isotropic model parameters found in previous studies is an apparent variability due to estimation uncertainty and the lack of accounting for site and path effects rather than a true variability. Data from the 2019 Ridgecrest earthquake sequence were then used to compare the models in terms of their out-of-sample performance. The results showed a higher predictive accuracy for model EAS compared to the isotropic model E, further highlighting the benefit of accounting for site and path effects in correlation models.
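The predictive-accuracy comparison rests on the log posterior predictive density (LPPD). The following is a minimal sketch, assuming the common definition of LPPD as the log of the likelihood averaged over posterior parameter samples, within-event residuals standardized to zero mean and unit variance, and an isotropic powered-exponential form for model E. The function names, the functional form, and the sample values are illustrative assumptions and are not taken from the study's code.

```python
import numpy as np

def mvn_logpdf(z, corr):
    """Log density of a zero-mean multivariate normal with covariance `corr`."""
    chol = np.linalg.cholesky(corr)
    alpha = np.linalg.solve(chol, z)  # whitened residuals
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (alpha @ alpha + log_det + z.size * np.log(2.0 * np.pi))

def lppd(z, dist, samples):
    """Log of the likelihood of residuals z averaged over posterior samples.

    samples: iterable of (length scale, exponent) pairs for the assumed
    isotropic form rho = exp(-(d / l) ** gamma).
    """
    loglik = np.array([mvn_logpdf(z, np.exp(-(dist / l) ** g)) for l, g in samples])
    m = loglik.max()
    return m + np.log(np.mean(np.exp(loglik - m)))  # stable log-mean-exp

# Hypothetical data: six stations and three posterior parameter samples.
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 50.0, size=(6, 2))                  # station coordinates, km
dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
z = rng.standard_normal(6)                                # stand-in residuals
print(lppd(z, dist, [(10.0, 1.0), (15.0, 0.9), (20.0, 0.8)]))
```

Higher values indicate that the model assigns more probability to the observed residuals. Because the likelihood is averaged before taking the logarithm, a few posterior samples with very low likelihood have limited influence on the metric.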

We then used a case study to explore the implications of using the different correlation models for regional seismic risk simulations. First, we found that generating correlated ground-motion samples using the mean values from the posterior distribution of each parameter instead of sampling from the posterior predictive distribution produces ground-motion fields with practically equivalent distributions. This is helpful because it is much less computationally expensive to use mean parameter values. Second, we saw that the isotropic model E predicts substantially stronger correlations than model EAS in regions with heterogeneous soil conditions and varying epicentral azimuths. This may lead to an overestimation of regional seismic risk tails (low-probability, high-consequence events), particularly in regions located close to the earthquake source.
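The effect of correlation strength on the tail of the distribution can be sketched with a small Monte Carlo experiment. The example below is a simplified illustration under stated assumptions: it samples only standardized within-event residuals (no median ground-motion model and no between-event term), places hypothetical sites along a line, and uses an isotropic exponential correlation with two arbitrary length scales as a stand-in for a strongly and a weakly correlated model. It is not the case-study workflow of this study.

```python
import numpy as np

def proportion_exceedance_curve(corr, z_thresh, n_sim=20000, seed=0):
    """Exceedance probability of the proportion of sites above a threshold."""
    rng = np.random.default_rng(seed)
    # A small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(corr.shape[0]))
    z = rng.standard_normal((n_sim, corr.shape[0])) @ chol.T  # correlated samples
    prop = np.mean(z > z_thresh, axis=1)   # proportion of sites exceeding, per sample
    levels = np.linspace(0.0, 1.0, 21)
    exceed = np.array([np.mean(prop > p) for p in levels])
    return levels, exceed

x = np.linspace(0.0, 10.0, 50)             # hypothetical site positions along a line, km
dist = np.abs(x[:, None] - x[None, :])
z10 = 1.2815515655446004                   # standard-normal value exceeded with 10 % probability
levels, strong = proportion_exceedance_curve(np.exp(-dist / 20.0), z10)
_, weak = proportion_exceedance_curve(np.exp(-dist / 2.0), z10)
for p, s, wk in zip(levels[::5], strong[::5], weak[::5]):
    print(f"proportion > {p:.2f}: strong {s:.4f}, weak {wk:.4f}")
```

Both settings have the same marginal exceedance probability at each site, so any difference between the two curves comes from the correlation structure alone.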

The proposed model and analysis could benefit from some further study. This could include a refined model parameterization to consider an azimuth metric that accounts for finite-fault effects or to consider other metrics of dissimilarity in site conditions. The refined EAS model could also be tested on more complex risk analysis problems to further understand the practical impact of these refinements. Despite those opportunities for further study, the proposed EAS model form and the proposed techniques for evaluating model performance should be of general use for analysts interested in studying and improving the prediction of spatial correlations in ground motions.

Parameters for correlation model EAS estimated for Sa at nine periods from the pooled training data set with indicated number of records,

This appendix provides additional results for the pooled model EAS. Table

Posterior correlation models EAS and E for Sa(0.3 s)

Figure

Exceedance probability curves for the proportion of sites

The code for Bayesian inference, the post-processing, and the case study application is available at

LB: conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, writing – original draft preparation. JWB: conceptualization, supervision, writing – review and editing. BS: resources, project administration, funding acquisition, writing – review and editing.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Chenying Liu and Jorge Macedo for their feedback and help with the 2019 Ridgecrest sequence data set, as well as Brendon Bradley for his feedback on an early draft of this article. We also thank three anonymous reviewers for helpful advice that improved the quality of this article. The calculations were run on the Euler cluster of ETH Zürich. The first author gratefully acknowledges support from the ETH Risk Center (grant 2018-FE-213) and the ETH Doc.Mobility fellowship.

This research has been supported by the ETH Risk Center (grant no. 2018-FE-213).

This paper was edited by Maria Ana Baptista and reviewed by three anonymous referees.