Probabilistic earthquake forecasts estimate the likelihood of future earthquakes within a specified time-space-magnitude window and are important because they inform the planning of hazard mitigation activities on different time scales. The spatial component of such forecasts, expressed as seismicity models, generally relies on some combination of past event locations and underlying factors that might affect spatial intensity, such as strain rate, fault location and slip rate, or past seismicity. For the first time, we extend previously reported spatial seismicity models, generated using the open-source inlabru package, to time-independent earthquake forecasts, using California as a case study. The inlabru approach allows the rapid evaluation of point process models which integrate different spatial datasets. We explore how well various candidate forecasts perform compared to observed activity over three contiguous 5-year time periods, using the same training window for the input seismicity data. In each case we compare models constructed from both full and declustered earthquake catalogues. In doing so, we compare the use of synthetic catalogue forecasts to the more widely used grid-based approach of previous forecast testing experiments. The simulated catalogue approach uses the full model posteriors, not just the mean, to create Bayesian earthquake forecasts. We show that simulated-catalogue-based forecasts perform better than their grid-based equivalents due to (a) their ability to capture more uncertainty in the model components and (b) the associated relaxation of the Poisson assumption in testing. We demonstrate that the inlabru models perform well overall across the various time periods: the full catalogue models perform favourably in the first testing period (2006–2011), while the declustered catalogue models perform better in the 2011–2016 testing period, and both sets of models perform less well in the most recent (2016–2021) testing period.
Together, these findings demonstrate that a significant improvement in earthquake forecasting is possible, although this has yet to be tested and proven in a true prospective mode.

Probabilistic earthquake forecasts represent our best understanding of the expected occurrence of future seismicity

Forecasts can only be considered meaningful if they demonstrate some skill in describing what future seismicity might look like. The Regional Earthquake Likelihood Model

In this paper we construct and test a series of time-independent forecasts for California by building on the spatial modelling approach described by

We develop a series of spatial models of seismicity modelled by a time-independent log-Gaussian Cox process and fitted with inlabru. This process is summarized in the workflow in Fig.

The

The workflow for generating spatial seismicity models in inlabru, with functions shown on the right.

In

Input model covariates:

The inlabru models provide spatial intensity estimates which can be converted to spatial event rates by considering the time scales involved. Since the models we develop here are to be considered time-independent, we assume that the number of events expected in this time period is "scalable" in a straightforward manner, consistent with a (temporally homogeneous) spatially varying Poisson process. However, we know that the rate of observed events is not Poissonian due to observed spatiotemporal clustering
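A minimal sketch of this rate-scaling step, with invented per-cell rates standing in for the fitted intensity integrated over each grid cell:

```python
import math

# Hypothetical per-cell annual event rates (events/yr), standing in for a
# fitted spatial intensity integrated over each grid cell.
annual_rates = [0.02, 0.15, 0.40, 0.01]

# Time-independent assumption: expected counts scale linearly with the
# forecast duration (a temporally homogeneous Poisson process).
forecast_years = 5.0
expected_counts = [r * forecast_years for r in annual_rates]

# Under the Poisson assumption, the probability of at least one event in a
# cell over the window is 1 - exp(-rate * T).
p_at_least_one = [1.0 - math.exp(-mu) for mu in expected_counts]

print(round(sum(expected_counts), 2))  # → 2.9
```

The linear scaling is exactly what the Poisson assumption buys, and is also what the observed spatiotemporal clustering undermines.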

Pairwise comparison of models for full catalogue models. The top-right side of the plot shows differences in log median intensity and the lower left section shows the differences in model variances between the different models. The median log intensities for each model are shown on the diagonal. Models include combinations of smoothed past seismicity (MS), strain rate (SR), fault distance (FD) and fault slip rates (NK).

Pairwise comparison of models for declustered catalogue models. The top-right side of the plot shows differences in log median intensity and the lower left section shows the differences in model variances between the different models. The median log intensities for each model are shown on the diagonal. Models include combinations of smoothed past seismicity (MS), strain rate (SR), fault distance (FD) and fault slip rates (NK).

To extend this approach to a full forecast, we distribute magnitudes across the number of expected events according to a frequency-magnitude distribution. Given the small number of large events in the input training catalogue, a preference between a tapered Gutenberg-Richter (TGR) and a standard Gutenberg-Richter magnitude distribution with a rate parameter
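As a hedged sketch, magnitudes can be assigned under a standard Gutenberg-Richter distribution by inverse-transform sampling; the b-value and magnitude threshold below are illustrative, not the fitted values:

```python
import math
import random

random.seed(0)

def sample_gr_magnitude(b=1.0, m_min=4.95, rng=random):
    """Inverse-transform draw from a Gutenberg-Richter distribution, whose
    survival function is P(M > m) = 10**(-b * (m - m_min))."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoiding log10(0)
    return m_min - math.log10(u) / b

# Assign magnitudes to a set of simulated events.
mags = [sample_gr_magnitude() for _ in range(5000)]
```

A tapered variant would multiply the survival function by an exponential taper in seismic moment, which suppresses the largest magnitudes.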

Schematic of the code for constructing grid-based (left) and simulated catalogue-based (right) earthquake forecasts given an inlabru LGCP intensity model. These represent step 7 of the workflow.

A schematic diagram showing how grid-based and catalogue-based approaches are applied is shown in Fig.
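The distinction between the two forecast types can be sketched as follows; the posterior samples here are synthetic stand-ins for inlabru LGCP output, and the cell geometry is abstracted away:

```python
import math
import random

random.seed(1)

# Hypothetical posterior samples of per-cell 5-year rates (one row per
# posterior draw), standing in for samples from an inlabru LGCP model.
n_draws, n_cells = 200, 25
posterior_rates = [[math.exp(random.gauss(-1.0, 0.5)) for _ in range(n_cells)]
                   for _ in range(n_draws)]

# Grid-based forecast: a single map of posterior-mean expected rates.
grid_forecast = [sum(row[j] for row in posterior_rates) / n_draws
                 for j in range(n_cells)]

def poisson_draw(lam, rng=random):
    """Knuth's algorithm; adequate for the small per-cell rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

# Catalogue-based forecast: each synthetic catalogue is drawn from a
# different posterior sample, so model uncertainty enters the counts.
def simulate_catalogue(rng=random):
    rates = rng.choice(posterior_rates)
    return [poisson_draw(lam, rng) for lam in rates]

catalogues = [simulate_catalogue() for _ in range(500)]
```

Because each catalogue conditions on a different posterior draw, the resulting counts are over-dispersed relative to a single Poisson map, which is the sense in which the Poisson assumption is relaxed.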

To test how well each forecast performs, we first test the consistency of the model forecasts, developed from data between 1985 and 2005, with observations from three subsequent and contiguous 5-year time periods, using standard CSEP tests for the number of events, the spatial and magnitude distributions, and the conditional likelihood of each forecast. The original CSEP tests calculate a quantile score for the number (

The likelihood test compares the performance of individual cells within the forecast. The likelihood of the observation given the model is described by a Poisson likelihood function in each cell, and the total joint likelihood by the product over all bins. The quantile score measures whether the observed joint log-likelihood falls within the tail of the distribution of simulated joint log-likelihoods, with the score defined as the fraction of simulated joint log-likelihoods less than or equal to the observed value. The conditional likelihood or
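The quantile-score calculation can be sketched as follows, with invented per-cell rates and counts:

```python
import math
import random

random.seed(3)

def poisson_joint_log_likelihood(observed, expected):
    """Sum of per-cell Poisson log-likelihoods (the log of the product of
    per-bin likelihoods)."""
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1)
               for n, mu in zip(observed, expected))

def quantile_score(observed_ll, simulated_lls):
    """Fraction of simulated joint log-likelihoods at or below the observed
    value; a very small score flags inconsistency with the forecast."""
    return sum(ll <= observed_ll for ll in simulated_lls) / len(simulated_lls)

def poisson_draw(lam, rng=random):
    """Knuth's algorithm for a Poisson random draw."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

expected = [2.0, 0.5, 1.0]  # forecast rates per cell (illustrative)
observed = [2, 1, 1]        # observed event counts per cell

obs_ll = poisson_joint_log_likelihood(observed, expected)
sim_lls = [poisson_joint_log_likelihood([poisson_draw(mu) for mu in expected],
                                        expected)
           for _ in range(2000)]
score = quantile_score(obs_ll, sim_lls)
```

A score near zero indicates the observation is far less likely than the forecast's own simulations, i.e. the forecast fails the consistency test at the corresponding significance level.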

In the new CSEP tests

In constructing the three models both with and without clustering, we can examine the relative contributions of the model components, given the differences in spatial intensity that result from short-term spatiotemporal clustering.
Table 1 shows the posterior mean component of the log intensity for each model both with and without clustering for

Figure

Posterior means of model components and number of expected events for full and declustered (DC) models.

We now test the models using the pyCSEP package for Python

Quantile scores for CSEP tests. Upper bounds for

We first compare the performance of our 5-year forecasts, developed with a training window of 1985–2005, over the testing period 1 January 2006–1 January 2011 with the

Grid-based forecast tests for all forecasts for three 5-year time periods: 2006–2011 (top), 2011–2016 (middle) and 2016–2021 (bottom). The bars represent the 95 % confidence interval derived from simulated likelihoods from the forecast, while the symbol represents the observed likelihood for observed events. A green square indicates that a model has passed the test, and a red circle indicates inconsistency between forecast and observation. The forecasts are compared to both the full (Helmstetter aftershock) and declustered models of

We then repeat the tests for two additional 5-year periods of California earthquakes illustrated in Fig.

In the 2016–2021 period (Fig.

T-test results for the inlabru models showing information gain per earthquake relative to the full

These statistical tests (
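The information gain per earthquake underlying such pairwise comparisons reduces to a per-event difference in joint log-likelihood; the values below are hypothetical:

```python
def information_gain_per_earthquake(ll_a, ll_b, n_events):
    """Mean per-event difference in joint log-likelihood between two
    forecasts; positive values favour forecast A over forecast B."""
    return (ll_a - ll_b) / n_events

# Hypothetical joint log-likelihoods for two competing forecasts over a
# testing period containing 20 target events.
gain = information_gain_per_earthquake(-150.0, -160.0, 20)
print(gain)  # → 0.5
```

The T-test then asks whether this mean gain is significantly different from zero given the scatter of the per-event differences.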

Our second stage of testing uses simulated catalogues in order to make use of the newer CSEP tests

We expect the grid-based and simulated catalogue approaches to have similar results in terms of the magnitude (

We might expect the most noticeable differences to occur in the spatial test, both because it measures the consistency of the spatial component with the observed events and because we are now using the full posterior distribution of the spatial components, potentially allowing more variation in the spatial models. The middle panel of Fig.

Similar to the grid-based examples, for the 2006–2011 period (red star indicator) the spatial performance of the SRMS and FDSRMS models is better when the full, rather than the declustered, catalogue has been used in model construction.

All of the models pass the

Finally, the pseudo-likelihood test (Fig.

Top: Catalogue of events in California from 1985–2021. The period 1985–2005 is used for model construction, and the three testing periods are shown with red, blue and green backgrounds. The left panel shows the magnitude of events in time and the right panel the number of events in each year. Bottom: the ComCat catalogues for the three 5-year testing intervals.

While the full catalogue models performed well in the tests for the first 5-year time window, the other two sets of test results were less promising. This can be largely explained by the number of events that occurred in the 10-year period from 2006–2016 (red and blue backgrounds in Fig.

We did not filter for main shocks in the observed events, so we might expect the

The declustered models pass spatial tests more often than the full catalogue models because they are less affected by recent clustering, and perhaps benefit from being smoother overall than the full catalogue models (Figs.

We conclude that neither a full nor a declustered catalogue necessarily gives a better estimate of the future number of events in any 5-year time period, although the declustered models tend to perform better spatially and may be more suitable for longer-term forecasting. Given that different declustering methods may retain different specific events and different total numbers of events, different declustering approaches may lead to significant differences in model performance, especially in time periods with a small number of events in the full catalogue. To discriminate reliably between the two approaches, a much longer testing time frame would be needed to ensure a suitably large number of events.

In general, the simulated catalogue-based forecasts were more likely to pass the tests than the gridded models. This is most obvious in the first testing period, when the simulated catalogue-based models based on the full catalogue passed all tests and those for the declustered catalogues only failed due to the smaller expected number of events. Similarly, in the most recent testing period (2016–2021) the simulated catalogue forecasts are able to just pass the

The simulated catalogue approach allows us to consider more aspects of the uncertainty in our model. For example, we could further improve upon this by considering potential variation in the

The main limitation of the work presented here, and many other forecast methodologies, is how aftershock events are handled. Our choice of (a relatively high) magnitude threshold for modelling may have also benefited the full model by ignoring many small magnitude events that would be removed by a formal declustering procedure. The real solution to this is to formally model the clustering process.

The approach presented here strongly conforms with current practice. In time-independent forecasting and PSHA, catalogues are routinely declustered to be consistent with Poisson occurrence assumptions. Operational forecasting already relies heavily on models, such as the epidemic type aftershock sequence model (ETAS,

One way to handle these conflicts is to model the seismicity formally as a Hawkes process, where the uncertainty in the tradeoff between the background and clustered components is explicit and can be formally accounted for. In future work we will modify the workflow of Fig. 1 to test the hypothesis that this approach will improve the ability for inlabru to forecast using both time-independent and time-dependent models.
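A minimal temporal sketch of such a Hawkes (ETAS-style) conditional intensity, with illustrative rather than fitted parameters:

```python
import math

def etas_conditional_intensity(t, events, mu=0.5, k=0.1, alpha=1.0,
                               c=0.01, p=1.1, m0=4.95):
    """Temporal ETAS-style conditional intensity: a constant background rate
    mu plus Omori-law contributions triggered by past events (t_i, m_i).
    All parameter values here are illustrative, not fitted."""
    triggered = sum(k * math.exp(alpha * (m - m0)) / (t - t_i + c) ** p
                    for t_i, m in events if t_i < t)
    return mu + triggered

# Two hypothetical past events (time in days, magnitude).
past = [(0.0, 5.5), (1.0, 6.0)]
rate_soon = etas_conditional_intensity(1.5, past)
rate_later = etas_conditional_intensity(100.0, past)
```

The explicit split between the background term `mu` and the triggered sum is what makes the background/cluster trade-off, and its uncertainty, accessible to formal inference.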

We have demonstrated the first extension of spatial inlabru intensity models for seismicity to fully time-independent models, created using both classical uniform grids and fully Bayesian catalogue-type forecasts that make use of the full model posteriors. We demonstrate that the inlabru models perform well in pseudo-prospective testing mode, passing the standard CSEP tests and performing favourably in competition with existing time-independent CSEP models over the 2006–2011 period. Forecasts constructed using a declustered catalogue as input performed less well in terms of the number of expected events but nevertheless described spatial seismicity well, even where the testing catalogue had not been declustered, and the declustered models performed better than the full catalogue models in the 2011–2016 testing period. Further testing on longer time scales would be necessary to assess whether full or declustered catalogues provide a better estimate of the number of expected events on the time scales examined here. In the most recent testing period, i.e. the one with the longest time lag between the learning and testing phases, neither the full nor the declustered catalogue models perform well, suggesting a possible degree of memory loss over a decadal time scale in both clustered and declustered seismicity. Simulated catalogue forecasts that make use of the full model posteriors passed consistency tests more often than the grid-based equivalents, most likely due to their ability to account for uncertainty in the model itself, including through test metrics that do not rely on the Poisson assumption. This demonstrates the potential of fully Bayesian earthquake forecasts that include spatial covariates to improve upon existing forecasting approaches.

The code and data required to produce all of the results in this paper, including figures, can be downloaded from

An earlier version of this paper tests models constructed with data from 1984–2004. These results can be found in the Supplement. The supplement related to this article is available online at:

KB developed the methodology, carried out the formal analysis and interpretation, and wrote the first draft of the paper. FK contributed significantly to visualization, particularly development of Fig. 1. MN and IM contributed to the conceptual design, the interpretation of the results, and the writing of the paper. All authors contributed to paper review and drafting.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Francesco Serafini and Finn Lindgren for helpful discussions and suggestions. We thank Paolo Gasperini and an anonymous reviewer for helpful and constructive comments.

This research has been supported by the European Commission, Horizon 2020 Framework Programme (RISE; grant no. 821115). Farnaz Kamranzad was jointly funded through the Tomorrow’s Cities GCRF Hub (grant no. NE/S009000/1) and the School of GeoSciences internal funding at the University of Edinburgh.

This paper was edited by Oded Katz and reviewed by Paolo Gasperini and one anonymous referee.