Consistency of seismic hazard estimates from a physics-based earthquake simulator: a case study in south-eastern Spain

Gómez-Novell, Octavi; Visini, Francesco; Herrero-Barbero, Paula; Álvarez-Gómez, José A.; García-Mayordomo, Julián

doi:10.5194/nhess-26-2691-2026

Articles | Volume 26, issue 6

https://doi.org/10.5194/nhess-26-2691-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 10 Jun 2026

Download

Final revised paper (published on 10 Jun 2026)
Supplement to the final revised paper
Preprint (discussion started on 22 Dec 2025)
Supplement to the preprint

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-5485', Anonymous Referee #1, 07 Jan 2026

This manuscript addresses an important topic in seismic hazard assessment: the integration of physics-based earthquake simulators into probabilistic seismic hazard analysis (PSHA) frameworks. The work demonstrates that properly calibrated physics-based models can provide consistent hazard estimates in low-to-moderate strain regions, which represents a valuable contribution to the field. However, some issues require clarification.

MAJOR COMMENTS

Lines 220-235: The paper presents a consistency testing framework, but there are opportunities to employ more rigorous statistical validation methods. In statistical practice, methods such as cross-validation exist to estimate the out-of-sample performance and predictive power of models. While applying such methods to an entire PSHA model may be technically challenging, components of the PSHA (particularly the occurrence models) can be tested using these approaches. Since this paper focuses on physics-based synthetic catalogs, testing the catalogs, or the occurrence models derived from them, before integrating them into the full PSHA would strengthen the methodology significantly. If this is outside the scope of your work, or would make the article too long, please discuss it briefly and state that these issues should be tested in a companion report or paper. If such tests have been conducted in other studies, they should be cited here. Additionally, the adopted testing procedure evaluates the joint model (occurrence rates combined with GMPE), which should be explicitly stated and briefly discussed.
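The kind of out-of-sample check suggested in this comment could be sketched as follows. This is only a minimal illustration, not the procedure used in the manuscript: it assumes a stationary Poisson occurrence model, splits a catalog into equal-length time windows, fits the annual rate on all windows but one, and scores the held-out window by its Poisson log-likelihood. The function names and the toy counts are hypothetical.

```python
import math

def poisson_log_likelihood(count, rate, duration):
    """Log-probability of `count` events in `duration` years at `rate` events/year."""
    expected = rate * duration
    return count * math.log(expected) - expected - math.lgamma(count + 1)

def leave_one_window_out(counts, window_years):
    """Leave-one-out cross-validation over equal-length time windows.

    For each window, the annual rate is fitted on the remaining windows and
    the held-out count is scored with the Poisson log-likelihood.
    Returns the mean held-out log-likelihood (higher is better).
    Assumes the training windows always contain at least one event.
    """
    scores = []
    for i, held_out in enumerate(counts):
        training = counts[:i] + counts[i + 1:]
        rate = sum(training) / (len(training) * window_years)
        scores.append(poisson_log_likelihood(held_out, rate, window_years))
    return sum(scores) / len(scores)

# Hypothetical event counts per 1000-year window; both lists total 27 events.
steady = [4, 5, 4, 5, 4, 5]
bursty = [0, 12, 1, 0, 13, 1]
print(leave_one_window_out(steady, 1000.0))
print(leave_one_window_out(bursty, 1000.0))
```

With the same total number of events, the bursty toy catalog scores lower than the steady one, which is the sort of signal that would flag a poorly fitting occurrence model before it enters the full PSHA.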

Lines 250-254: Some clarifications are needed to rule out methodological inconsistency. The synthetic catalogs are tested against the stationary Poisson process hypothesis, which is typically used in PSHA for declustered events (mainshocks only). The manuscript suggests that the synthetic catalogs may contain some foreshocks and aftershocks that do not significantly affect the PSHA results, which is acceptable. However, readers may be confused and think that the paper performs PSHA for the full, non-declustered seismicity. This distinction must be clarified explicitly. Please discuss briefly whether the presence of clustering in the catalogs affects the interpretation of the results and whether a sensitivity analysis excluding clustered events would alter the conclusions.
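One simple way to gauge how far a catalog departs from the stationary Poisson hypothesis is the dispersion index, that is, the variance-to-mean ratio of event counts in equal time windows. For a Poisson process the variance of the counts equals their mean, so the ratio is close to 1, while values well above 1 point to clustering. The sketch below is an illustration under that assumption and is not the test applied in the manuscript; the function name and the toy catalogs are hypothetical.

```python
import statistics

def dispersion_index(event_times, duration, window):
    """Variance-to-mean ratio of event counts in consecutive equal windows."""
    n_windows = int(duration // window)
    counts = [0] * n_windows
    for t in event_times:
        idx = int(t // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("no events fall inside the analysed duration")
    # Population variance of the window counts divided by their mean.
    return statistics.pvariance(counts) / mean

# Hypothetical catalogs of 10 events over 10 years, counted in 1-year windows.
regular = [i + 0.5 for i in range(10)]    # one event per window
clustered = [0.01 * i for i in range(10)]  # all events in the first window
print(dispersion_index(regular, 10.0, 1.0))
print(dispersion_index(clustered, 10.0, 1.0))
```

Running the same index on a catalog before and after removing clustered events would be one inexpensive form of the sensitivity analysis requested here.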

Line 208: The authors use "Classical PSHA" in OpenQuake, but the OpenQuake Engine also allows event-based PSHA using synthetic catalogs directly. This approach would avoid fitting occurrence laws, a procedure that potentially loses information contained in the catalogs. Moreover, OpenQuake allows scenario calculations that could be run for every event in the synthetic catalogs to calculate exceedances directly. These alternatives should be discussed, including why the classical approach was chosen over these potentially more direct methods. This discussion would clarify whether methodological choices limit the ability to fully leverage the physics-based catalog information.
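The idea of computing exceedances directly from a synthetic catalog, without fitting an occurrence law, can be sketched in a few lines. This is not OpenQuake code and does not reproduce its event-based calculator. It assumes, for illustration only, that each simulated event has already been assigned a single ground-motion value at the site of interest, and it converts the empirical annual exceedance rate into a probability with the usual Poisson relation. All names and numbers are hypothetical.

```python
import math

def exceedance_probability(site_motions, catalog_years, threshold,
                           exposure_years=50.0):
    """Probability of exceeding `threshold` at a site within `exposure_years`.

    `site_motions` holds one ground-motion value per simulated event
    (hypothetical input, e.g. PGA in g). The annual exceedance rate is the
    number of events strictly above the threshold divided by the catalog length.
    """
    n_exceed = sum(1 for value in site_motions if value > threshold)
    annual_rate = n_exceed / catalog_years
    return 1.0 - math.exp(-annual_rate * exposure_years)

# Hypothetical 1000-year catalog with four events affecting the site.
motions = [0.05, 0.2, 0.3, 0.1]
print(exceedance_probability(motions, 1000.0, 0.15))
```

A complete event-based calculation would also sample ground-motion variability for every event, which this sketch deliberately leaves out.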

Line 137: While the authors state that Cat-21 and Cat-18 represent the best- and worst-performing catalogs based on benchmarking, the rationale for explicitly including the worst-performing catalog requires explanation. What insights does Cat-18 provide that justify its inclusion in the hazard analysis? Is it meant to demonstrate the importance of proper model selection, or does it represent a plausible epistemic uncertainty branch? This clarification would help readers understand whether poor-performing models should ever be included in operational hazard assessments or logic trees.

Line 23 (Abstract) and throughout: The manuscript advocates for combining physics-based and traditional approaches, describing this as "complementarity." This concept is well-established across many scientific fields as hybrid modeling, which systematically combines physics-based models with data-driven approaches. The authors should acknowledge this broader context and cite relevant literature on hybrid models in seismic hazard or related fields (e.g., hydrology, climate science). This would strengthen the theoretical foundation and help position the work within established methodological frameworks.

MINOR COMMENTS

ABSTRACT

Line 21: The phrase "both the lower-performing simulation and" could be omitted for brevity without loss of meaning.

Line 22: The term "reliable" should be verified to ensure it is fully supported by the results presented. Given the consistency testing (rather than validation) performed for some components, this wording may overstate the conclusions.

1. INTRODUCTION

Line 28: The statement that PSHA was "formalized by Cornell (1968)" is not 100% fair. Please see McGuire (2008), Probabilistic seismic hazard analysis: Early history, Earthquake Engng Struct. Dyn. 2008; 37:329–338. DOI: 10.1002/eqe.765

Line 86: The reference to Ellingwood and Wen (2005) does not support the statement about high-impact, low-probability events as written. This citation should be removed or replaced with more appropriate references.

End of Introduction: A paragraph should be added that clearly states the objectives of this paper and provides a roadmap of the manuscript structure. This would help readers understand the overall contribution and organization.

2. DATA AND METHODS

Line 175: Please clarify whether each rupture in the catalogs has only a single occurrence, or whether repeated similar ruptures can occur.

Line 262: The statement that macroseismic records at close distances are mostly related to faults requires explanation.

Line 290: The statement that PGV "is the most likely linked to damage" is too strong and not generally true. Please see the literature on seismic fragility curves, e.g.
Luco N., Cornell C.A. (2007) Structure-Specific Scalar Intensity Measures for Near-Source and Ordinary Earthquake Ground Motions, Earthquake Spectra, Volume 23, No. 2, pages 357–392. https://doi.org/10.1193/1.2723158

Figure 5c: The jump in the curve at the last point for intensity MI=12 requires explanation. Is this a computational artifact, a feature of the GMICE, or physically meaningful?

Line 324: Please provide the justification for taking the average p-value across the four cases and then computing its logarithm.
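The order of the two operations matters, which is why a justification is useful. The logarithm of an average p-value is never smaller than the average of the logarithms, so a single very low p-value is largely masked when the averaging is done first. The sketch below illustrates this with hypothetical p-values and assumes base-10 logarithms; neither the values nor the function names come from the manuscript.

```python
import math

def log_of_mean(p_values):
    """Logarithm (base 10) of the arithmetic mean of the p-values."""
    return math.log10(sum(p_values) / len(p_values))

def mean_of_logs(p_values):
    """Arithmetic mean of the base-10 logarithms of the p-values."""
    return sum(math.log10(p) for p in p_values) / len(p_values)

# Hypothetical p-values for four test cases; one case fits very poorly.
p_values = [0.5, 0.4, 0.3, 0.001]
print(log_of_mean(p_values))   # dominated by the three well-fitting cases
print(mean_of_logs(p_values))  # pulled down strongly by the poor case
```

The two scores coincide only when all p-values are equal, so the choice between them determines how strongly a single inconsistent case penalizes a model.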

3. RESULTS

Line 340: The choice of 2% probability of exceedance in 50 years should be justified. While this is a valid choice, the most common selection for design purposes is 10% in 50 years. Was this choice made for specific reasons related to the EBSZ, or to match existing hazard maps? This should be stated explicitly.
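For reference, the two probability levels mentioned in this comment correspond to different return periods under the Poisson assumption commonly used in PSHA, T = -t / ln(1 - P). The short sketch below only illustrates that standard conversion; the function name is hypothetical.

```python
import math

def return_period(prob_exceedance, exposure_years):
    """Return period in years for a probability of exceedance in a time span,
    assuming Poissonian occurrence: T = -t / ln(1 - P)."""
    return -exposure_years / math.log(1.0 - prob_exceedance)

print(round(return_period(0.02, 50.0)))  # 2% in 50 years
print(round(return_period(0.10, 50.0)))  # 10% in 50 years
```

The 2% level corresponds to a return period of roughly 2475 years and the 10% level to roughly 475 years, so the choice also determines how much weight rare, large events carry in the comparison.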

Figure 6: Please provide specific commentary on the differences between the hazard maps in the area of the city of Vera.

Figure 7: The current presentation makes comparison difficult. It would be more effective to have one subplot per city/station showing all three hazard curves (Cat-21, Cat-18, and area source) overlaid. This would facilitate direct comparison of model performance.

Line 455: The summing of LogP values to rank models requires justification.

EDITORIAL COMMENTS

Line 234: The term "consistency check" is introduced but not formally defined until later in the text. Provide a brief definition at first use.

Line 590: "areas model" should be "area source model" throughout.

Citation: https://doi.org/10.5194/egusphere-2025-5485-RC1
- AC1: 'Reply on RC1', Octavi Gomez-Novell, 11 Mar 2026
  
  Dear reviewer,
  We sincerely thank you for taking the time to review our manuscript. Please find attached a document containing our detailed responses to each of your comments.
  Kind regards,
  
  Octavi Gómez Novell, on behalf of all authors
  
  Citation: https://doi.org/10.5194/egusphere-2025-5485-AC1
RC2: 'Comment on egusphere-2025-5485', Anonymous Referee #2, 09 Jan 2026

Overall, the manuscript is well written and technically sound, addressing an important and timely topic on the integration and evaluation of physics-based earthquake simulators within a PSHA framework. The methodology is rigorous, the workflow is clearly designed, and the consistency tests against both macroseismic and instrumental observations are carefully implemented, making the study suitable for publication after revision. That said, the introduction would benefit from a clearer hierarchical structure and stronger logical progression, particularly by more explicitly distinguishing the limitations of traditional PSHA, the advantages of physics-based approaches, and the specific scientific gap this work fills, as well as by summarizing the main objectives and novel contributions more concisely toward the end of the section. In addition, while the discussion is generally solid, its depth and breadth could be further enhanced by providing a more integrative synthesis of the results, clarifying the broader implications for seismic hazard practice in low-to-moderate strain regions, and more explicitly discussing the transferability of the proposed workflow to other tectonic settings and its role within future hybrid or logic-tree PSHA frameworks. Addressing these points would improve the clarity, balance, and overall impact of the manuscript without requiring major changes to the core analysis or results.

Citation: https://doi.org/10.5194/egusphere-2025-5485-RC2
- AC2: 'Reply on RC2', Octavi Gomez-Novell, 11 Mar 2026
  
  Dear reviewer,
  We sincerely thank you for taking the time to review our manuscript. Please find attached a document containing our detailed responses to each of your comments.
  Kind regards,
  
  Octavi Gómez Novell, on behalf of all authors
  
  Citation: https://doi.org/10.5194/egusphere-2025-5485-AC2
RC3: 'Comment on egusphere-2025-5485', Anonymous Referee #3, 08 Feb 2026

The paper aims to test seismic hazard estimates derived from earthquake simulators, against historical intensity data and instrumental ground shaking recordings. This is an interesting attempt to connect with observational constraints and deserves publication. I offer a few questions and comments below to help improve the manuscript.
1. The description of the models could be improved. In Table 1, for Cat-21, the initial normal stress is stated as 20 MPa per kilometer, but it is not mentioned that this is a vertical gradient, which only became clear from looking at the earlier references [Herrero-Barbero et al., 2021; Gómez-Novell et al., 2025]. Also, how can this low normal stress in the shallow layers be reconciled with the large initial shear stress of 60 MPa?
2. I do not understand what role the initial shear stress plays in the simulation once spin-up has been achieved, other than changing the specific sequences. Why is this a relevant parameter from a statistical point of view?
3. The stated loading suggests there would be uniform slip across the sections. Under backslip, this would be expected to nucleate a lot of events off the fault boundaries. Looking at the earlier references, I did not see any indication of where events nucleate with depth and along strike. Further, Figure 4 in [Gómez-Novell et al., 2025] suggests that different slip patterns accumulate in different models. That would be unexpected in steady-state backslip. So either these catalogs are incomplete and not in long-term steady state, or something else is going on. It would be good to clarify all of this. One option is hybrid loading, which uses regularized stressing rates to achieve self-organizing slip rates that can then be used in backslip mode. At a minimum, clarity on what is being used here, and plots of hypocenters, would be useful.
4. There is a difficulty in using the simulator models for comparison against observations over short time scales (shorter than the repeat time of large events), in that features at these shorter time scales become dominated by the small events. The problem is that small events tend to be controlled by smaller-scale geometrical features, which we know we do not know. In contrast, at long time scales the largest events dominate, and there is more hope that large-scale geometrical features play a more relevant role. Note that in [Shaw et al., 2018] the hazard differences between the simulator and UCERF3 estimates in California were small at century time scales, but grew rapidly at time scales of a few decades and shorter, where small events dominate the expected shaking. In low-strain-rate regions, this effect is only going to be exacerbated. The authors do speak to this in their comments on the utility of combining the simulators, to address longer-term hazard, with the area sources as a complement. But a fuller discussion of what may be possible in these comparisons with this type of data over available and foreseeable time scales would be helpful. For measures that are going to be dominated by small events, hypocentral distributions are of additional importance because ground motion models are sensitive to the closest distances. This circles back to the previous comment, so some discussion of small-event hypocenters, along strike and with depth, is needed.
5. Line 490: Please be more specific about what is meant by the statement that the results support the simulators capturing not just large-scale hazard patterns but also localized differences seen in empirical data. What exactly is this referring to? How robust are those results?

Citation: https://doi.org/10.5194/egusphere-2025-5485-RC3
- AC3: 'Reply on RC3', Octavi Gomez-Novell, 11 Mar 2026
  
  Dear reviewer,
  We sincerely thank you for taking the time to review our manuscript. Please find attached a document containing our detailed responses to each of your comments.
  Kind regards,
  
  Octavi Gómez Novell, on behalf of all authors
  
  Citation: https://doi.org/10.5194/egusphere-2025-5485-AC3

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (review by editor) (23 Mar 2026) by Veronica Pazzi

AR by Octavi Gomez-Novell on behalf of the Authors (11 Apr 2026)

ED: Publish as is (05 May 2026) by Veronica Pazzi

AR by Octavi Gomez-Novell on behalf of the Authors (12 May 2026)

Short summary

Evaluating seismic hazard requires past earthquake observations to perform accurate forecasts. Physics-based earthquake cycle simulators are algorithms that model long-term earthquake sequences on faults, overcoming completeness limitations of observations. We test the performance of physics-based seismic hazard assessments in comparison with traditional approaches in Spain. The physics-based approach yields more accurate forecasts, highlighting the potential of simulators for seismic hazard.