Long-term multi-hazard and risk assessments are produced by combining many hazard-model simulations, each using a slightly different set of inputs to cover the uncertainty space. While most input parameters for these models are relatively well constrained, atmospheric parameters remain problematic except on very short timescales (hours to days). Precipitation is a key trigger for many natural hazards including floods, landslides, and lahars. This work presents a stochastic weather model that takes openly available ERA5-Land data and produces long-term, spatially varying precipitation data that mimic the statistical dimensions of real data. This allows precipitation to be robustly included in hazard-model simulations. A working example is provided using 1981–2020 ERA5-Land data for the Rangitāiki–Tarawera catchment, Te Moana-a-Toi / Bay of Plenty, New Zealand.

Natural hazard and risk assessments are probabilistic by necessity. They must incorporate the intrinsic variability in natural systems and the large number of unknown (but often data-constrained) input parameters. To produce such assessments, many model simulations are run, each sampling these parameters from their distributions. The outputs are then combined (often overlaid in a spatial context) to calculate hazard likelihoods across an area and/or to produce risk maps, which are key for communicating hazards (Thompson et al., 2015; Hyman et al., 2019). The spatial extent of the hazards is therefore central to such assessments, as is a robust approach to simulation design. Precipitation is causally linked to many natural hazards including floods, landslides, and lahars (Gill and Malamud, 2016). While several stochastic weather models exist in the published literature, they either require detailed local rainfall information – which is rare over long timescales (Zhao et al., 2019; Muñoz-Sabater et al., 2021) – or are run for a single spatial reference point – which is insufficient for many hazard models (e.g. floods – Arnaud et al., 2002; landslides – Gao et al., 2017). The model provided here uses openly available ERA5-Land data (Muñoz-Sabater, 2019) and produces realistic (i.e. statistically similar to real data) precipitation patterns to improve the sampling strategy of atmospheric properties and support robust hazard assessments. This brief correspondence first presents algorithm construction and then an example application using the Rangitāiki–Tarawera catchment, Te Moana-a-Toi / Bay of Plenty, New Zealand. All code is in R (R Core Team, 2021) and freely available.

The stochastic weather model (SWM) comprises three steps: data conversion, block construction, and stochastic weather generation. Due to the relative simplicity of the model, by exploiting some coding efficiencies in the R package

SWM first pulls time- and location-stamped precipitation data from the ERA5-Land data and converts values from accumulated to hourly rainfall before combining all data into a single 3-D spatio-temporal array for analysis (Fig. 1a). A single point is selected at random from the locations used in the download of the original ERA5-Land data (Fig. 1b). The precipitation data at this point are used to split the single array into periods of precipitation (
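The accumulated-to-hourly conversion in the data-conversion step can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R code. It assumes the ERA5-Land convention that accumulations restart after the 00 UTC value, so the 01 UTC value is already an hourly total, and it assumes the series starts at 01 UTC. The function name `deaccumulate` and the example values are hypothetical.

```python
def deaccumulate(accumulated):
    """Convert accumulated precipitation at one grid point to hourly totals.

    Assumptions (not taken from the SWM code): values are at consecutive
    hourly timestamps, the first value is at 01 UTC, and the accumulation
    restarts every 24 steps (i.e. after each 00 UTC value).
    """
    hourly = []
    for i, value in enumerate(accumulated):
        if i % 24 == 0:
            # First step after the reset: the value is already an hourly total.
            increment = value
        else:
            # Otherwise difference against the previous accumulated value.
            increment = value - accumulated[i - 1]
        # Clamp tiny negative differences that can arise from data rounding.
        hourly.append(max(increment, 0.0))
    return hourly


# Synthetic two-day example (units arbitrary, e.g. mm), starting at 01 UTC.
day_one = [0.0] * 10 + [0.5, 1.5, 3.0] + [3.0] * 11  # rain in hours 11 to 13
day_two = [2.0] * 24                                  # rain in the first hour only
hourly = deaccumulate(day_one + day_two)
print(sum(hourly))
```

Applying the same function to every grid point and stacking the results by longitude, latitude, and time would give an array of the kind described above.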

SWM algorithm flow diagram:

The Rangitāiki–Tarawera catchment is an area susceptible to many natural hazards including volcanic eruptions, flooding, and extreme weather events (ex-tropical cyclones). Hourly rainfall data across an 11

ERA5-Land data across the case study area:

Four sets of statistical analyses were undertaken to ensure that SWM-simulated data are statistically similar to ERA5-Land data. For this, the 999 sets of 40 years of simulated data across the 11

Student's

Tukey's honest significant difference (HSD) (e.g. Miller, 1981) was used to determine whether source (real or simulated) is a significant factor in the prediction of total monthly rainfall.

A non-parametric bootstrap method was applied whereby the empirical cumulative distribution functions (eCDFs) of the simulated datasets, summarised as a 95th-percentile envelope (from 999 runs), are overlaid with the eCDF of the real dataset to determine whether the latter lies within the envelope.
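One way to picture the envelope check is the sketch below, written in Python for illustration only rather than taken from the authors' R code. It assumes the envelope is the pointwise central 95 % band of the simulated eCDFs, which is an interpretation and not a detail confirmed here. The function names, the evaluation grid, and the exponential stand-in data are all hypothetical.

```python
import bisect
import random


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each value in `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect.bisect_right(ordered, x) / n for x in grid]


def envelope(simulated_sets, grid, level=0.95):
    """Pointwise band containing the central `level` share of simulated eCDFs."""
    curves = [ecdf(s, grid) for s in simulated_sets]
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        lo_idx = int(tail * (len(column) - 1))
        hi_idx = len(column) - 1 - lo_idx
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


# Synthetic stand-in data: 999 simulated sets and one "real" set of rainfall totals.
rng = random.Random(42)
simulated = [[rng.expovariate(1 / 100) for _ in range(120)] for _ in range(999)]
real = [rng.expovariate(1 / 100) for _ in range(120)]

grid = [10 * k for k in range(51)]  # evaluation points, 0 to 500
lower, upper = envelope(simulated, grid)
real_curve = ecdf(real, grid)

# Share of grid points at which the real eCDF lies inside the envelope.
inside = sum(lo <= r <= hi for lo, r, hi in zip(lower, real_curve, upper)) / len(grid)
print(f"real eCDF inside envelope at {inside:.0%} of grid points")
```

A pointwise band is the simplest choice; a simultaneous band would be stricter, so the share reported by this sketch should be read with that in mind.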

Autocorrelations were calculated for both the real and the simulated data to compare any significant lags (Venables and Ripley, 2002). The calculations require a continuous variable in discrete time, so they were run on cumulative daily and cumulative monthly rainfall timescales.
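The lag comparison can be sketched as follows, in Python for illustration only and not as the authors' R implementation. The sketch uses the standard sample autocorrelation estimator and flags lags outside the approximate 95 % white-noise bound of ±1.96/√n. The function names and the synthetic seasonal series (40 years of monthly totals) are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag-0 value is 1)."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        covariance = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
        acf.append(covariance / variance)
    return acf


def significant_lags(series, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95 % white-noise bound."""
    bound = 1.96 / math.sqrt(len(series))
    acf = autocorrelation(series, max_lag)
    return [lag for lag in range(1, max_lag + 1) if abs(acf[lag]) > bound]


# Synthetic monthly totals with a pure annual cycle: 40 years x 12 months.
monthly = [100 + 50 * math.cos(2 * math.pi * m / 12) for m in range(480)]
acf = autocorrelation(monthly, max_lag=24)
lags = significant_lags(monthly, max_lag=24)
print(lags)
```

Running the same two functions on the real series and on each simulated series would allow the sets of significant lags to be compared directly.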

Statistical analysis results. Simulated data are shown as grey lines, and real data are shown as red lines.

Overall, while SWM passed all statistical tests for both realisations, some departure was noted in several combinations, the specifics of which are detailed below (see Supplement for a complete set of results).

Real data failed the Shapiro–Wilk test of normality (

Two linear models were built with rainfall as the response variable and both month and source (real or simulated) as predictor variables. One was built with an interaction term (m1) and one without (m2). Both models for both realisations passed Tukey's HSD test, rejecting source as a statistically significant predictor (Realisation 1:

The method and code provided through this brief communication can be used to rapidly generate multiple sets of realistic, long-term, hourly precipitation data over a spatial region. While the outputs may not have the nuances that come with more complex models (e.g. Burton et al., 2008; Papalexiou, 2022), the efficient open-source code, written in an open-source language and based on open-source data, provides an easy-to-plug-in input for hazard simulations to support long-term, temporally and spatially varying, probabilistic risk assessments, uncertainty quantification, and multi-hazard models.

Code is written in R (open-source software) and is freely available at

All data were obtained from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS), available here:

The supplement related to this article is available online at:

Both authors conceptualised the model, MGW built the model, and statistical tests were guided by MSB and coded by MGW.

The contact author has declared that neither of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

The two anonymous referees greatly improved this paper, and especially the breadth of tests detailed in the Supplement, through their recommendations.

This research has been supported by the Resilience to Nature's Challenges Multi-hazard Risk model programme, New Zealand (grant no. GNS-RNC043).

This paper was edited by Dan Li and reviewed by two anonymous referees.