Quantitative comparison between two different methodologies to define rainfall thresholds for landslide forecasting
Abstract. This work proposes a methodology to compare the effectiveness of different rainfall threshold models for landslide forecasting. We tested our methodology with two state-of-the-art models, one using intensity–duration thresholds and the other based on cumulative rainfall thresholds.
The first model identifies rainfall intensity–duration thresholds using MaCumBA (MAssive CUMulative Brisk Analyzer; Segoni et al., 2014a), a software program that analyzes rain gauge records, extracts the intensity (I) and duration (D) of the rainstorms associated with landslide initiation, plots these values on an I–D diagram, and identifies the thresholds that define the lower bound of the I–D values. A back analysis of past events is then used to select the threshold associated with the fewest false alarms.
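The lower-bound idea and the back analysis can be illustrated with a minimal sketch. It assumes a power-law threshold of the form I = α·D^β (a common choice for I–D thresholds, assumed here rather than taken from MaCumBA), a hypothetical grid of candidate exponents β, and hypothetical rainstorm data. For each β, α is set to the largest value that keeps every landslide-triggering rainstorm on or above the curve; the back analysis then keeps the exponent whose curve produces the fewest false alarms. The names `Rainstorm`, `normalized`, `lower_bound_alpha`, and `calibrate` are illustrative only and do not describe MaCumBA's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Rainstorm:
    duration_h: float      # D: rainstorm duration (hours)
    intensity_mm_h: float  # I: mean rainfall intensity (mm/h)
    triggered: bool        # True if a landslide was associated with this rainstorm


def normalized(storm, beta):
    """I / D**beta: the alpha a curve I = alpha * D**beta needs to pass through this storm."""
    return storm.intensity_mm_h / storm.duration_h ** beta


def lower_bound_alpha(storms, beta):
    """Largest alpha such that every triggering storm lies on or above I = alpha * D**beta."""
    return min(normalized(s, beta) for s in storms if s.triggered)


def calibrate(storms, betas):
    """Back analysis over candidate exponents (an assumed grid): keep the
    lower-bound threshold with the fewest false alarms, i.e. the fewest
    non-triggering storms lying on or above the curve."""
    best = None
    for beta in betas:
        alpha = lower_bound_alpha(storms, beta)
        false_alarms = sum(
            1 for s in storms if not s.triggered and normalized(s, beta) >= alpha
        )
        if best is None or false_alarms < best[2]:
            best = (alpha, beta, false_alarms)
    return best


# Hypothetical rainstorms, for illustration only.
STORMS = [
    Rainstorm(2, 20, True),
    Rainstorm(10, 8, True),
    Rainstorm(24, 5, True),
    Rainstorm(5, 4, False),
    Rainstorm(48, 2, False),
    Rainstorm(1, 15, False),
]

alpha, beta, false_alarms = calibrate(STORMS, betas=[-0.2, -0.5, -0.8])
print(f"I = {alpha:.2f} * D^{beta}  ({false_alarms} false alarms)")
```

Because α is taken as the minimum over triggering events, this sketch never misses a past landslide during calibration; a real calibration may instead accept some missed alarms in exchange for fewer false ones.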
The second model, SIGMA (Sistema Integrato Gestione Monitoraggio Allerta; Martelloni et al., 2012), is based on the hypothesis that anomalous or extreme values of accumulated rainfall are responsible for landslide triggering: the statistical distribution of the rainfall series is analyzed, and multiples of the standard deviation (σ) are used as thresholds to discriminate between ordinary and extraordinary rainfall events. The name of the model, SIGMA, reflects the central role of the standard deviation.
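The σ-based discrimination can be sketched as follows, under stated assumptions: a single accumulation window of a few days, a threshold expressed as the mean plus a multiple of σ of the historical accumulated-rainfall series, and hypothetical daily rainfall values. The real SIGMA model works on several accumulation periods and its exact normalization is not given in this abstract, so this is an illustration of the general idea, not the model itself. The function names and the default multiples (1, 2, 3) are hypothetical.

```python
import statistics


def cumulative_series(daily_mm, window_days):
    """Rolling totals of daily rainfall over `window_days` consecutive days."""
    return [
        sum(daily_mm[i - window_days + 1 : i + 1])
        for i in range(window_days - 1, len(daily_mm))
    ]


def sigma_level(history_mm, recent_mm, window_days):
    """Number of standard deviations by which the latest `window_days` total
    exceeds the mean of the historical totals for the same window
    (assumption: thresholds are mean + k * sigma)."""
    series = cumulative_series(history_mm, window_days)
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        raise ValueError("historical totals have zero spread; no sigma scale")
    current = sum(recent_mm[-window_days:])
    return (current - mean) / sd


def classify(level, multiples=(1, 2, 3)):
    """Highest multiple of sigma reached (0 means ordinary rainfall)."""
    return max((m for m in multiples if level >= m), default=0)


# Hypothetical daily rainfall (mm), for illustration only.
history = [0, 8, 4, 4, 8]
recent = [7, 9]
level = sigma_level(history, recent, window_days=2)
print(level, classify(level))
```

Here the 2-day historical totals are 8, 12, 8, and 12 mm (mean 10 mm, σ = 2 mm), so a recent 2-day total of 16 mm sits 3σ above the mean and falls in the most extreme class.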
To perform a quantitative and objective comparison, these two models were applied in two different areas, each time with a site-specific calibration against the available rainfall and landslide data. For each application, a validation procedure was carried out on an independent data set and a confusion matrix was built. The entries of the confusion matrices were combined to define a series of indexes commonly used to evaluate model performance in natural hazard assessment. Comparing these indexes made it possible to identify the most effective model in each case study and, consequently, which threshold should be used in the local early warning system to obtain the best possible risk management.
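A minimal sketch of how confusion-matrix entries can be combined into performance indexes and used to rank two models. The abstract does not list which indexes were used, so sensitivity, specificity, positive and negative predictive power, and efficiency are assumed here as common examples; the validation counts, the `ConfusionMatrix` class, and the `better_model` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # alarm issued and landslides occurred
    fp: int  # alarm issued but no landslides (false alarm)
    fn: int  # no alarm but landslides occurred (missed alarm)
    tn: int  # no alarm and no landslides

    def indexes(self):
        """A few common skill indexes; each assumes its denominator is non-zero."""
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "positive_predictive_power": self.tp / (self.tp + self.fp),
            "negative_predictive_power": self.tn / (self.tn + self.fn),
            "efficiency": (self.tp + self.tn) / total,
        }


def better_model(results, index="efficiency"):
    """Name of the model whose validation matrix scores highest on `index`."""
    return max(results, key=lambda name: results[name].indexes()[index])


# Hypothetical validation counts for one test site, for illustration only.
results = {
    "I-D threshold": ConfusionMatrix(tp=8, fp=6, fn=2, tn=84),
    "SIGMA": ConfusionMatrix(tp=7, fp=2, fn=3, tn=88),
}
for name, cm in results.items():
    print(name, {k: round(v, 3) for k, v in cm.indexes().items()})
print("better by efficiency:", better_model(results))
```

With these made-up counts, the ranking depends on the index: SIGMA has the higher efficiency (0.95 vs 0.92), while the I–D threshold has the higher sensitivity (0.8 vs 0.7). This is why several indexes are compared rather than a single score.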
In our application, neither model clearly outperformed the other: each performed better in one test site and worse in the other, depending on the characteristics of the area.
We conclude that, even if state-of-the-art threshold models can be exported from one test site to another, their use in local early warning systems should be carefully evaluated: the effectiveness of a threshold model depends on the characteristics of the test site (including the quality and quantity of the input data), and a validation procedure and a comparison with alternative models should be performed before the model is implemented in an operational early warning system.