Brief communication: Introducing rainfall thresholds for landslide triggering based on artificial neural networks

. In this communication we show how the use of artificial neural networks (ANNs) can improve the performance of the rainfall thresholds for landslide early warning. Results for Sicily (Italy), show how performance of a traditional rainfall event duration and depth power law threshold, yielding a true skill statistic (TSS) of 0.50, can be improved by ANNs (TSS = 0.59). Then we show how ANNs allow to easily add other variables, like peak rainfall intensity, with a further performance improvement (TSS = 0.66). This may stimulate more research on the use of this powerful tool for deriving landslide early 10 warning thresholds.


Introduction
Landslides triggered by rainfall can cause damages on infrastructures, buildings, and in the worst scenario, even human losses (Froude and Petley, 2018). Commonly, rainfall thresholds indicating the conditions under which landslides can potentially occur, are a key component of warning systems aimed at protecting the population from a possible landslide event. In most of 15 the cases, thresholds are determined using empirical methods that link characteristics of precipitation, such as duration D and mean intensity I or cumulated rainfall E = I×D, to landslide occurrence (Caine, 1980). Rainfall thresholds are generally usually determined by assuming a predetermined parametric equation, which in most of the cases is a power law (Guzzetti et al., 2008).
In general, for a given set of predictors, the choice of a predetermined threshold equation form (e.g. power-law) can potentially limit its performanceSuch a constraint can potentially limit the predictive performance of the thresholds, because the 20 informative content of the considered explanatory predictor variables may not be exploited at fullest. This holds true all the more so when searching for alterative or additional variables with the aim at improving the performances of the thresholds, such as antecedent rainfall conditions (Glade et al., 2000), water storage and soil moisture data (Bogaard and Greco, 2018;Marino et al., 2020). For the case of E-D or I-D thresholds the use of a power law is customary and its rationale has been also verified based on a combined stochastic and physics-based approach (Peres and Cancelliere, 2014). On the contrary, either in 25 the case of a different pair of variables or the analysis of more than two variables, there is no analogous prominent parametric form of the threshold equation. For instance, as reported by Conrad et al., (2021), alternative formulas have been considered for hydrometeorological thresholds -i.e., based on rainfall and soil moisture or catchment storage -, including linear and bilinear functions, interpolated line segments without a mathematical function, cut-off values for integration of antecedent 2 conditions with traditional rainfall thresholds, and more complex logical operators. In general, for a given set of predictors, 30 Tthe choice of a predetermined threshold equation form (e.g. power-law) can potentially limit the its performance. of the threshold derivable from the given set of variables, and thusThis can also limit jeopardize the scientific soundness of the comparisons between different thresholds using different sets of predictors (e.g. rainfall thresholds vs. thresholds using soil moisture)approaches for deriving landslide triggering thresholds. The use of predetermined parametric forms can finally jeopardize the scientific soundness of comparisons between thresholds using different sets of predictors (e.g., rainfall 35 thresholds vs. thresholds using soil moisture). Artificial Neural Networks (ANNs), belonging to Artificial intelligence or Machine learning techniques, allow to potentially remove the mentioned limitation of having to choose a predetermined parametric threshold form, as they are universal approximators, i.e. capable of reproducing any continuous function (Haykin, 1999).
Up to now, a number of studies have used the potentiality of ANNs and of other machine learning techniques in landslide 40 analysis. Many studies focused on susceptibility mapping and individual slope instability have exploited the potentialities of ANNs (Reichenbach et al., 2018). In other studies, the focus is on the prediction of individual deep seated landslide displacements by machine-learning algorithms using detailed in situ data (e.g., van Natijne et al., 2020). Based on the this briefly outlined state-of-the-art, it appears that ANN skills are mainly used to create susceptibility maps and/or in local early warning systems, while application for territorial landslide early warning (Piciullo et al., 2018) has not been investigated so 45 far. In this communication, we present our preliminary investigations showing how ANNs can allow to derive landslide early warning thresholds with higher performances than traditional rainfall intensity-duration power law thresholds.

Data and methods
We refer to the case study of Sicily ( Fig. 1), one of the 20 regions of Italy. We have retrieved hourly rainfall from 306 rain gauges distributed within the region, managed by the Regional water observatory (Osservatorio delle Acque, OdA), the SIAS 50 (Sicilian Agro-meteorological Information Service), and by the Regional Civil Protection Department (DRPC).  This database contains information on landslides occurred in Italy from January 2010 to December 2019 and is available online (https://franeitalia.wordpress.com/database/, last accessed on 17/11/2021). Thus, our analysis is based on the period from January 2010 to October 2018, where both rainfall and landslide information is available. Some landslide events have been removed from the analysis. In particular, this was done based on the landslide typology, material and type of trigger fields included in FraneItalia that characterize the observed landslide events -typology, material and trigger. Only events having 65 "rainfall" or "rainfall and other" trigger have been considered, so as to exclude landslides due to earthquakes and anthropogenic activities. Also, events of the "fall" typology combined with "rock" material have been removed from the analysis, as in the case of rRockfalls have been removed from the analysis as well, as their triggering can be not always linked to rainfallrainfall may have a triggering role different from the other types of landslides. Rainfall data have been checked so in order to remove suspicious rainfall data. In particular, where hourly rainfall exceeded 250 mm -corresponding to about one third of mean 70 yearly rainfall for Sicily and to about two times the maximum rainfall ever recorded in 1 hour -the series has been visually inspected, and in the case of an evident error (rain gauge malfunction) the whole rainfall event surrounding the peak has been removed. In light of the above, Aa flow chart of representing the applied methodology is shown in Fig. 2a. First, pre-processed precipitation and landslide data were inputted to the CTRL-T (Calculation of Thresholds for Rainfallinduced Landslides-Tool) code (Melillo et al., 2018). The software consists of a code in R language, and allows to reconstruct rainfall events and characterizing them by the following variables: duration D, mean intensity I, total depth E = D×I and peak 80 intensity I p (defined as the maximum hourly intensity occurring during a rainfall event). The most probable rainfall conditions associated to each landslide (multiple rain gauges available for a given location) event are computed by the software based on distance between rain gauge and the landslide location, and the characteristics of the reconstructed rainfall event. In particular, for a given landslide, all rain gauges within a circle of radius R b specified by the user are searched and, when more than one rain gauge is located within the circle, the rainfall events from each rain gauge are weighted based on the rain gauge-landslide 85 distance and the rainfall event characteristics (cumulated rainfall and duration). The weight is used to estimate the "probability" associated to each rainfall condition potentially attributable to each landslide event. In particular, the probability, in case of multiple rainfall conditions, is computed by dividing each weight to the sum. CTRL-T then determines the triggering rainfall conditions of each landslide as those corresponding to the highest probability. Finally, the code provides power-law E-D thresholds for different levels of non-exceedance frequency of triggering events. The software allows the user to set different 90 values of the parameters to reconstruct rainfall events in order to take into account seasonality, i.e., different average evapotranspiration rates in different periods of the year. In particularSpecifically, following the study by Melillo et al. (2016), we assumed that in the warm season C W (April -October) the minimum dry period separating two rainfall events is of P 4warm = 48 hours, while in the cold season a longer period is assumed (P 4cold = 96 hours). The rain gauge sensitivity is G s = 0.1 mm.
The rain gauge search radius has been fixed to R b =16 km. A binary coding has been attributed each rainfall event, flagging 95 triggering events as a target with value of 1 and a non-triggering event with null value. Application of the CTRL-T software allowed to reconstruct the rainfall events associated to the 144 landslide events in the inventory (triggering events) and 47398 non-triggering events. For 103 events, only the day of triggering was known, while for the remainder a more precise indication of the triggering instant was available. In the first case, the triggering instant was attributed to the end of the day, in the second case to the instant of peak rainfall within the time interval when the triggering has occurred. Furthermore, for the 144 landslide 100 events detailed information on the typology was available only in 18 cases, 10 of which were "fall" of "more than one material", 4 "flow" and other 4 "slide". The average distance between rain gauge and landslide for the 144 events is about 5 km, thus seldom the maximum value of Rb= 16 km was reached.
The characteristics of the events were used as input variables to ANNs devised for pattern recognition, as implemented within the Neural Net Pattern Recognition tool in MATLAB. The neural network, characterized by a feed-forward structure (Fig 2b), 105 is composed of three layers: input, hidden and output (Fig 2b). The input layer takes the series of predictors and sends them to the hidden layer, where the series are combined and transformed though a specific activation function. Two different activation functions have been considered: a tan-sigmoid function f(n) for the hidden layer, and a log-sigmoid ( ) for the output layer: (2) 110 The ANNs have been trained through the scaled conjugate gradient backpropagation algorithm, while cross-entropy was assumed as the performance function for training. Denoting the generic ANN output with y i (assuming values in the open interval between 0 and 1) and the binary target with t i , i =1,2, …, N, the cross-entropy function F heavily penalizes inaccurate predictions and assigns minimum penalties for correct predictions: (3) 115 The ability to distinguish triggering events from non-triggering events was measured using the confusion matrix, a doubleentry table in which it is possible to identify true positive TP (triggering events correctly classified), true negative TNs (nontriggering events correctly classified), the false negative FN (triggering events classified as non-triggering) and FP false positive (non-triggering events classified as triggering). Through the confusion matrix it is possible to determine the True 6 Positive Rate (TPR) and the False Positive Rate (FPR), as well as their difference, known as the True skill statistic (TSS), 120 which is widely used for threshold determination (Peres and Cancelliere, 2021): The output of the ANNs is transformed into a binary code (dichotomization), by assuming a value of 1 (ANN predicts a 125 landslide) when the output is greater that athe application of a threshold value, and of 0 otherwise. We then identify the the mentioned threshold value by maximizing the TSS, which has the advantage to do not be affected by allows also to overcome unbalanced training dataset issues respect to other indices, such as ROC accuracy ACC = (TP + TN)/(TP + TN + FP + FN)a performance metric used as default by many ANN training software tools.. Maximization of TSS implicitly assumes that all entries of the confusion matrix have the same importanceutility. Quantifying the loss how more important isof a false negative 130 respect to a false positive, is a complex task task that goes beyond the aim of the present analysis, and that has been treated faced only in very recent studies (cf. Sala et al., 2021). Results from ANNs are compared with rainfall duration-depth powerlaw thresholds derived through the maximization of TSS -i.e., again, analysing both triggering and non-triggering events.
For our analysis different combination of input data (duration D, intensity I, total depth E and peak intensity I p ) and different architectures, changing the number of hidden neurons were tested. In particular, the following input variable configurations 135 have been investigated: 1) D; 2) E; 3) I; 4) I p ; 5) D and E; 6) D and I; 7) D and I p ; 8) E and I p ; 9) I and I p ; 10) D, E and I p . The listed input configurations are indeed all possible ones, except those combining both E and I with duration D. This has been done because the two pairs D-I and D-E have the same informative content by construction, as confirmed by the fact that the performances of the D-I and D-E neural networks do not differ significantly (see later, Tab. 1) -slight differences may occur as ANNs can be sensitive to how a set of variables having together the same information content of another are presented to 140 the network. All the data have been inputted taking their natural logarithms. Different networks have been considered varying the number of hidden neurons from 5 to 20, in order to search for the best value, i.e., the one yielding the highest TSS.
The entire dataset of rainfall events was divided into a training, a validation, and a test data set, selected randomly from the entire dataset, in the proportions of 70%, 15% and 15%. The training dataset is data used to fit the model; while the validation provides an unbiased evaluation of a model fit on the training dataset while tuning model hyper-parameters, such as the number 145 of training iterations. Finally, the test dataset provides an unbiased evaluation of a final model fit. This subdivision allowed to apply the early-stopping criterion to prevent overfitting. According to this criterion, the training of the neural network is stopped when the values of the performance function calculated on the validation dataset start to get worse. In order to ensure representativeness of the data randomly assigned to the training, validation and test datasets, results where the TSS in the test or the validation data set are greater than the TSS in the training data set are removed from the analysis. Once the network is 150 developed considering these three datasets and early stopping, it is "frozen" and ROC metrics (e.g., TSS) can be computed with that network on the entire dataset, and the corresponding performances can be considered generalizable. Thus, when comparing our proposed approach with the traditional one, we focus on these last performances (labeled as "all"). This seems the most appropriate way to proceed, as the I-D power law and its performance is are determined with respect to the entire dataset. 155

Results and discussion
Application of the CTRL-T software has allowed to build the dataset of triggering and non-triggering events and to derive the threshold according to the so-called frequentist method (based on triggering events only). Considering a non-exceedance frequency for triggering events equal to 5%, threshold from the software is as follows: This threshold is lower than the one obtained for Sicily by Gariano et al. (2015), yet comparable with an updated one derived by Melillo et al. (2016). Specifically, thresholds reported on the mentioned two studies are respectively the following (nonexceedance frequency is again 5%): It should be mentioned that these thresholds were both derived from rainfall datasets covering the period July 2002-December 2012, which is different from the one we have considered in our analysis. The first threshold has been derived with an earlier version of the CTRL-T code, which required manual selection of the most representative rain gauge , while the second study derives from the updated algorithm, where this selection is made automatically.
These thresholds however are not comparable with those to derive with the proposed ANN approach, because non-triggering 170 events are neglected. We have hence derived the power-law threshold corresponding to the maximum TSS -externally of the CTRL-T software, via MATLAB ® global optimization toolbox -, obtaining the following result: = 2.40 0.68 (10) that has a TSS = TSS 0 = 0.50, obtained from a TPR = 0.76 and a FPR = 0.26. The threshold has a lower intercept but a higher slope, so, after a duration of about 5 hours, it is above that the one given in Eq. 7. 175 Table 1 shows the results obtained from the tested 160 neural network configurations (10 different input layers and 16 different numbers of hidden neurons). In particular, the table shows, for each set of input variables: the optimal number of hidden neurons corresponding to the maximum TSS for the training, the validation, the test and the entire data set ("all"). For the entire data set also the TPR and the FPR are shown. As can be seen, for most of the input configurations, the TSS for the training and validation data sets is generally quite close. This proves that overfitting has been sufficiently prevented, thanks to 180 the early-stopping criterion -otherwise the performance in the training data set would have been significantly higher than those in the validation and the test data set. Table 1: Results of tests with ANNs, showing the optimal number of hidden neurons (a number from 5 to 20 has been tested) and 185 the True skill statistics (TSS) for the entire, the training, the validation and the test data sets. Values in the table are compared to TSS0 = 0.50 which is the maximum value associated to a D-E power law threshold. As can be seen from the Table, using only one input variable, the performances are significantly lower than those obtained from the use of the power-law threshold of Eq. 10: however, for the variable with the highest informative content, mean rainfall 190 intensity I, the TSS = 0.44 is quite close to TSS 0 = 0.50. When using input variables in pairs, performances increase significantly. Notably, in the case of the pairs D-I and D-E -i.e., the same variables used for the power law -the TSS = 0.59 (0.60), which is significantly higher than TSS 0 . This is obtained by both an increase of the TPR (true positives) and a decrease of the FPR (false positives). The fact that with same input data the neural network provides significantly better performances than the power law, proves that the use of a predetermined parametric form for the threshold equation does not allow to exploit 195 at fullest the informative content of the input variables, while the flexibility of ANNs allows to achieve a better classification.

Input data
In other words, one of the shortcomings of a power law is that the same equation is usually assumed valid for all the durations, while ANN are more flexible. Finally, adding a third variable (network input D-E-Ip), a further improvement is obtained (TSS = 0.66), mainly due to a decrease of the FPR. This result demonstrates how neural networks can be an aid in searching additional variables that can provide a more reliable dynamic prediction of landslide triggering conditions. In particular, in this 200 case, it has been shown that peak intensity may have an important informative content, an aspect that has not been perhaps sufficiently investigated in the literature, even though some researchers have found that the addition of a third variable is a possible way to derive thresholds that better adapt to complex case studies (e.g., Rosi et al., 2021).

Concluding remarks
The identification of rainfall thresholds indicating landslide triggering conditions is a key step for implementing territorial 205 landslide early warning systems. Commonly, thresholds are searched in a limited space, i.e., constrained to a predetermined parametric form, which is generally a power law linking rainfall event, duration D and mean intensity I (or total depth E =D I). In this communication we have shown that choosing a predetermined form for the law of the threshold can potentially limit the performance of the empirical model, and how Artificial neural networks are a valuable tool to overcome this limitation.
The analysis, referred to the case study of Sicily, has shown that an E-D power-law threshold has a maximum true skill statistic 210 of TSS = TSS0 = 0.50. On the other hand, the classifier based on neural networks, using the same pair of input variables, yielded a significantly greater TSS = 0.60 It has also been shown how neural networks allow to easily explore the potential information content of other variables, and hence provide a way to improve predictive performance. For instance, it has been shown that the inclusion of peak rainfall intensity as an additional variable, can lead to an improvement of performance. It is important that when training neural networks, generalization capabilities are ensured, for instance by the early stopping 215 technique. Overfitting is not an issue for the traditional approach based on the power law -or any other parametric equationas in general the number of free parameters is very low (2 for a power law). This may be a drawback for neural networks, even though it forces one to consider both triggering and non-triggering events, which is fundamental for obtaining thresholds with acceptable statistical characteristics (Peres and Cancelliere, 2021). Another possible disadvantage of neural networks with respect to predetermined-form thresholds is also represented by the fact that it is may be difficult to express the neural network 220 classifier as a simple equation. This could limit the practical implementation of triggering thresholds based on neural networks, which could be perceived as impractical by practitioners. However, this limit can for instance be overcome by providing a user-friendly software to the end user.
Author contributions. D.J.P. designed the research, P.D. and D.J.P. conducted the analyses and wrote the paper, A.C. and P.S. and D.J.P. supervised the research and critically revised the manuscript. 230 Acknowledgements. Pierpaolo Distefano doctoral program's grant is funded by the "Notice 2/2019 for financing the Ph.D. regional grant in Sicily" as part of the Operational Program of European Social Funding 2014-2020 (PO FSE 2014-2020) CUP E65E19000830002. David J. Peres was supported by the post-doctoral grant on "Sviluppo di modelli per la valutazione di strategie innovative di gestione delle risorse idriche in un contesto di cambiamenti climatici" (Development of models for the evaluation of new strategies for water resources management in a changing climate). The research has been partially 235 conducted within the following projects: LIFE 17 CCA/IT/000115 SimetoRES funded by the EASME (now CINEA) of the European Commission, and the Programma Operativo Nazionale Governance e Capacità Istituzionale 2014-2020 -Programma per il supporto al rafforzamento della governance in materia di riduzione del rischio ai fini di protezione civile CUP (Program to support the strengthening of governance in the field of risk reduction for civil protection purposes). APCs funded by "fondi di ateneo 2020-2022, Università di Catania, linea Open Access". 240