Rainfall-Induced Landslide Early Warning System based on corrected mesoscale numerical models: an application for the Southern Andes

Rainfall-Induced Landslide Early Warning Systems (RILEWS) are critical tools for reducing and mitigating the economic and social damage caused by landslides. Despite this need, the Southern Andes does not yet possess an operational-scale system to support decision-makers. We propose a RILEWS for the Southern Andes based on logistic regression. The models were forced by corrected precipitation simulations and geomorphological features. We evaluated precipitation from the Weather Research and Forecasting (WRF) model on an hourly scale. The precipitation was corrected using bias correction approaches with daily data from 12 meteorological stations. Four models were then calibrated using Logit and Probit distributions. The predictor variables were combinations of the slope, the corrected daily precipitation and the precipitation preceding the events (7 and 30 days previous) for 57 Rainfall-Induced Landslides (RIL); validation was by ROC analysis. Our results showed that uncorrected WRF does not represent the spatial variability of the precipitation. This was resolved by bias correction. Specifically, the PP_M4a method, with a Bernoulli distribution for the occurrence and a Gamma distribution for the intensity, produced lower MAE and RMSE values and higher correlation values. Finally, our RILEWS had a high predictive capacity, with an AUC of 0.80 using daily precipitation data and slope. We conclude that our methodology is suitable at an operational level in the Southern Andes. Our contribution could become a useful tool in the mitigation of impacts related to climate change.

infrastructure damages (Guzzetti et al., 2020; Chikalamo et al., 2020; Hermle et al., 2021). The present work evaluates the design of a RILEWS using a mesoscale atmospheric model coupled to a logistic discriminator in the Southern Andes.
Due to new rainfall scenarios related to climate change, RILEWS have been increasingly used in recent years, reducing the vulnerability of populations through different approaches (Peres & Cancelliere, 2014; Segoni et al., 2018; Fan et al., 2019; Tiranti et al., 2019; Thirugnanam et al., 2020; Lee et al., 2021). RILEWS based on intensity/duration curves do not consider the effect of soil moisture, which biases their predictive capacity (Marra et al., 2017; Zhao et al., 2019; Chikalamo et al., 2020). Some RILEWS use historical precipitation data with long-term observations, climate reanalysis models and atmospheric mesoscale models (Lazzari & Piccarreta, 2018; Tichavský et al., 2019). However, atmospheric mesoscale models have shown high uncertainty in areas with scarce meteorological stations and complex topography. Recently, the integration of mesoscale atmospheric models with local weather stations allowed areas susceptible to RIL to be defined by deterministic numerical models (Fustos et al., 2020a). Therefore, a correct implementation of mesoscale models could allow this source of information to be used in RILEWS.
In recent years, mesoscale models have proved incapable of representing precipitation fields suitable for RILEWS in areas with complex topography such as the Southern Andes (Yáñez-Morroni et al., 2018). Currently, mesoscale models are limited by the quality of their atmospheric forcings and need ensembles to obtain approximate solutions (Wayand et al., 2013).
The objective of the present work was to evaluate the implementation of a mesoscale logistic model forced by geomorphological and precipitation constraints. We corrected mesoscale models using weather stations, generating RIL-prone probability zones for the first time in the Southern Andes. The paper is structured as follows: after the introduction, the second section describes the study site and its suitability for implementing a RILEWS. In the third section, we describe the data and methods, including the calibration and validation procedures. In the fourth section, we outline the main results of the proposed RILEWS, focusing on the quality of the predictors and model outputs. The fifth and final section comprises the discussion and conclusions, presenting the implications of this proposal and its general applicability to the Southern Andes.
[...] north-south. From west to east, they are the Coastal Range, the Central Valley and the Andes Range (Figure 1). In the western area, altitudes range from 100 to 1,000 masl, with slopes between 0 and 25°. In the Central Valley, the maximum altitude is 150 masl, with slopes between 0 and 15° in the central part and between 25 and 45° towards the Andes. Finally, the highest altitudes (400 to 2,700 masl) and the steepest slopes (25 to 70°) are found in the eastern zone (Gomez-Cardenas & Garrido-Urzua, 2018).
Annual precipitation is strongly correlated with topography and latitude. In the northern segment (~40°33' to ~41°10' S) it exceeds 1,200 mm per year, while in the south (~41°10' to ~42°10' S) it rises to over 1,400 mm per year. In the Central Valley, precipitation exceeds 1,910 mm per year. The highest precipitation, over 4,000 mm per year, is recorded in the Andes Range (Alvarez-Garreton et al., 2018). The climate in the area is classified as oceanic (Beck et al., 2018), with a dry summer in the northern portion but no dry months in the south (Alvarez-Garreton et al., 2018).
The oldest geological units in the area correspond to Cretaceous intrusive bodies, which crop out in the Rupanco lake peninsula and further south. In the Coastal Range, there are outcrops of metamorphic rocks from the Paleozoic-Triassic (300-250 Ma). These rocks are largely covered by sedimentary deposits of various origins: marine from the Oligocene-Miocene (eastern flank of the Coastal Range), volcanic from the Oligocene-Miocene (40 to 5 Ma; south of Rupanco lake), and glacial from the Pleistocene-Holocene. In the SE of the region is the North Patagonian Batholith (132-77 Ma), consisting of granites, granodiorites, tonalites and leucogranites (Gomez-Cardenas & Garrido-Urzua, 2018). Elsewhere in the region, there are clayey soils called trumaos and ñadis, which have developed from glacial-fluvial-volcanic sediments. These soils present a high organic content, poor drainage and low development (Blanco & de la Balze, 2004).

Methodology
We assessed the feasibility of a RILEWS applied to Rainfall-Induced Landslides (RIL) using geomorphological and precipitation forcings for the Southern Andes. Precipitation data and local geomorphological features were integrated into a logistic model to evaluate the occurrence of RIL. These variables were taken into account because both the precipitation and the topography predispose the study area to RIL (Fustos et al., 2017; 2020a). A database of previous RIL (Gomez-Cardenas & Garrido-Urzua, 2018) was studied and divided into calibration subsets, with subsequent validation of the method. The bias associated with the precipitation obtained from the mesoscale model was corrected using in-situ stations (Figure 2). To establish the reliability of the model for the correct prediction of RIL, its sensitivity was calculated using the validation subset. This allowed the RIL prediction sensitivity to be characterised for operational implementation in future LEWS.

3.1
Atmospheric modelling
The study area contains a limited number of meteorological stations, which makes it challenging to represent the spatial distribution of precipitation. To overcome the limitation imposed by the meteorological data, precipitation fields were estimated using the Weather Research and Forecasting model 4.0 (WRF, Skamarock et al., 2019). Atmospheric conditions were simulated for the period 2014 to 2018 at hourly time resolution. We used a spatial resolution of 4 km, which allowed us to represent the complex topography of the Andes. The WRF parametrisation used the WSM 3-Class Simple Ice Scheme for microphysics (Hong et al., 2004), while the soil-atmosphere interaction was parametrised by the Unified Noah Land-Surface Model (Tewari et al., 2004). The Final Operational Global Analysis product (FNL) from the US National Centers for Environmental Prediction (NCEP, 2000) was used as the global forcing to obtain the precipitation solutions at mesoscale.
https://doi.org/10.5194/nhess-2021-317 Preprint. Discussion started: 8 November 2021 © Author(s) 2021. CC BY 4.0 License.
The precipitation fields of the WRF model were compared with data from the 12 meteorological stations available in the area to evaluate the bias of the numerical model (Figure 1). Biases associated with local effects of the parametrisation selected in WRF were corrected with MeteoLab (Wilcke, 2013) using three different methods (Table 1). We compared the methods using several statistical measures: bias, MAE, RMSE, and Pearson and Spearman correlations. The corrected model with the lowest precipitation RMSE was then used in the RILEWS implementation.
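The comparison statistics named above can be illustrated with a minimal sketch in plain Python. It is not the MeteoLab implementation; the function names and the five-day sample series are hypothetical and only show how bias, MAE, RMSE and the Pearson and Spearman correlations are obtained from paired simulated and observed daily precipitation.

```python
import math


def _ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def error_statistics(simulated, observed):
    """Bias, MAE, RMSE and correlations of simulated against observed values."""
    diffs = [s - o for s, o in zip(simulated, observed)]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,
        "mae": sum(abs(d) for d in diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        "pearson": pearson(simulated, observed),
        # Spearman is the Pearson correlation of the ranks.
        "spearman": pearson(_ranks(simulated), _ranks(observed)),
    }


# Hypothetical daily precipitation (mm) at one station, for illustration only.
observed = [0.0, 2.0, 10.0, 4.0, 0.0]
simulated = [1.0, 3.0, 14.0, 5.0, 0.0]
stats = error_statistics(simulated, observed)
print(stats)
```

Computing the same dictionary for each correction method at each station would allow the methods to be ranked by RMSE, as done in the text.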
Table 1. Methods used to correct the WRF precipitation bias.
Method: BC_QPQM. Type: bias correction approach. Reference: Precipitation bias correction methods for high-resolution regional climate simulations using COSMO-CLM: Effects on extreme values and climate change signal (Gutjahr and Heinemann, 2013).
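As a rough illustration of what a distribution-based bias correction does, the sketch below implements plain empirical quantile mapping. This is a simpler, swapped-in technique and not any of the MeteoLab methods used in this work; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def empirical_quantile_map(sim_calib, obs_calib, sim_new):
    """Replace each new simulated value by the observed value that has the
    same empirical non-exceedance rank in the calibration period."""
    sim_sorted = sorted(sim_calib)
    obs_sorted = sorted(obs_calib)
    n_sim, n_obs = len(sim_sorted), len(obs_sorted)
    corrected = []
    for value in sim_new:
        # Number of calibration simulations less than or equal to the value.
        rank = bisect_right(sim_sorted, value)
        # Matching observed order statistic: ceil(rank * n_obs / n_sim) - 1,
        # computed with integers and clamped to the valid index range.
        idx = -(-rank * n_obs // n_sim) - 1
        idx = min(n_obs - 1, max(0, idx))
        corrected.append(obs_sorted[idx])
    return corrected


# Hypothetical calibration series (mm/day): the model is twice as wet as the gauge.
sim_calib = [0.0, 2.0, 4.0, 8.0, 16.0]
obs_calib = [0.0, 1.0, 2.0, 4.0, 8.0]
corrected = empirical_quantile_map(sim_calib, obs_calib, [4.0, 16.0, 0.0])
print(corrected)
```

With these sample values the wet bias is removed because each simulated amount is replaced by the gauge amount of equal rank.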

Rainfall-Induced Landslide Early Warning
We propose a model for RILEWS based on the probability of occurrence of RIL in space and time. The probability was determined using Logit and Probit distribution functions, which have been implemented previously in the Southern Andes (Fustos et al., 2017; 2020b). The advantage of logistic regressions is that they establish statistical relations between physical processes at different scales with a limited quantity of information (Fustos et al., 2020b). The logistic regressions were trained on the local geomorphological conditions (slope) and the previously corrected simulations of precipitation.
We used slope values derived from SRTM data. A total of 4,987 RIL have been reported for the south of Chile (Gomez-Cardenas & Garrido-Urzua, 2018). However, we had detailed information, including the exact date, for only 2,035 of these. Our database comprised 57 RIL events, considering mudflows, debris flows and mass wasting. The Logit distribution model fits the probability of occurrence of an event using a logistic curve (Li et al., 2011). The Logit distribution model ($P_L$) is given by:
$$P_L(Y = 1) = \frac{1}{1 + \exp\left[-\left(\beta_0' + \sum_{k=1}^{K} \beta_k' x_k\right)\right]} \qquad (1)$$
where $P_L(Y = 1)$ is the probability of occurrence of a RIL, $K$ is the number of predictors used ($x_k$), $\beta_k'$ are the coefficients of the function and $\beta_0'$ is the intercept. A Probit distribution also uses binary dependent variables; its main difference from the Logit distribution is the use of the inverse standard normal distribution. The Probit distribution ($P_P$) (McCullagh & Nelder, 1989; Javier & Velazquez, 1990) is given by:
$$\Phi^{-1}\left[P_P(Y = 1)\right] = \beta_0' + \sum_{k=1}^{K} \beta_k' x_k + \varepsilon \qquad (2)$$
where $x_k$, $\beta_k'$ and $\beta_0'$ refer to the same variables as in the Logit distribution, $\varepsilon$ is the error of the fit with standard normal distribution $\varepsilon \sim N(0, \sigma)$, and $\Phi^{-1}$ denotes an inverse normal probability function (McCullagh & Nelder, 1989). Four predictors were used for both the Logit and Probit functions: daily precipitation, precipitation over the previous 7 and 30 days, and slope (Table 2).
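The two link functions can be sketched in a few lines of Python, assuming the standard textbook forms of the Logit and Probit models and ignoring the error term. The function names, the intercept and the coefficients below are hypothetical illustration values, not the calibrated estimators reported in this work.

```python
import math
from statistics import NormalDist


def linear_predictor(intercept, coefficients, predictors):
    """Intercept plus the weighted sum of the predictors."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


def logit_probability(intercept, coefficients, predictors):
    """Probability of occurrence under the Logit (logistic) link."""
    eta = linear_predictor(intercept, coefficients, predictors)
    return 1.0 / (1.0 + math.exp(-eta))


def probit_probability(intercept, coefficients, predictors):
    """Probability of occurrence under the Probit link
    (standard normal cumulative distribution of the linear predictor)."""
    eta = linear_predictor(intercept, coefficients, predictors)
    return NormalDist().cdf(eta)


# Hypothetical values: predictors are daily precipitation (mm) and slope (degrees).
intercept = -4.0
coefficients = [0.05, 0.08]
predictors = [40.0, 30.0]

p_logit = logit_probability(intercept, coefficients, predictors)
p_probit = probit_probability(intercept, coefficients, predictors)
print(round(p_logit, 3), round(p_probit, 3))
```

Both functions map the same linear predictor to a probability between 0 and 1; they differ only in the shape of the curve, the Probit approaching 0 and 1 faster than the Logit.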
The complete RIL database was split into a calibration sub-base (DB1) and an independent validation sub-base (DB2) for subsequent evaluation (Figure 2). The database was split by taking from 20 to 30% of the data, chosen at random, for calibration. A calibration set was selected 100 times to obtain the intercept $\beta_0'$ and the coefficients $\beta_k'$, and their standard deviations, denoted by $\sigma_0'$ and $\sigma_k'$ respectively, calculated according to the methodology presented by Fustos et al. (2020b). The quality of each regression was evaluated by ROC analysis (Fawcett, 2006) using the independent database DB2 (Figure 2). This allowed us to quantify the accuracy with which a RIL event is identified under given conditions of slope and precipitation. A probability threshold (tolerance) was established to define when the models identify a RIL event correctly. The tolerance was defined from the results of the ROC curve for probability thresholds between 50 and 95%. In this way, the sensitivity of each iteration was estimated (Eq. 3), representing the capacity of the set of estimators to detect RIL events correctly (Fawcett, 2006; Hand & Till, 2001). The sensitivity was defined as the ratio of true positive predictions of events (TP) to the total of positive events (including false-negative predictions, FN). The specificity was also calculated (Eq. 4) to evaluate the capacity to detect non-RIL events or true negatives (TN), and so to avoid false positives (FP) (Fawcett, 2006):
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$
Finally, this methodology made it possible to evaluate the capacity of each model to detect RIL events (Fustos et al., 2020b).
[...] correspondence with the observations than did the uncorrected simulation. Therefore, our results showed that the mesoscale correction improves the quality of the rainfall representation. The precipitation fields corrected with the different MeteoLab approaches (Table 2) showed improved values at the weather stations in comparison to the raw solution.
The corrected ISI-MIP results were similar to those described for PP_M4a, but with slightly larger error values. Both ISI-MIP and PP_M4a presented a bias lower than 0.5 mm. The bias of the gpQM method varied between -2.69 and 0.95 mm (Figure 3). We point out that the PP_M4a method showed the best performance in terms of MAE and RMSE (~0.04 and ~0.23 mm respectively). The Spearman coefficient ranged between 0.90 and 0.98, indicating a better representation of the precipitation fields when compared with the weather stations.

Rainfall-Induced Landslide Early Warning
The probability of occurrence of RIL at spatial and temporal scales was estimated using the precipitation values corrected on [...] for the precipitation of the previous 7 days the estimator varied from -0.6413 ± 0.0063 to 0.0020 ± 0.0086 [1/mm]. The estimator obtained for the monthly precipitation was -0.3518 ± 0.0033 [1/mm] (used exclusively in the M3 model), while the slope estimator fluctuated between -0.1696 ± 0.0049 and -0.1289 ± 0.0072 [1/degree] (Figure 5).
We point out that the estimators related to precipitation had a higher absolute weight than the slope in all the calibrated models. The precipitation estimators for the daily (M1), previous 7 days (M2) and previous 30 days (M3) accumulations decreased in absolute value as the accumulation period increased. The results of the M4 model, which considered the daily precipitation in conjunction with that of the previous 7 days, showed that the latter had an absolute weight of almost zero compared to the former. In general, the standard deviations (σk') obtained for the estimators and intercept were very low for all the calibrated Logit models. The Probit model showed the same behaviour as the Logit for the intercept in the 4 models; its estimator fluctuated between 1.7482 ± 0.0041 and 1.9113 ± 0.0030. The values for daily precipitation varied from -0.4166 ± 0.0046 to -0.4016 ± 0.0027; for 7-day precipitation from -0.3545 ± 0.0029 to -0.0202 ± 0.0038; for 30-day precipitation the value was -0.1897 ± 0.0020 (used only in M3); and for the slope from -0.0741 ± 0.0022 to -0.0596 ± 0.0033 (Figure 6).
Performance assessment
ROC analysis of the Logit and Probit models showed that the M1, M2 and M4 models gave a similar performance (Figure 7 and Figure 8). The area under the curve for the Logit models varied between 0.8032 and 0.6672, while for the Probit models it varied between 0.8076 and 0.6672. Our results showed a lower performance for M3 than for the models using daily precipitation data, for both the Logit and Probit models (0.6582 and 0.6672, respectively). A model with an AUC value of 0.5 is unable to discriminate landslides and generates random predictions. All the calibrated models exceeded this value, so our results demonstrate that their fits are not random.
The rate of true positives in the Logit distributions of the M1 and M4 models was higher than 0.97 with tolerances below 50%. For the same range, however, the rate of FP was over 45%. The same occurred with the Probit models. For a tolerance of 95%, the prediction of FN for both regressions diminished to below 40%, although the accurate predictions (TP) also fell by ~11%. Similar performance was observed in M2, with slightly higher numbers of FP, but fewer as a proportion of TP. M3, in contrast, presented rates of accurate predictions and FN close to 1 for thresholds lower than 85%. In general, we observed that the Probit models had greater AUC values than the Logit models, making them more suitable for RILEWS.
[...] (Table 3). In general, we observed that the Logit models were more sensitive than the Probit models. The specificity values for the M2 and M3 models were slightly higher for the Probit regressions than for the Logit, while for the M1 and M4 models the results obtained were almost equal. Accordingly, the best model for predicting RIL in the study area was M1 (daily precipitation and slope). The sensitivity and specificity values with the 95% threshold chosen after ROC analysis were higher than 82%. The results showed that the indicators were similar for the M1, M3 and M4 models (Table 3). However, a reduction was observed in the rate of TP for the M2 model (~10%).
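The performance indicators discussed above can be sketched as follows. This is a minimal plain-Python illustration, not the code used in this work; the function names, the five sample probabilities and the 0.5 tolerance are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen RIL event receives a higher score than a randomly chosen non-event.

```python
def confusion_counts(probabilities, labels, threshold):
    """Count TP, FN, TN and FP when an alert is issued for p >= threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        alert = p >= threshold
        if y == 1 and alert:
            tp += 1
        elif y == 1:
            fn += 1
        elif alert:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Fraction of observed events that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-events that were correctly left without an alert."""
    return tn / (tn + fp)


def auc(probabilities, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [p for p, y in zip(probabilities, labels) if y == 1]
    neg = [p for p, y in zip(probabilities, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs and observed outcomes (1 = RIL, 0 = no RIL).
probabilities = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]

counts = confusion_counts(probabilities, labels, threshold=0.5)
tp, fn, tn, fp = counts
print(sensitivity(tp, fn), specificity(tn, fp), auc(probabilities, labels))
```

Repeating the first two calculations over a range of tolerances traces the ROC curve, which is how a trade-off between missed events and false alerts can be examined.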

Analysis and Discussion
Implementing a RILEWS is a challenge because of natural limitations such as the historical records and the precipitation data available. One of the main challenges is to develop a model that generates warnings using only limited meteorological information. Therefore, a precipitation representation with low uncertainty is a valuable contribution in environments with complex topography (Table 3). Our study proposes an alternative for landslide forecasting in data-scarce environments, which can increase the resilience of the local community. Here, we demonstrated that mesoscale models become suitable for reproducing the spatial distribution of precipitation once their bias is corrected using in-situ weather stations. The precipitation was subsequently integrated into a logistic model to establish the spatial probability of occurrence of a RIL event.

Precipitation accuracy
Implementation of a LEWS applied to RIL requires precise estimation of the spatial distribution of precipitation. Zones with a low density of meteorological stations generate uncertainties in the RILEWS implementation (Marra, 2018; Peres et al., 2018). Previous works have shown the sensitivity of mesoscale models to abrupt changes in complex topography (Srivastava et al., 2015; Osman et al., 2018; Heredia et al., 2018; Jeong & Lee, 2018; Buchici et al., 2019; Bannister et al., 2019; Worku et al., 2020). This is consistent with the abrupt topography of the eastern part of the study area (Figure 4), where the MAE (6.6) and RMSE (17.9) values were concentrated. We overcame this precipitation constraint by using a bias-corrected version of the WRF model to reduce the spatial estimation error of the precipitation. The use of bias-corrected WRF precipitation improved the spatial representation in this study. The uncorrected model had bias values higher than 16 mm, which is critical because it can lead to incorrect early warnings. An incorrect precipitation estimate could therefore result in loss of human life. In contrast, our results deliver precipitation data with a low uncertainty level, suitable for an operational RILEWS with a low false-positive (FP) rate.
The bias correction using MeteoLab improved the precipitation representation when compared with the weather stations (Figure 4). The data from 12 spatially distributed meteorological stations were sufficient to represent the precipitation fields with low RMSE values (max. 0.36 mm). Thus, the corrected results represent the precipitation fields in Andean areas with lower bias values than previous studies (Yáñez-Morroni et al., 2018; Schumacher et al., 2020). The PP_M4a approach was found to reduce the bias efficiently for the study area. We propose that the perfect prog approach can accurately represent the topographic influence on the precipitation if a suitable distribution of weather stations is available. We note that 30% of all the RIL occurred on days with low precipitation on the day itself and during the preceding days (7 and 30 days previous). Therefore, we propose that future developments should progress to analysis on a sub-daily scale. In this context, future developments should aim to use corrected WRF at an hourly scale, or else use lower-resolution satellite estimates of precipitation as a tool to complement WRF simulations.

5.2
Rainfall-Induced Landslide Early Warning
The Southern Andes has a complex topography that triggers precipitation events of different intensities within a few kilometres (examples in Figure 9 and Figure 10). Hence, a correct spatial representation of the precipitation increases the sensitivity. The sensitivity of a LEWS depends heavily on the input variables, specifically the precipitation in this case. The RILEWS achieved high predictive ability, with AUC values between 0.65 (M3) and 0.80 (M1), suggesting high sensitivity to intense precipitation over short periods. The performance of the model diminished when data at monthly scale were used (M3) in comparison with daily resolution (M1 and M2), where the M2 model (AUC=0.77) performed similarly to M1. This similarity may be associated with the soil moisture content, which reflects the previous precipitation; that is, the accumulated precipitation functions as a memory of the soil moisture in the slope before a RIL (related to the different soil types). The memory effect in the slope in M3 will partly reflect its predisposition to suffer a RIL based on the soil moisture content in the first few centimetres. Numerous works have related satellite information on precipitation and soil moisture to establish links between them (Brocca et al., 2020; Camici et al., 2020; Pellarin et al., 2020). The slope memory approach could be the best way to obtain a proxy of the soil moisture content, as there is no network of moisture sensors in the study area. This is consistent with the comparison of M1 (AUC=0.80) and M4 (AUC=0.79): they present similar sensitivity values (~91% in both cases), suggesting that either model could be used.
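The antecedent-precipitation predictors used as a soil moisture proxy (7 and 30 days previous) can be sketched as simple accumulations over a daily series. The function name is hypothetical, and the choice to exclude the event day from the window is an assumption made for this illustration; the synthetic series only serves to check the arithmetic.

```python
def antecedent_sum(daily_precip, day_index, window):
    """Total precipitation over the `window` days before `day_index`.
    The day itself is excluded (an assumption of this sketch); the window
    is truncated at the start of the series."""
    start = max(0, day_index - window)
    return sum(daily_precip[start:day_index])


# Synthetic daily series (mm): day i receives i mm, for illustration only.
daily_precip = [float(i) for i in range(40)]
event_day = 35

p_day = daily_precip[event_day]
p_7 = antecedent_sum(daily_precip, event_day, 7)
p_30 = antecedent_sum(daily_precip, event_day, 30)
print(p_day, p_7, p_30)
```

Longer windows smooth out individual storms, which is one way to read the lower discriminating power reported for the 30-day predictor.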

5.3
Future developments
The Andes is one of the zones most prone to intense precipitation resulting from climate change. Moreover, the complex topography requires a high temporal resolution to reproduce the precipitation variability of the Southern Andes. Potential improvements should be directed towards increasing the predictive ability by increasing the temporal resolution of the precipitation products. Our models do not consider the variability of soil hydraulic properties, as found in tephra fall deposits or intensely weathered soft rocks. Recently, rainfall-induced landslides have affected active (Fustos et al., 2021) and older volcanic environments (Somos et al., 2020). The new generation of RILEWS will need a parametrisation of these environments from a geotechnical point of view. Moreover, all RILEWS must be able to be automated, which involves computing capacities of various kinds; to mitigate the calculation costs we suggest incorporating the available satellite precipitation products, although these have a lower spatial resolution (~10 km). Satellite estimates require validation in areas with complex topography, like southern Chile (Zambrano-Bigiarini et al., 2017). Likewise, new geoscientific data interfaces like GSMaP will allow better integration with precipitation, complementing WRF products. One limitation of the present study is the quality of the RIL inventory used. South America presents a low density of recorded events, despite the high density of their occurrence. Future efforts should be directed towards generating RIL identification records using remote sensing techniques (Guzzetti et al., 2020; Fustos et al., 2017; Jia et al., 2019) or numerical identification (Chikalamo et al., 2020; Guzzetti et al., 2020; Fustos et al., 2020a). To date, our database is the best available for the spatial location and date of occurrence in the study area. We suggest that alternatives should be considered in future to strengthen the generation of RIL databases in the Southern Andes with a larger number of events. This could help to strengthen future RILEWS in this area, improving their performance in terms of sensitivity and specificity.

Conclusions
This work evaluated the implementation of a RILEWS based on a logistic model and forced by geomorphological and atmospheric conditions in the Southern Andes. For the first time in the Southern Andes, we showed how the WRF model can be integrated into operational RILEWS without the need to use ensembles, by means of bias correction. This opens the door to the implementation of precipitation-based prediction models without costly computer iterations over ensembles of models (Yáñez-Morroni et al., 2018; Schumacher et al., 2020). New studies of LEWS in the Southern Andes should be directed towards increasing the RIL database currently available. In future, we suggest evaluating alternatives to generate better-quality RIL databases in this segment of South America, completing the existing database with the records of the Chilean National Geological and Mining Service (Sernageomin). This could help to strengthen future RILEWS in the Southern Andes, improving their performance in terms of sensitivity and specificity.
Logistic models proved their capacity to predict RIL events, with AUC varying between 0.65 and 0.80, indicating their ability to represent RIL occurrence correctly. Despite the high relative sensitivity of M3, the models which presented both high sensitivity and high specificity were those which included precipitation on a daily scale (models 1, 2 and 4). Using the precipitation of the previous 7 days could improve this approach by representing soil moisture. There is no network of moisture sensors in the area, so Model 4 should be incorporated as it allows this factor to be represented. Finally, we propose to use models M1 and M4 in conjunction.