Exploring the potential relationship between the occurrence of landslides and debris flows: A new approach

Abstract. The aim of the present study is to explore the potential relationship between landslides and debris flows by establishing separate susceptibility zoning maps for each using random forest. Longzi township, Longzi County, located in Southeastern Tibet, where landslides and debris flows have occurred frequently, was selected as the study area. The work was carried out in the following steps: (1) a complete landslide and debris flow inventory map was prepared; (2) slope units and 11 controlling factors were prepared for landslide susceptibility modelling, and watershed units and 12 factors for debris flow susceptibility modelling; (3) susceptibility zoning maps were established for landslides and debris flows, respectively, using random forest; (4) the performance of the two models was verified using ROC curves, AUC values and contingency tables; (5) the high and very-high-class watershed units in the debris flow susceptibility zoning map were used as the base map to observe their coverage by slope units of different classes; (6) the landslide zoning map was used as the bottom layer to analyze the distribution of high and very-high-class slope units within the watershed units; (7) the slope units were transformed into points and distributed on the watershed units. The two random forest models demonstrated strong predictive capability, with accuracy close to 90% and AUC values close to 1. The loose material carried by debris flows is not necessarily supplied by landslides, although most landslides can be converted into debris flows. Areas prone to debris flow do not promote the occurrence of landslides. A susceptibility zoning map composed of two or more natural disasters is comprehensive and significant in this regard.


Rainfall is the only triggering factor considered for both landslides and debris flows in this paper; it was reclassified into six classes (Fig. 2 and Fig. 3). Slope angle is frequently employed in both

https://doi.org/10.5194/nhess-2020-127 Preprint. Discussion started: 27 April 2020 © Author(s) 2020. CC BY 4.0 License.

Statistical models for landslide susceptibility zonation reconstruct the relationships between dependent and independent variables using training sets, and verify these relationships using validation sets (Guzzetti et al., 2006a,b), which usually implies partitioning the inventory into subsets. The sampling strategy affects the resulting susceptibility map (Yilmaz, 2010). Landslide inventories can be partitioned on temporal, spatial or random criteria (Chung and Fabbri, 2003), and the most widely applied approach is a one-time random selection (Reichenbach et al., 2018).

Sampling strategies and validation
In the current study, random partitioning was used because of existing constraints on temporal and spatial partitioning. The sample data were therefore divided into two parts: 70% of the data were selected as training data to build the prediction model, and the remaining 30% were used for validation.
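The one-time random partition described above can be illustrated with a minimal sketch. It uses only the Python standard library so that the split logic is visible; the function name `random_partition`, the seed, and the 100 placeholder unit IDs are assumptions made for illustration and are not taken from the study.

```python
import random


def random_partition(samples, train_fraction=0.7, seed=42):
    """Shuffle the samples once, then cut them into training and validation subsets."""
    shuffled = list(samples)               # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical inventory: 100 mapping-unit IDs (placeholder data, not the real inventory)
unit_ids = list(range(100))
train_ids, valid_ids = random_partition(unit_ids)
print(len(train_ids), len(valid_ids))  # 70 30
```

Every sample ends up in exactly one subset, so the validation data are never seen during training.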

The area under the curve (AUC) of the ROC curve (Green and Swets, 1966) is one of the most commonly used metrics for estimating model quality. A typical two-entry confusion matrix, comprising true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), is another common index. In the current study, both the ROC curve and the contingency tables were used to

RF has the ability to reduce errors caused by unbalanced data, which makes it suitable for susceptibility assessment. In this study, the scikit-learn package (Pedregosa et al., 2011) in Python version 3.7 was used for the modeling. The number of trees (k) and the number of predictive variables used to split the nodes (m) are two user-defined parameters required to grow a random forest (Ahmed et al., 2016). To ensure convergence of the algorithm and good prediction results, the number of trees (k) was fixed at 500 and the number of predictive variables (m) was set to 5 (Breiman et al., 2001).
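As a rough illustration of the configuration described above, the following sketch grows a scikit-learn random forest with k = 500 trees (`n_estimators`) and m = 5 candidate variables per split (`max_features`), then computes the AUC and the confusion matrix on a 30% validation subset. The data are synthetic and stand in for the real inventory. The sample size and random seeds are assumptions, as is the reading of the reported TN and TP percentages as row-wise rates (specificity and sensitivity).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the inventory: 600 mapping units, 11 controlling factors,
# binary label (1 = hazard occurred, 0 = no hazard). Not the study's data.
X, y = make_classification(n_samples=600, n_features=11, n_informative=6,
                           random_state=0)

# One-time random 70/30 partition
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# k = 500 trees, m = 5 variables tried at each split
rf = RandomForestClassifier(n_estimators=500, max_features=5, random_state=0)
rf.fit(X_train, y_train)

# AUC from the predicted probability of the positive class
prob = rf.predict_proba(X_valid)[:, 1]
auc = roc_auc_score(y_valid, prob)

# Confusion matrix entries; the rates below are an assumed reading of "TN" and "TP"
tn, fp, fn, tp = confusion_matrix(y_valid, rf.predict(X_valid)).ravel()
tn_rate = tn / (tn + fp)  # specificity
tp_rate = tp / (tp + fn)  # sensitivity

# Impurity-based factor weights; scikit-learn normalizes them to sum to 1
importances = rf.feature_importances_

print(f"AUC={auc:.3f}  TN rate={tn_rate:.2%}  TP rate={tp_rate:.2%}")
```

The `feature_importances_` attribute is one possible source for factor weights of the kind reported for the controlling factors; whether the study derived its weights this way is not stated in the text.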

In this study, the predictive accuracy, ROC curves and AUC values of the RF model using training data are shown in Table 1 and

The task of validating the predicted results is the critical strategy in prediction models as shown in Table 3 and (Fig. 4). Consequently, the values of TN and TP were 92.90% and 90.0%, respectively. The model also performs well in terms of AUC, with a value of 0.977. In comparison with the training model, the accuracy and AUC values have decreased slightly but remain high.

The landslide susceptibility map was also reclassified into five classes: very low (0~0.2), low

179, accounting for 17% (Fig. 6). Disaster points were mostly in the dark (red or orange) areas. The units belonging to the moderate class accounted for the smallest proportion, at 13% (Fig. 7).

The controlling factors with significant effects were selected and normalized as shown in Table 2. The weight values of slope angle, distance to fault, plan curvature and topographic wetness index were 0.21, 0.19, 0.17 and 0.13, respectively, indicating that these factors are closely related to the occurrence of landslides. The weight values of distance to road, maximum elevation difference, profile curvature and elevation are all less than 0.1, at 0.08, 0.08, 0.06 and 0.05, respectively (Fig. 7).
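The five-class reclassification of the susceptibility map mentioned above can be sketched as a simple lookup. Only the first interval (very low, 0~0.2) survives in the text, so the remaining breaks at 0.4, 0.6 and 0.8 are an assumption of equal intervals, and assigning a boundary value to the higher class is likewise an assumption. The function name `susceptibility_class` is hypothetical.

```python
from bisect import bisect_right

# ASSUMPTION: equal-interval breaks. The text only gives "very low (0~0.2)".
BREAKS = [0.2, 0.4, 0.6, 0.8]
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def susceptibility_class(probability):
    """Map a predicted probability in [0, 1] to one of five susceptibility classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right counts the breaks <= probability, so a value exactly on a
    # break falls into the higher class (an assumed convention)
    return CLASSES[bisect_right(BREAKS, probability)]


for p in (0.1, 0.2, 0.55, 0.93):
    print(p, susceptibility_class(p))
# 0.1 very low
# 0.2 low
# 0.55 moderate
# 0.93 very high
```

Applied to the predicted probability of every mapping unit, this yields the class label used to colour the zoning map.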

The debris flow susceptibility model performs well, with very high TN and TP values of 90.90% and 91.18%, respectively. In terms of AUC, the model also shows strong prediction performance, with a value of 0.979 (Fig. 4). Three evaluation statistics also indicate a reasonable

Table 1 shows that in the 30% sample data used for verification, the values of TN and TP

should be consistent or complementary. The fact that appropriate prediction methods and mapping units were applied to the two disasters makes it possible to merge the two zoning maps. In addition, two natural disasters with a potential relationship are reflected simultaneously in the same susceptibility zoning map, which can better guide engineering practice, for example for the landslide-debris flow disaster chain.

In this paper, susceptibility prediction models for landslides and debris flows were each established using random forest, and the models perform excellently in terms of accuracy and goodness of fit. The potential relationship between landslides and debris flows is discussed by superimposing the two zoning maps, and the following conclusions can be drawn:
(1) The landslide and debris flow susceptibility prediction models based on random forest perform well in terms of accuracy and goodness of fit and can analyze the relative importance of different impact factors, which makes them suitable for the evaluation of natural disasters;
(2) Although most landslides will be converted into debris flows, landslides are not necessarily the source of debris flows, and the loose material carried by debris flows is not necessarily supplied by landslides;
(3) Compared with the extent to which landslides affect debris flows, the impact of debris flows on landslides is not obvious, which indicates that areas prone to debris flow do not promote the occurrence of landslides;
(4) A susceptibility zoning map composed of two or more natural disasters is more comprehensive and significant, which provides valuable reference for researchers and engineering