Landslide susceptibility assessment based on different machine-learning methods in Zhaoping County of eastern Guangxi

Abstract: In view of the increasingly frequent occurrence of serious landslide disasters in eastern Guangxi, the current study adopted support vector machine (SVM), particle swarm optimization support vector machine (PSO-SVM), random forest (RF), and particle swarm optimization random forest (PSO-RF) methods to assess landslide susceptibility in Zhaoping County. To this end, 10 landslide-related causal variables, including digital elevation model (DEM)-derived, meteorology-derived, Landsat8-derived, geology-derived, and human activity factors, were selected for running the four machine-learning (ML) methods, and landslide susceptibility evaluation maps were produced. Then, receiver operating characteristic (ROC) curves and field investigation were used to verify the efficiency of these models. Analysis and comparison of the results showed that all four ML models performed well for landslide susceptibility evaluation, as indicated by ROC curve values ranging from 0.863 to 0.934. Moreover, the results indicated that the PSO algorithm has a good effect on the SVM and RF models. In addition, the PSO-RF and PSO-SVM models showed strong robustness and stable performance, and these two models are promising methods that could be transferred to other regions for landslide susceptibility evaluation.


The geological environment in eastern Guangxi is fragile and landslide disasters occur frequently, which not only cause huge economic losses and ecological damage, but also seriously [...]

In general, each of the above ML models has been widely applied to landslide prediction and evaluation and compared the optimized results. Therefore, the objective of the present paper is to:

(1) determine the landslide susceptibility assessment factors by multi-source data fusion and correlation factor analysis; (2) optimize SVM and RF models by using a particle swarm [...]

[...] belt is cut by a series of near-SN trending faults and it forms many secondary depression areas.

Under the influence of multi-stage tectonic movements, joints and fissures are developed in the rock mass and the rock is severely weathered, which provides the basic conditions for the formation of [...]

Landslide susceptibility evaluation has been carried out in nine main processes (

where [...] represents the number of training samples, [...] represents the dimension of the input vector, $\|w\|$ represents the norm of the hyperplane normal vector, and $b$ is the displacement term.
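For orientation, the quantities defined above match the textbook hard-margin SVM problem, which, together with the Lagrangian built from it, is commonly written as below. This is the standard form rather than necessarily the exact notation of this paper: the symbols $n$ (number of training samples), $x_i$, $y_i \in \{-1, +1\}$ and $\alpha_i$ (Lagrange multipliers) are assumed here.

```latex
\[
\min_{w,\,b}\ \frac{1}{2}\|w\|^{2}
\quad \text{s.t.} \quad
y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
\]
\[
L(w, b, \alpha) = \frac{1}{2}\|w\|^{2}
- \sum_{i=1}^{n} \alpha_i \bigl[\, y_i\,(w \cdot x_i + b) - 1 \,\bigr],
\qquad \alpha_i \ge 0
\]
```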

The Lagrangian multiplier rule is introduced to find the extreme value, and the auxiliary function is generated as follows: [...] where [...] is the Lagrange multiplier.

[...] method is adopted to divide the susceptibility into five levels: extremely high, high, middle, low, and extremely low areas (Fig. 4a).

Table 2 The main steps of the PSO-SVM model

(1) Initialization: The initial parameters of the PSO-SVM model are set, including population size, number of iterations, learning factors, inertia weight, initial particle positions and initial particle velocities. Each particle vector represents an SVM model corresponding to a different combination of SVM parameters.
(2) Optimization: In the process of particle optimization, each candidate solution of the optimization problem is called a particle in the search space. The fitness value of each particle ($f_i$) is calculated according to the fitness function. The fitness function is the basis for selecting individuals, and every individual is evaluated by it.

(3) Replacement:
On the basis of the objective function, the fitness value of each particle ($f_i$), the individual optimal solution $f_i(p_{best})$, and the global optimal solution of the population $f_i(p_{gbest})$ are calculated and compared. If $f_i < f_i(p_{best})$, the optimal solution of the previous round is replaced with the new fitness value ($f_i$) and the particle of the previous round is replaced with the new particle; the $f_i(p_{best})$ of each particle is then compared with the $f_i(p_{gbest})$ of all particles. If $f_i(p_{best}) < f_i(p_{gbest})$, the optimal solution of that particle replaces the optimal solution of all the original particles, and the current state of the particles is saved at the same time.

https://doi.org/10.5194/nhess-2020-251 Preprint. Discussion started: 27 August 2020. © Author(s) 2020. CC BY 4.0 License.

(4) Determination:
If the $f_i$ of an individual in the population meets the requirements, or if the maximum number of generations is reached, the calculation ends and that particle corresponds to the optimal parameter combination; otherwise, go to step (2) to continue the iteration.

(5) Set up the PSO-SVM model:
The global optimal PSO-SVM model is obtained by training the SVM on the training samples with the optimal parameter combination. The susceptibility of landslide is quantitatively evaluated and divided into five levels: extremely high, high, middle, low, and extremely low areas (Fig. 4b).
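The loop described in Table 2 can be illustrated with a minimal particle swarm sketch. It is not the authors' implementation: the function name `pso_minimize`, the default hyperparameters, and the search bounds are hypothetical, and `toy_fitness` is only a stand-in. In a PSO-SVM setting the fitness would instead be something like the cross-validated error of an SVM trained with the parameter pair encoded by the particle.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step (1): initialise particle positions and velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Step (2): evaluate the fitness of every particle.
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    # Step (4): here the only stopping rule is a fixed number of iterations.
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            # Step (3): replace the individual best, then the global best.
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


def toy_fitness(p):
    """Hypothetical stand-in for an SVM validation error (minimum at (1, -2))."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


# Assumed search box, e.g. log10 of two SVM parameters.
search_bounds = [(-2.0, 4.0), (-5.0, 1.0)]
best, best_f = pso_minimize(toy_fitness, search_bounds)
print(best, best_f)
```

Step (5) of Table 2 would then train the final SVM once with the parameter pair returned in `best`.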

The construction process of the RF model for landslide susceptibility assessment in Zhaoping County is shown in Table 3:

Table 3 The main steps of the RF model

(2) Get multiple training datasets: Using the Bagging algorithm, K new training subsets $\{D_1, D_2, \dots, D_K\}$ are obtained by K rounds of random sampling with replacement from the original training data set D. Each of the K training subsets contains n instances, some of which may be repeated.

(3) Training to generate decision tree:
For each training subset $D_i$ ($1 \le i \le K$), an unpruned decision tree is generated by the following procedure. Firstly, let the number of predictive attributes in the training sample be M; F (F < M) attributes are randomly chosen from the M attributes to compose a random feature subspace $X_i$, which serves as the set of candidate split attributes for the present node of the decision tree. The value of F remains unaltered while the RF model is generated. Secondly, each node is split according to the optimal split attribute selected from the random feature subspace $X_i$ by the decision tree generation algorithm. Thirdly, every tree grows completely, without any pruning, so that each training set $D_i$ yields a corresponding decision tree $h_i(D_i)$. Fourthly, the RF model $\{h_1(D_1), h_2(D_2), \dots, h_K(D_K)\}$ is generated by combining all the generated decision trees, and the corresponding classification results $\{C_1(X), C_2(X), \dots, C_K(X)\}$ are obtained by testing each decision tree $h_i(D_i)$ with the test set sample X. Finally, the classification result for the test set sample X is determined from the results of the K decision trees by majority voting.

(4) Dividing levels:
According to the above steps, the landslide susceptibility of Zhaoping County is divided into five levels (Fig. 4c).

In order to further compare the performance of different models in evaluating landslide susceptibility, the parameters of the weighted RF are optimized by the PSO algorithm, and the main steps are shown in Table 4:

Table 4 The main steps of the PSO-RF model

Weighted random forest based on particle swarm optimization algorithm (PSO-RF)
(1) Initialization: The initial parameters of the PSO-RF model are set, including the number of decision trees R, the pruning threshold ε, the number of pre-test samples X, and the initial number of random attributes m.

(2) Using the Bootstrap algorithm, R training sets are randomly produced, and X pre-test samples are selected in each training set.

(3) Generating decision tree:
A total of R decision trees are generated by using the remaining samples of each training set. In the process of generating the decision trees, m attributes are selected from all attributes as the candidate decision attributes of the present node before each split attribute is selected.

(4) Determination:
When the number of samples included in the node is less than the threshold ε, the node is taken as the leaf node, and the mode of the target attributes is returned as the classification result of the decision tree.

(5) Setting up the PSO-RF model:
When all decision trees are produced, each decision tree is pre-tested and its weight is calculated by using the following formula: [...] where [...] is the number of samples correctly classified by the r-th decision tree, and X is the number of pre-test samples.

(6) Calculation of the classification results:
The classification results of the model are calculated by the following formula: [...]

(7) Taking the classification results as the fitness values, the PSO algorithm is applied to iteratively optimize the parameters of formula (6) and determine the parameters of the final RF model.

(8) Running:
Finally, the optimized parameters are input into the model, and the output results of the model are obtained. According to the results, the susceptibility of landslide is divided into five levels (Fig. 4d).

Simultaneously, Fig. 4 indicates that the high susceptibility levels for landslide are mainly [...] the RF and PSO-RF models to the proportion of landslide samples, it is necessary to carry out sample screening before using RF and PSO-RF models to evaluate the susceptibility of landslide.
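The weighting and voting in steps (5) and (6) of Table 4 can be sketched as follows. The sketch assumes that each tree's weight is the fraction of the X pre-test samples it classifies correctly and that the final class is the one with the largest summed weight. Both are assumptions consistent with the quantities defined in step (5), not the paper's confirmed formulas, and the function names and numbers are hypothetical.

```python
from collections import defaultdict


def tree_weights(correct_counts, n_pretest):
    """Assumed weight of tree r: correctly classified pre-test samples / X."""
    if n_pretest <= 0:
        raise ValueError("n_pretest must be positive")
    return [c / n_pretest for c in correct_counts]


def weighted_vote(tree_predictions, weights):
    """Return the class label whose supporting trees have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical example: three trees pre-tested on X = 20 samples.
weights = tree_weights([19, 8, 9], n_pretest=20)

# One accurate tree predicts "landslide" (1); two weaker trees predict 0.
label = weighted_vote([1, 0, 0], weights)
print(weights, label)
```

In this example a plain majority vote would return class 0, whereas the weighted vote follows the single tree that performed best in the pre-test. A PSO loop, as in step (7), would then search over settings such as R, ε and m, using the resulting classification accuracy as the fitness value.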

In order to further verify the accuracy of the four ML models, the proportion of landslide-point grid cells falling into each susceptibility level was counted, as shown in Table 5.

Table 5 indicates that the proportions of hazard points falling into extremely high and high [...]

Improving the performance of landslide susceptibility models remains a widespread concern in the disaster research community, because the capability of the models is [...]

At the same time, the prediction results of the four ML models are consistent with the field survey results, as shown by comparing Fig. 4 and Fig. 6, which again verifies the validity of the four ML models. This also indicates that the ML models perform very well in evaluating and predicting the occurrence of landslides. Furthermore, the results can provide informational service and decision support for landslide early warning, land use planning and environmental management for local government departments.

In addition, our study found that the 10 disaster-related factors selected in this paper can fully reflect the natural geological and ecological environment characteristics of the study area, and are strongly correlated with the occurrence of landslide disasters. Our study also found that the selection of training samples affects the results of landslide susceptibility evaluation using the four ML methods. It is worth mentioning that there is a great difference between the extremely low and extremely high susceptibility [...]