Construct and evaluate the classification models of six types of geological hazards in Bijie city, Guizhou province, China

Debris flow, landslide, unstable slope, ground crack, ground collapse and collapse are the dominant geological hazards in Bijie city, Guizhou province, which is situated in an area of high natural hazard in China. The primary purpose of this study is to construct different classification models using the disaster conditioning factors of geological hazards and to evaluate the performance of the models in classifying geological hazards in Bijie city. At the same time, the nonlinear relationship between the various geological hazards and the conditioning factors is discussed. Firstly, manual field survey data of Bijie city from 2019 were used to construct an inventory map of the six geological hazards. Then 16 conditioning factors were derived from various data sources. The geological hazard location points were randomly divided into training and validation sets at a ratio of 70:30 to complete the training and verification of the classification models. To select the optimal subset of conditioning factors, the multicollinearity of these factors was assessed using tolerances, variance inflation factors (VIF) and Pearson's correlation coefficient, and factors with multicollinearity were excluded to optimize the models. Subsequently, ten classification models were constructed, and the models were verified and compared using the receiver operating characteristic (ROC) curve, precision, sensitivity, the Kappa coefficient and F1 values. In addition, the Friedman test was used to identify statistically significant differences between the results of the classification models used in this research. In general, the average Area Under the Curve (AUC) values under the ROC curves of the 10 classification models are all above 0.8, indicating that all models have correspondingly high prediction ability.
Among them, SVM had the highest average AUC value (0.941), the highest AUC values for the individual geological hazards (collapse: 0.949, ground crack: 0.907, ground collapse: 0.952, landslide: 0.830, debris flow: 0.963, unstable slope: 0.922), and the highest Kappa coefficient (0.845), Macro F1 (0.851) and Micro F1 (0.878).

practice by combining the characteristics of the data sets and research areas. Therefore, in this study, we establish, analyze and compare the performance of different classification models in the research area based on the disaster conditioning factors of geological hazards, so as to obtain the classification model best suited to classifying geological hazards in the Bijie area.
On this basis, we also study the nonlinear relationship between the various geological hazards and the conditioning factors in the research area. This paper is composed of the following parts. The first part elaborates the theoretical research foundation, research purpose and main research content of this paper. The second part describes the general situation of the research area and its resources and environment. The third part introduces the classification methods and theories involved in this research. The fourth part is the data preparation stage, which introduces the data used in the research in detail. The fifth part presents the results and analysis of this research. The discussion and conclusion are located in the sixth part of the article.

Study area
Bijie city is located in the northwest of Guizhou province, where geological disasters are relatively serious. It lies between longitude 105°36′ and 106°43′ and latitude 26°21′. The total area of the city is nearly 26,900 square kilometres. The highest elevation in the territory, 2,900.6 metres, is also the highest point in Guizhou, and the lowest elevation is 457 metres. The population of the study area is about 7.0298 million people, of whom 25.88% belong to ethnic minorities. Bijie has a humid northern subtropical monsoon climate with abundant rainfall. The average annual rainfall is 854-1444 mm, the average annual sunshine is between 1140 h and 1450 h, and the average annual temperature is 13.2 °C. There are 37 kinds of proven mineral resources, of which coal occupies a dominant position, with a distribution area of 4,000 km², accounting for 45.9% of Guizhou's coal resources. Near-horizontal (dip angle below 8°) and gently inclined (dip angle between 8° and 25°) coal seams are the main coal seams in Bijie. The area of the Yangtze River Basin within the city is 25,600 km², and that of the Pearl River Basin is 1,239 km², accounting for 95.39% and 4.61% of the city's total area, respectively. Structurally, Bijie has a complex geological environment due to extensive faults (Yin et al., 2016) and is dominated by karst topography, mountains and hills. The terrain in the area is high in the west and low in the east. The main structural movements that control the tectono-stratigraphic framework of Bijie and its adjacent areas are the Duyun, Guangxi, Central Indosinian, Late Yanshanian and Himalayan movements. The outcrops are mainly sedimentary rocks, with few magmatic rocks. Among the sedimentary rocks, the majority are carbonates, followed by coal-measure sandy shale, then purple sandy shale and purple sandy mudstone, and finally argillaceous rocks.

Factors analysis
There may be high correlation between the conditioning factors in the initial data set, which can lead to erroneous systematic analysis, so we adopted multicollinearity analysis from statistics to address this problem (Dormann et al., 2013). The variance decomposition proportions (Schuerman, 1983), the conditional index (Belsley, 1991), Pearson's correlation coefficients (Booth et al., 1994), and the variance inflation factors (VIF) and tolerances (Hair, 2009; Dan and Richard, 2012) are all methods for quantifying multicollinearity. Among them, VIF and tolerances are often used to check the multicollinearity of conditioning factors in disaster research (Bui et al., 2011), while Pearson's correlation coefficient method is widely used in various fields (Dormann et al., 2013).

To check multicollinearity, the VIF and tolerance measure the variation in the standard errors of the conditioning factors; the higher the standard errors, the greater the multicollinearity (Allison, 1999). A VIF greater than 10 or a tolerance less than 0.1 indicates a potential multicollinearity problem in the data set (Hair, 2009; Keith, 2006). Pearson's correlation coefficient method was used to evaluate the correlation between two conditioning factors; its formula is defined as:

r(X, Y) = cov(X, Y) / (σ_X σ_Y)

where cov(X, Y) is the covariance of the conditioning factors X and Y, and σ_X and σ_Y are the standard deviations of X and Y, respectively. Pearson's correlation values > 0.7 indicate high collinearity between the two factors (Booth et al., 1994).
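As an illustration of this check, the sketch below computes Pearson's r for hypothetical conditioning-factor columns (the values are invented for the example, not data from this study) and flags pairs above the 0.7 threshold:

```python
import math

def pearson_r(x, y):
    # r = cov(X, Y) / (sigma_X * sigma_Y), matching the formula above
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Hypothetical factor columns (values invented for illustration)
curvature = [0.10, 0.40, 0.35, 0.80, 0.95]
plan_curv = [0.12, 0.37, 0.30, 0.82, 0.99]  # tracks curvature closely
slope_deg = [12.0, 3.5, 27.0, 8.0, 15.5]

print(pearson_r(curvature, plan_curv) > 0.7)  # collinear pair -> would be excluded
print(pearson_r(curvature, slope_deg) > 0.7)  # not collinear -> kept
```

In practice the same pairwise check is run over all 16 factors and any pair exceeding the threshold triggers the exclusion described below.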

Logistic regression (LR)
Logistic regression can reveal the relationship between the target variable and multiple predictor variables and predict the occurrence probability of an event (Cox, 1959). The formula of LR is as follows:

P = 1 / (1 + e^{−(α + β_1 x_1 + β_2 x_2 + ... + β_n x_n)})

where α is a constant, n is the number of independent variables, x_i (i = 1, 2, ..., n) are the predictor variables and β_i (i = 1, 2, ..., n) are the coefficients of the LR.
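A minimal numerical sketch of this formula follows; the coefficients are invented for illustration, not the fitted values from this study:

```python
import math

def logistic_p(alpha, betas, xs):
    # P = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two standardized factors (e.g. slope, rainfall)
alpha, betas = -1.2, [0.8, 0.5]
print(logistic_p(alpha, betas, [1.0, 2.0]))  # occurrence probability in (0, 1)
```

Because each β_i is a signed coefficient, increasing a factor with a positive coefficient monotonically increases the predicted occurrence probability.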

Linear discriminant analysis (LDA)
Linear discriminant analysis seeks low coupling between classes and a high aggregation degree within classes; that is, the values of the intra-class scatter matrix should be small and the values of the inter-class scatter matrix large (Fisher, 1936; McLachlan, 2004). Suppose an n-dimensional space has m samples and c classes, expressed as x_1, x_2, ..., x_m, and n_i represents the number of samples belonging to class i. LDA selects the n-dimensional column vector ϕ that maximizes J_Fisher(ϕ) as the projection direction:

J_Fisher(ϕ) = (ϕ^T S_b ϕ) / (ϕ^T S_w ϕ)

where S_b = Σ_{i=1}^{c} n_i (µ_i − µ)(µ_i − µ)^T is the inter-class scatter matrix, S_w = Σ_{i=1}^{c} Σ_{x ∈ class i} (x − µ_i)(x − µ_i)^T is the intra-class scatter matrix, µ is the mean of all samples and µ_i is the mean of the samples of class i.
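A toy sketch of the criterion for a 1-D projection of 2-D samples (the two classes are invented): the direction that separates the class means scores a much larger J than an orthogonal one.

```python
def j_fisher(phi, classes):
    # Project each 2-D sample onto phi, then compute
    # J = inter-class scatter / intra-class scatter of the projected values.
    proj = [[phi[0] * x + phi[1] * y for x, y in c] for c in classes]
    flat = [v for c in proj for v in c]
    mu = sum(flat) / len(flat)
    s_b = sum(len(c) * (sum(c) / len(c) - mu) ** 2 for c in proj)
    s_w = sum(sum((v - sum(c) / len(c)) ** 2 for v in c) for c in proj)
    return s_b / s_w

classes = [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 0.0), (6.0, 1.0)]]
print(j_fisher((1.0, 0.0), classes))  # separating direction: large J
print(j_fisher((0.0, 1.0), classes))  # orthogonal direction: J = 0
```

Maximizing J over ϕ is what LDA does in closed form via the scatter matrices above.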

Naïve Bayes Classifier (NBC)
The Naïve Bayes Classifier assumes that the influence of the value of any attribute on a known class is independent of the values of the remaining attributes (Maron and Kuhns, 1960). The idea is to use the joint distribution of the output Y and the features X to obtain the posterior probability, and to classify based on the value of that posterior probability (Han et al., 2011). For a sample set D with m samples, n features and sample categories C_k, D = {(x_1^(1), ..., x_n^(1), y_1), ..., (x_1^(m), ..., x_n^(m), y_m)}, the NBC can be expressed as:

P(Y = C_k | X = x) = P(Y = C_k) ∏_{j=1}^{n} P(X_j = x_j | Y = C_k) / Σ_k P(Y = C_k) ∏_{j=1}^{n} P(X_j = x_j | Y = C_k)
Among them, P(Y = C_k) is the prior probability, P(X_j = x_j | Y = C_k) is the conditional probability, and the denominator is the total probability, which is the same for all C_k. The posterior probability can therefore be further expressed as:

y = arg max_{C_k} P(Y = C_k) ∏_{j=1}^{n} P(X_j = x_j | Y = C_k)
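The posterior argmax can be sketched with categorical counts on a tiny invented sample set (the labels and feature values are hypothetical, not survey data):

```python
from collections import Counter

def nb_predict(x, data):
    # data: list of (feature_tuple, label)
    # returns argmax_Ck  P(Y = Ck) * prod_j P(X_j = x_j | Y = Ck)
    labels = Counter(y for _, y in data)
    best, best_score = None, -1.0
    for c, n_c in labels.items():
        rows = [f for f, y in data if y == c]
        score = n_c / len(data)  # prior P(Y = Ck)
        for j, xj in enumerate(x):
            score *= sum(1 for f in rows if f[j] == xj) / len(rows)  # P(X_j | Ck)
        if score > best_score:
            best, best_score = c, score
    return best

# Invented sample set: (terrain, moisture) -> hazard / safe
data = [(("steep", "wet"), "hazard"), (("steep", "dry"), "hazard"),
        (("flat", "wet"), "safe"), (("flat", "dry"), "safe")]
print(nb_predict(("steep", "wet"), data))
```

The denominator is omitted because, as noted above, it is identical for all C_k and does not change the argmax.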

Multi-layer perceptron (MLP) neural network
The multi-layer perceptron neural network is one of the most effective artificial neural networks. It consists of an input layer, one or more layers of linear threshold units called hidden layers, and a final layer of linear threshold units called the output layer. Its basic structure is shown in Fig. 2. Each layer except the output layer includes a bias neuron and is fully connected to the next layer. For each training instance, the MLP algorithm first performs a forward pass and obtains the measurement error, then traverses each layer in reverse to measure the error contribution of each connection, and finally adjusts the connection weights slightly to reduce the error (Haykin, 1998; Kavzoglu and Mather, 2003). The weight initialization of the MLP neural network not only affects the final convergence result and convergence speed, but also helps avoid the problems of vanishing and exploding gradients (Salakhutdinov and Hinton, 2006). There are many methods for weight initialization. In this study, we used the Xavier initialization method (Glorot and Bengio, 2010), which initializes the parameters to a uniform distribution in the following range:

W ~ U(−√(6 / (n_k + n_{k+1})), √(6 / (n_k + n_{k+1})))

where n_k and n_{k+1} are the numbers of neurons in the input and output layers of the connection.
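A sketch of the Xavier range for one fully connected layer; the layer sizes below are illustrative, not the architecture used in this study:

```python
import math
import random

def xavier_uniform(n_in, n_out, seed=0):
    # limit = sqrt(6 / (n_k + n_{k+1})); weights drawn from U(-limit, limit)
    limit = math.sqrt(6.0 / (n_in + n_out))
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]

w = xavier_uniform(13, 32)  # e.g. 13 conditioning factors into a hidden layer
limit = math.sqrt(6.0 / (13 + 32))
print(all(-limit <= v <= limit for row in w for v in row))
```

Keeping the initial weights within this fan-in/fan-out-dependent bound keeps the activation variance roughly constant across layers, which is the motivation for the method.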

Support vector machine (SVM)
The original support vector machine is a binary classifier whose basic model is a linear classifier defined by margin maximization in the feature space. It can also be used as a nonlinear classifier to solve the classification of nonlinear data sets by introducing a kernel function (Vapnik, 1998). To construct support vector machine classifiers suitable for multi-class problems, there are two mainstream approaches: the first is to construct multiple binary classifiers and combine them to achieve multi-class classification, such as one-against-rest (Bottou et al., 1994), one-against-one (Knerr et al., 1990) and the directed acyclic graph SVM (DAGSVM) (Platt et al., 2000). The second is to optimize the parameters of all sub-classifiers simultaneously in a single optimization formula (Weston and Watkins, 2005). In this paper, the one-against-one method is adopted (Hsu and Lin, 2002). The basic idea is to construct a binary SVM classifier for every pair of classes in the training set; that is, for an M-class problem, we need to construct M(M − 1)/2 sub-classifiers. When testing, all sub-classifiers process the test data, and the category with the most votes is the category of the test datum.
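The one-against-one voting scheme can be sketched independently of the underlying SVMs. In the sketch below the pairwise rule is a stand-in (nearest invented class centre on one feature), not a trained SVM:

```python
from collections import Counter
from itertools import combinations

def ovo_predict(x, classes, binary_predict):
    # one sub-classifier per pair of classes: M * (M - 1) / 2 in total;
    # the class collecting the most pairwise votes wins
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_predict(x, a, b)] += 1
    return votes.most_common(1)[0][0]

# Stand-in pairwise rule: pick the class whose (invented) centre is closer to x
centres = {"collapse": 1.0, "landslide": 3.0, "debris flow": 5.0}

def nearer(x, a, b):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b

print(ovo_predict(4.2, list(centres), nearer))
```

For the six hazard classes of this study, M = 6 gives 6 × 5 / 2 = 15 sub-classifiers, each trained only on the samples of its two classes.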

Decision Tree (DT)
The learning of a decision tree adopts a top-down recursive method. Its basic idea is to construct the tree with the fastest entropy decline based on the measure of information entropy; the entropy value is 0 at the leaf nodes. Its basic structure is shown in Fig. 3 (Xie and Liu, 2010). At present, there are three main decision tree algorithms: ID3, C4.5 (Quinlan, 1993) and CART (Breiman, 2001). In this study, the C4.5 algorithm was selected for classification. The information gain measure adopted by ID3 has an inherent bias: it preferentially selects features with more attribute values, owing to their relatively large information gain. To avoid this deficiency, C4.5 uses the gain ratio as the criterion for selecting branches.

The information gain ratio penalizes features with many values by introducing a term called split information. The split information of feature A with respect to data set D and the information gain ratio are expressed as:

SplitInformation(D, A) = −Σ_{i=1}^{n} (|D_i| / |D|) log2(|D_i| / |D|)

GainRatio(D, A) = g(D, A) / SplitInformation(D, A)

where n is the number of values of feature A, |D_i| is the number of sub-samples with the same value of feature A, and g(D, A) is the information gain. In addition, C4.5 compensates for ID3's inability to handle continuous feature attribute values.
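A numerical sketch of the two formulas (the split counts and the information gain value g(D, A) = 0.5 are invented): for the same gain, a many-valued split has a larger split information and thus a smaller gain ratio.

```python
import math

def split_information(counts):
    # -sum_i (|D_i| / |D|) * log2(|D_i| / |D|) over the branch sizes
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(info_gain, counts):
    # GainRatio(D, A) = g(D, A) / SplitInformation(D, A)
    return info_gain / split_information(counts)

print(gain_ratio(0.5, [4, 4]))        # 2-way split
print(gain_ratio(0.5, [2, 2, 2, 2]))  # 4-way split is penalized more
```

This is exactly the C4.5 correction to ID3's bias described above.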

K-nearest neighbor (KNN)
Among existing classification methods, K-nearest neighbor classification is a simple, effective and nonparametric method (Cover and Hart, 2003). The overall idea is as follows: let D be the training data set. When a test sample d appears, compare d with all samples in D and calculate the similarity (or distance) between them. Select the k most similar samples from D; the category of d is determined by the category that occurs most among these k nearest neighbors. The key component of the k-nearest neighbor algorithm is the distance function. Familiar distance measures include the Euclidean distance, cosine similarity, correlation and the Manhattan distance. In these common distance measures, the differences between the various features of the training and test sets are by default treated equally. In practice, this often fails to meet requirements. For example, in this study the measuring unit of altitude is metres and the measuring unit of slope is degrees; if these two attributes are treated equally in the distance calculation, the judgment will be biased by the difference between the measurement scales of the features. For this reason, in this study we adopt the Mahalanobis distance, which is not affected by dimension. It was proposed by P. C. Mahalanobis as a measure of distance based on the data covariance, and it is an effective method for calculating the similarity of two unknown sample sets. Different from the Euclidean distance, it takes into account the relations between the various features and is scale independent. The purpose of the Mahalanobis distance is to normalize out the variance, so as to make the relations between features more consistent with the actual situation. The formula is as follows:

d(x) = √((x − µ)^T Σ^{−1} (x − µ))

where x = (x_1, x_2, x_3, ..., x_p)^T is an element of the sample X with p variables, µ = (µ_1, µ_2, µ_3, ..., µ_p)^T is the mean vector, and Σ is the covariance matrix.
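For a diagonal covariance matrix the formula reduces to scaling each feature by its variance. The sketch below uses invented altitude/slope statistics to show how the unit difference (metres vs. degrees) is removed:

```python
import math

def mahalanobis_diag(x, mu, var):
    # sqrt((x - mu)^T Sigma^{-1} (x - mu)) with a diagonal Sigma:
    # each squared deviation is divided by that feature's variance
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(x, mu, var)))

# Invented statistics: altitude in metres (large variance), slope in degrees (small)
mu = (1000.0, 10.0)
var = (250000.0, 25.0)   # standard deviations of 500 m and 5 degrees
print(mahalanobis_diag((1500.0, 20.0), mu, var))  # both axes counted in std devs
```

After the scaling, a 500 m altitude deviation (one standard deviation) contributes the same as a 5° slope deviation, which is exactly the scale independence argued for above.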

Adaptive Boosting (Adaboost)
Adaptive Boosting is the most famous representative of the Boosting algorithms (Freund and Schapire, 1995). It changes the probability distribution of the data by increasing the weights of the samples misclassified by the previous weak classifier and reducing the weights of the correctly classified samples. In this way, the data that were not classified properly receive more attention from the weak classifier in the next round because of their increased weight. In addition, Adaboost increases the voting weight of weak classifiers with small error rates so that they play a bigger role in the vote, and reduces the voting weight of weak classifiers with large error rates so that they play a smaller role. The implementation of Adaboost is iterative: each iteration trains only one weak classifier, and the trained weak classifier participates in the next iteration (Fig. 4, where the dotted lines mark the iterations of the different rounds, ending in the weighted sum and the sign function that produce the final output). Each iteration trains Weak Classifier(i) on the data set and the data weights W(i), obtains its classification error rate, and from it computes the weak classifier weight alpha(i). Then all weak classifiers conduct a weighted vote to obtain the final prediction output, and the final classification error rate is calculated. Finally, if the final error rate is lower than the set threshold (such as 5%), the iteration ends; if it is higher, the data weights are updated to W(i + 1).
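One round of the weight update can be sketched as follows. The correctness pattern of the weak classifier is invented; alpha = 0.5 · ln((1 − err) / err) is the standard Adaboost classifier weight:

```python
import math

def adaboost_round(weights, correct):
    # err: weighted error of this round's weak classifier
    err = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - err) / err)      # voting weight of the weak classifier
    # up-weight misclassified samples, down-weight correct ones, then renormalize
    new_w = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]

weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]   # invented: one sample misclassified
alpha, new_w = adaboost_round(weights, correct)
print(alpha > 0)                      # err = 0.25 < 0.5 -> positive voting weight
print(new_w[3] > new_w[0])            # misclassified sample gained weight
```

With err = 0.25 the misclassified sample's weight rises to 0.5 after renormalization, so the next weak classifier concentrates on it, as described above.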

Random Forest (RF)
Random forest is an algorithm that integrates multiple trees following the idea of ensemble learning (Breiman, 2001). Its basic unit is the decision tree, and the output category is determined by the mode of the output categories of the individual decision trees. In a random forest, randomness is the core, and it has two aspects. First, samples are drawn randomly with replacement from the original training data to form training sets of the same size. Second, each decision tree is built on a randomly selected subset of the features. These two kinds of randomness keep the correlation between the decision trees small and further improve the accuracy of the model. In this study, we use 70% of the locations of the six geological hazards to train the various classifiers, and the remaining 30% to verify the classification accuracy of the classifiers; to make the training results more reliable, the split is made at random.
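The bootstrap sampling and the 70:30 split can be sketched as below (the point identifiers are placeholders; the feature-subset randomness is omitted for brevity):

```python
import random

def bootstrap_sample(data, rng):
    # first randomness: draw with replacement, same size as the original set
    return [rng.choice(data) for _ in data]

def split_70_30(points, rng):
    # random 70:30 partition into training and validation sets
    pts = list(points)
    rng.shuffle(pts)
    cut = int(len(pts) * 0.7)
    return pts[:cut], pts[cut:]

rng = random.Random(42)
points = [f"hazard_{i}" for i in range(100)]   # placeholder location points
train, valid = split_70_30(points, rng)
boot = bootstrap_sample(train, rng)
print(len(train), len(valid), len(boot))       # 70 30 70
```

Each tree in the forest would receive its own bootstrap sample of the training set, so different trees see different (overlapping) subsets of the hazard points.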

Geological hazards conditioning factors
Based on the nature of the Bijie area, data availability, and quasi-empirical and statistical criteria in the literature, we chose altitude, aspect, slope, curvature, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), distance to faults, distance to rivers, distance to roads, impact of mining activities, lithology, landuse and rainfall as the conditioning factors (Zhou et al., 2016). Among them, elevation, slope, aspect, curvature, plan curvature, profile curvature, SPI, TWI, STI, distance to faults and lithology are all geologically induced factors. Rainfall is a meteorological trigger, while distance to roads, impact of mining activities and landuse are artificial triggers (Bai et al., 2010).
Elevation, aspect and slope have been considered the most important terrain factors closely related to geological hazards. The mining of coal resources can also cause geological hazards, especially landslides, since mining increases the vertical motion of the ground, which is likely to lead to lateral expansion of subsidence. Therefore, disaster locations are divided into two categories according to whether they are in a mining area or not: disasters in a mining area are coded as 1, and those outside mining areas as 0. The distances to faults, rivers and roads have an important impact on the spread and size of geological hazards in the study area (Pham et al., 2015; Pham et al., 2016). The fault, river and road data were acquired from Bijie's geographic database and then processed with buffer analysis.
Moreover, according to the position of the hazards, values of different grades were assigned: the closer to a fault or other such factor, the higher the corresponding value. Landuse came from the map of land types surveyed in Bijie. Rainfall data were based on the average daily rainfall of 7 counties in Bijie from 2017 to 2019. Lithology plays a crucial role in the formation of geological hazards, as it is directly related to slope stability, and different lithologies also affect slope deformation (Guo et al., 2015; Saha et al., 2002). The lithology in this research is based on the lithology map of Bijie.

Geological hazards conditioning factors analysis
In this study, multicollinearity among the conditioning factors was identified using the tolerances and variance inflation factors (VIF) (Table 2) and Pearson's correlation coefficient (Fig. 6). The results show that the tolerance and VIF values of the profile curvature and the plan curvature did not reach the thresholds. The strongest Pearson correlation is between curvature and profile curvature, followed by the correlation of 0.832 between curvature and plan curvature. Both values are greater than 0.7, indicating that curvature is strongly collinear with the profile curvature and the plan curvature. Therefore, through the above analysis, we need to remove the plan curvature and the profile curvature from the prediction process.

Parameters for classification models
In this section, we use the 10 classification models introduced in Sect. 3 for the classification and modeling of the 6 types of geological hazards in the Bijie area. Altitude, aspect, slope, curvature, SPI, STI, TWI, lithology, rainfall, landuse, mining activity or not, and distance to rivers, faults and roads were selected to establish the classification models. The parameters of the various classifiers are shown in Table 3. In the LR modeling, L2 regularization and Newton's method were used to train on the samples, and the logistic regression equations of the 6 types of geological hazards were obtained. LDA used singular value decomposition (SVD) as the optimization algorithm. The Gaussian naive Bayes algorithm was selected for the NBC model, and the prior probability of the sample data was calculated by maximum likelihood estimation (MLE). In the MLP model, ReLU was selected as the activation function, and stochastic gradient descent with a mini-batch size of 64 was used for optimization during training; the network was set to include 3 hidden layers. The SVM kernel function was set to the Gaussian kernel with a coefficient of 1.5, and the penalty coefficient of the objective function was set to 20. In the decision tree model, information entropy is used to calculate the impurity of nodes; to limit the over-fitting caused by optimizing the impurity, the maximum depth of the decision tree is limited to 5, and every child node must contain at least 3 training samples, otherwise no branching occurs. The KNN model sets the number of neighbors to 5. In the Adaboost and GBDT models, to prevent over-fitting, the maximum number of iterations is set to 50 and the learning rate to 0.5; the maximum number of iterations is set to 100 in RF.

The receiver operating characteristic (ROC) curve (Hanley and McNeil, 1983) is cutoff-independent, allowing intermediate states to exist. The Area Under the Curve (AUC) is the area under the ROC curve and can be used to evaluate the performance of a classifier intuitively; its value lies between 0.5 and 1, and a classifier with a larger AUC value is considered better. Using the validation data set, the ROC curves and AUC values were used to evaluate the performance of the 10 classifier models; the results are shown in Fig. 7. It can be seen that all models have good classification ability, and the ROC curves basically show a steep trend, that is, a high TPR at a low FPR, because the TPR represents the model's coverage of the true positive samples. As this is a multi-class problem, we also used common multi-class evaluation indicators, including the Kappa index, Macro F1 and Micro F1, to evaluate the classification models; the results are shown in Table 4. The Kappa index of all models ranges from 0.560 to 0.845, indicating that the predictions and observations are basically consistent.
Usually, Macro F1 and Micro F1 are used to evaluate the average performance over the whole classification, and a classifier with both values high works well (Liu et al., 2017). Among all models, SVM has both the highest Macro F1 and the highest Micro F1 values, which is consistent with the results reflected by the AUC values. Finally, the Friedman test at the 5% significance level was used to perform a nonparametric statistical significance test on the results of all classification models. The p-value (0.269) was greater than 0.05; therefore, the null hypothesis was accepted and the classification results of the models are not significantly different.

incidence of geological hazards. Debris flows, landslides, unstable slopes, ground cracks, ground collapse and collapse are the major geological hazards in Bijie city. Using conditioning factors to construct different classification models, analyzing and comparing the performance of the models, and studying the nonlinear relationships between conditioning factors and geological hazards are the focus of this study. In this article, we evaluated and compared a total of ten classification models, including discriminant analysis and logistic regression among the traditional classification methods, as well as newer machine learning methods and techniques (such as the Adaboost, GBDT and random forest algorithms in ensemble learning) and some of the most popular methods (such as the MLP neural network model and SVM).
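The Macro/Micro F1 distinction used above can be sketched on toy labels (the labels below are invented, not validation results from this study):

```python
from collections import Counter

def f1_scores(y_true, y_pred, classes):
    # per-class counts -> Macro F1 (mean of per-class F1) and Micro F1 (pooled)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    def f1(t, f_p, f_n):
        return 2 * t / (2 * t + f_p + f_n) if t else 0.0
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Invented toy labels for two hazard classes
y_true = ["landslide", "landslide", "collapse", "collapse"]
y_pred = ["landslide", "collapse", "collapse", "collapse"]
print(f1_scores(y_true, y_pred, ["landslide", "collapse"]))
```

Macro F1 weights every class equally regardless of size, while Micro F1 pools the counts, which is why reporting both gives a fuller picture for the six unevenly sized hazard classes.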

Variable importance
In general, in terms of the classification results for the six types of geological hazards, the average AUC values under the ROC curves of the ten classification models selected in this study are all greater than 0.8 (Fig. 7). It can also be found that SVM is significantly better than the other models overall. Therefore, this study also provides precision and sensitivity assessment measures (Fig. 8). It can be seen that the values of precision and sensitivity lie between 0.603-0.967 and 0.600-0.917, respectively.

At this point, although SVM still achieves most of the maximum values, the highest precision and sensitivity for the different geological disasters are no longer concentrated in a single model. The highest precision for debris flow (0.735) belongs to the RF model, and the highest sensitivity for ground collapse (0.870) belongs to the MLP neural network model.
The Kappa index, Macro F1 and Micro F1 are also used to evaluate these classification models. There is no single standard for selecting conditioning factors (Irigaray et al., 2007; Costanzo et al., 2012); although selection criteria have been proposed for different geological hazards, such as the GIS matrix combination method (Cross, 1998) and linear correlation (Irigaray et al., 2007), the final choice of conditioning factors still needs to combine the nature of the study area, data availability, the literature and statistical standards (Zhou et al., 2016). In general, topography, geology, hydrology, meteorology and the impact of human activities have been widely used as geological hazard conditioning factors.

When establishing a classification model, some of the conditioning factors in the initial data set will not bring good predictive ability to the model but will instead generate noise and thus reduce its performance. Therefore, feature selection is needed before establishing the model (Martínez-Álvarez et al., 2013). In this study, we used the VIF, tolerance and Pearson's correlation coefficient methods to detect multicollinearity between the conditioning factors in the initial data set (Table 2 and Fig. 6). The results showed that the tolerance and VIF values of profile curvature and plan curvature did not reach the threshold values, and in the Pearson's correlation coefficient method, curvature showed strong collinearity with profile curvature and plan curvature. Therefore, through feature selection, we removed the plan curvature and profile curvature from the classification models. After the construction of the classification models, we evaluated the contribution of the conditioning factors to the different geological hazards and the different models (Fig. 9). We find that the same conditioning factor can have disparate importance for different geological disasters. For example, altitude and slope are the most important factors for collapse, but are far less important than mining activities for ground crack. In the meantime, the same conditioning factor for the same geological hazard can have a different importance in different models. For example, apart from altitude and slope, the importance of the other factors for collapse changes depending on the type of model used: in the LDA model, the contribution rate of mining activities is only 0.025, whereas in the SVM model its contribution rate is the third largest, after elevation (0.164) and slope (0.131).
Apart from the differences, we can also find some similarities. The contribution rates of elevation, slope and rainfall are higher for landslide, debris flow and unstable slope. This is reasonable considering that more than 90% of the landslides (94.4%), debris flows (98.3%) and unstable slopes (96.3%) in Bijie occur above 1000 m. The average slope at the occurrence of landslides is 15.542°, of debris flows 15.421°, and of unstable slopes 17.713°. As for rainfall effects, 87.5% of landslides, 66.1% of debris flows and 63.1% of unstable slopes are affected by rainfall. For ground collapse and ground crack, the disasters caused by human factors far outnumber those caused by natural factors (Table 1), which explains why the contribution rate of mining activities to both is the largest in all models.