A multivariate statistical method for susceptibility analysis of debris flow in southwestern China

Abstract. Southwestern China is characterized by many steep mountains and deep valleys due to the uplift activity of the Tibetan Plateau. The 2008 Wenchuan earthquake left large amounts of loose material in this area, making it a severe disaster zone in terms of debris flow. Susceptibility is a key quantity for evaluating the formation and impact of debris flows, so there is an urgent need to analyze the susceptibility of this area to debris flows. To quantitatively predict this susceptibility, this study evaluates 70 typical debris flow gullies, distributed along the Brahmaputra River, Nujiang River, Yalong River, Dadu River, and Minjiang River, as statistical samples. Nine indexes are chosen to construct a factor index system for evaluating the susceptibility to debris flow: the catchment area, longitudinal gradient, average gradient of the slope on both sides of the gully, catchment morphology, valley orientation, loose material reserves, location of the main loose material, antecedent precipitation, and rainfall intensity. An empirical model based on the Type I quantification theory is then established for predicting the susceptibility to debris flows in southwestern China. Finally, 10 debris flow gullies upstream of the Dadu River are analyzed to verify the reliability of the proposed model. The results show that the accuracy of the statistical model is 90 %.

Introduction
Debris flows are a common geological hazard in mountainous areas; they transport large amounts of sediment downslope and cause serious damage to dwellings, roads, and other structural facilities. China has a chiefly mountainous topography and is one of the most debris-flow-prone countries in the world. Up to March 2019, approximately 50 000 debris flows had been recorded in China (Di et al., 2019). A significant percentage of these debris flows are distributed in southwestern China, particularly in the Wenchuan earthquake area, where large amounts of loose material were produced by earthquake-induced landslides (Xu et al., 2012; Huang et al., 2015; Dai et al., 2017).
Due to the complex nature of debris flows, it is quite difficult to fully understand their initiation mechanism and precisely forecast their occurrence (Brayshaw and Hassan, 2009; Gao et al., 2019). The uncertainty of debris flows poses a significant threat to human life in downstream areas (Schürch et al., 2011). Debris flow susceptibility expresses the likelihood that a debris flow occurs in an area, given its geomorphologic characteristics (Kappes et al., 2011; Bertrand et al., 2013). Therefore, susceptibility analysis is an essential step in conducting risk assessments of debris flow hazards (Di et al., 2019; Zou et al., 2019).
Debris flow susceptibility analyses include two steps: (1) identification of the potential source areas and (2) prediction of the possible deposition areas (Kang and Lee, 2018). In the literature, a large number of prediction models have been proposed for the susceptibility analyses of debris flows. For the first step, statistical models that use various environmental factors contributing to possible instabilities are well established. For example, Blahut et al. (2010) performed a susceptibility assessment for the source areas of landslide-induced debris flows in the Valtellina Valley based on bivariate statistics. Bertrand et al. (2013) used two multivariate statistical models, a linear discriminant analysis (LDA) and a logistic regression (LR), to analyze the debris flow susceptibility of upland catchments. Jomelli et al. (2015) proposed a Bayesian hierarchical probabilistic model to investigate how debris flows respond to environmental and climatic variables in the French Alps. Carrara et al. (2008) discussed the application of different statistical models to debris flows in Val di Fassa, Trentino, Italy. Lucà et al. (2011) compared bivariate and multivariate statistical models for the evaluation of gullying susceptibility in northern Calabria, southern Italy, and concluded that multivariate statistical models were the best for predicting the debris flow susceptibility of the study area. For the second step, the concept of the "angle of reach" has been widely used in empirical models to predict the runout distance of debris flows (Hürlimann et al., 2012; Horton et al., 2013). Recently, many numerical models have been proposed to simulate the propagation of debris flows and predict the deposition area. For example, Pirulli and Sorbino (2008) analyzed the propagation of potential debris flows in southern Italy using two numerical codes, RASH3D and FLO2D. Beguería et al. (2009) proposed a two-dimensional model based on numerical integration of the depth-averaged motion equations to predict debris flow propagation over complex terrain near Lienz, East Tyrol, Austria. Huang et al. (2015) presented a numerical model based on the smoothed particle hydrodynamics (SPH) method to calculate the runout distance of catastrophic debris flows that occurred in the Wenchuan earthquake area. Gregoretti et al. (2016) used a cell model to simulate a debris flow that occurred on the Rio Lazer. Moraci et al. (2017) performed debris flow susceptibility zoning in the Province of Reggio Calabria based on the SPH method. Some recent analysis methods of debris flow susceptibility can be found in Cama et al. (2017), Prieto et al. (2018), and Rosatti et al. (2018).
The previous studies mentioned above have attempted to conduct debris flow susceptibility analysis in specified regions. Southwestern China is characterized by steep mountains and deep valleys, and is strongly affected by the uplift activity of the Tibetan Plateau. Moreover, southwestern China has had abundant loose material since the 2008 Wenchuan earthquake. As a result, a series of large-scale debris flows have occurred during the rainy seasons in southwestern China (Wu et al., 2019). In the literature, many models for debris flow risk prediction in this area have been proposed. For example, Xu et al. (2012) assessed debris flow susceptibility based on an information value model and a Geographic Information System (GIS) in Sichuan, China. Wang et al. (2016) adopted a self-organizing map method to analyze the susceptibility to debris flows at the Wudongde Dam site in southwestern China. Li et al. (2017) carried out a susceptibility analysis on debris flows, also in the Wudongde Dam area, using the fuzzy C-means algorithm (FCM). Recently, Liu et al. (2018) presented a comprehensive risk assessment model based on semiquantitative methods to quantify the risk level of each zone in southwestern China. Di et al. (2019) developed a gradient boosting machine (GBM) to predict the susceptibilities of debris flows in southwestern China. Wu et al. (2020) implemented logistic regression models to identify the areas that are susceptible to debris flow formation in Sichuan Province, China. Through the above research, some promising results have been achieved concerning the susceptibility analysis of debris flows in southwestern China. This work aims to provide a multivariate statistical method for susceptibility analysis of debris flow in southwestern China. A total of 70 debris flow gullies in southwestern China were analyzed, and nine key indicators were extracted through the initial analysis of the debris flows. Through multivariate statistics, an empirical formula of susceptibility was established, which was then validated with the data of 10 debris flow gullies upstream of the Dadu River. It is worth noting that this work is confined to identifying the potential debris flow source areas in southwestern China and neglects the runout of the phenomenon.
2 Characteristics of the debris flows in the study area
Southwestern China is characterized by steep mountains and deep valleys and is strongly affected by the uplift activity of the Qinghai-Tibet Plateau. Furthermore, there is abundant loose material and rainfall in this area. Therefore, it is a severe disaster zone in terms of debris flow. In the past 3 years, 70 typical debris flows distributed along the Brahmaputra River, Nujiang River, Yalong River, Dadu River, and Minjiang River were investigated. The locations of the debris flows are shown in Fig. 1, and some typical debris flows are shown in Fig. 2. Based on the field investigation, the characteristics of the five water catchments are summarized as follows.
1. Upstream of the Brahmaputra River, 18 debris flow gullies along the Dagu River and Jiexu River are investigated. The lithology in this area is the irruptive rock of the late Yanshanian-Himalayan epoch, with a wide distribution of granodiorite. The average annual rainfall in this area is about 540 mm and concentrates mostly in summer. Large-scale ice-melting-type debris flow once occurred in this region. However, in recent years, the debris flows in this area are mainly caused by precipitation. Material reserves are abundant in the valleys, whereas unstable materials are found less frequently and the deposit zone is small. It is found that most of the debris flows in this area are in the decline phase, and that most debris flow gullies are in the low-frequency category.
2. In the midstream of the Nujiang River catchment, 11 debris flow gullies located in the Zuogong River section are investigated. The stratum mainly includes the Permian Nacuo group slate and Triassic Wapu group marble. As this region is located in the subtropical zone south of the Himalayas, it is characterized by a humid climate and plentiful precipitation. This leads to an extensive distribution of debris flow gullies.
3. Midstream in the Yalong River catchment, 27 debris flow gullies are investigated. They belong to a plateau climate zone with complex meteorological and hydrological conditions. The concentration and suddenness of rainfall provide the hydraulic conditions for debris flow outbreaks. Collapses and landslides occur frequently in the valley and provide abundant source material for debris flows. Moreover, the debris flow activity is intensified by unreasonable human engineering activities, such as deforestation and accumulation of highway waste residue.
4. In the Dadu River catchment, 42 gullies in the midstream and the upstream are surveyed. This area is characterized by intense new tectonic movement, high earthquake intensity, and rock fragmentation on the mountain surface. Debris flow, collapse, and other geological disasters are widely distributed, and the deposit zone of the debris flow is large. The maturity of the valley is high.
5. In the Minjiang River catchment, the Wenchuan River section is surveyed, and 32 debris flows are investigated. This region is characterized by abundant loose materials, frequent debris flows, and a high possibility of the outbreak of large-scale debris flows. Most of these debris flows are intensive in activity and have not declined in recent times.

Investigation and statistical data
In total, 70 debris flow gullies distributed in five water catchments in southwestern China were investigated from the gully outlet to the watershed over the past 3 years. This work includes the investigation of the watershed terrain, geological structure, outbreak scale, loose material distribution, processes of occurrence and movement, frequency of debris flows, and so on. The role of each factor causing instability of the source materials is analyzed. The antecedent precipitation can reduce the soil shear strength and has an important influence on the formation and the scale of debris flows (Shieh et al., 2009). Therefore, the precipitation data before the outbreak of debris flows were collected from local meteorological bureaus and used as one of the main influence factors to assess the susceptibility of debris flows in this study. In this work, the antecedent precipitation is classified into three categories: inadequate, medium, and adequate. The classification criteria are listed in Table 1.

Field test
Bulk density tests and soil screening tests were carried out in the 70 debris flow deposit areas. Figure 3 shows the results of the bulk density tests. The bulk densities of the soil material in the debris flow deposits are mainly between 1.3 and 1.8 g cm⁻³, and the average bulk density is about 1.48 g cm⁻³. The screening tests show that the deposit zone is mainly composed of block gravel mixed soil: the content of block gravel is 30 %-50 %, the content of silt and clay is about 20 %-40 %, and the rest of the deposit material is breccia. The high content of coarse stone soil arises because collapses are quite common, owing to the active crustal movement in the study area.

Drilling and geophysical prospecting
The geologic conditions in the active debris flow gullies in southwestern China are very complicated. To investigate the material composition and the thickness of the deposit area, geological drilling was conducted in the active debris flow gullies along the Dadu River, Yalong River, Yaluzangbo River, and Minjiang River. The drilling information, such as the drilling location, drilling depth, and soil characteristics, is provided in Table 2.

Statistical technique
The statistical techniques can be grouped into bivariate and multivariate methods. A bivariate statistical method analyzes each parameter individually; therefore, the calculation and application in bivariate statistical models are straightforward and efficient (Suzen and Doyuran, 2004). On the other hand, a multivariate statistical method considers the interaction of all parameters in controlling the occurrence of a phenomenon and is considered one of the best methods in predicting debris flow susceptibility (Lucà et al., 2011). Hayashi's quantification theory is a well-known multivariate statistical method developed by Hayashi (1961). The Type I quantification theory applies multiple linear regression methods, which can simultaneously process qualitative and quantitative variables, and evaluate the weight of each variable. Therefore, it is widely used in various fields (Matsumura, 2004; Ishihara et al., 2007; Inoue et al., 2009; Shen and Chen, 2018). In this method, the qualitative and quantitative variables can be mutually transformed based on a reasonable principle. Therefore, this method has very good applicability for processing the quantitative and qualitative influencing factors of debris flow risk. In Hayashi's Type I quantification theory, qualitative variables are termed items. All possibilities for each item are termed categories. A dummy variable $\delta_i(j,k)$ is introduced in the method to express the response of an item and the category for each sample:

$$
\delta_i(j,k) =
\begin{cases}
1, & \text{if the response of the $i$th sample falls in category $k$ of item $j$,} \\
0, & \text{otherwise,}
\end{cases}
\qquad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m, \tag{1}
$$

where $n$ is the number of samples and $m$ denotes the number of items. The response matrix $X$ can be expressed as an $n \times p$ matrix composed of all categories $\delta_i(j,k)$, with $p = \sum_{j=1}^{m} r_j$ columns:

$$
X =
\begin{bmatrix}
\delta_1(1,1) & \cdots & \delta_1(1,r_1) & \delta_1(2,1) & \cdots & \delta_1(2,r_2) & \cdots & \delta_1(m,1) & \cdots & \delta_1(m,r_m) \\
\delta_2(1,1) & \cdots & \delta_2(1,r_1) & \delta_2(2,1) & \cdots & \delta_2(2,r_2) & \cdots & \delta_2(m,1) & \cdots & \delta_2(m,r_m) \\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
\delta_n(1,1) & \cdots & \delta_n(1,r_1) & \delta_n(2,1) & \cdots & \delta_n(2,r_2) & \cdots & \delta_n(m,1) & \cdots & \delta_n(m,r_m)
\end{bmatrix}. \tag{2}
$$

To establish a quantitative analysis model, the qualitative and quantitative in situ observations are used to fit the linear relationship between the concerned independent variable and the dependent variable. In Hayashi's Type I quantification theory, the random variable changes with the $m$ variables:

$$
y_i = \sum_{j=1}^{m} \sum_{k=1}^{r_j} \delta_i(j,k)\, b_{jk} + \varepsilon_i, \tag{3}
$$

where $y_i$ represents the susceptibility of the $i$th debris flow gully, $r_j$ is the number of categories of the item $j$, $b_{jk}$ is a constant coefficient depending on category $k$ in item $j$, and $\varepsilon_i$ is a random error.

Table 1. Classification criteria of the antecedent precipitation.
Inadequate: There is no antecedent precipitation or very little antecedent precipitation, which is not enough to make the surface soil moist.
Medium: The antecedent precipitation is intermittent or lower, and the soil is wet or muddy.
Adequate: The precipitation lasts for several days, and the soil layer is full of water. Water accumulates in some low-lying areas, and the drainage is not smooth.

To establish an analysis model of debris flow susceptibility, some necessary steps should be followed based on Hayashi's Type I quantification theory: (1) building an index system, (2) selecting samples and assigning values, (3) establishing the analysis model using single slopes, (4) conducting a significance test of the regression equation and each variable, and (5) applying this analysis model to regional debris flow hazards evaluation.
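The core of steps (2) and (3) can be sketched as an ordinary least-squares fit on the 0-1 response matrix of Eq. (2). The snippet below is only a minimal illustration under stated assumptions: the function names `response_matrix` and `fit_scores` and the toy data are hypothetical and are not the samples or coefficients of this study. Because the dummy columns of each item sum to one, the matrix is rank deficient when there is more than one item; `numpy.linalg.lstsq` then returns the minimum-norm solution, and the fitted values remain unique.

```python
import numpy as np

def response_matrix(samples, n_categories):
    """Build the 0-1 response matrix X (n x p) of Eq. (2).

    samples[i][j] is the 0-based category of item j for sample i;
    n_categories[j] is r_j, the number of categories of item j.
    """
    # Column offset of the first category of each item.
    offsets = np.concatenate(([0], np.cumsum(n_categories)[:-1]))
    X = np.zeros((len(samples), int(np.sum(n_categories))))
    for i, row in enumerate(samples):
        for j, k in enumerate(row):
            X[i, offsets[j] + k] = 1.0  # delta_i(j, k) = 1
    return X

def fit_scores(X, y):
    """Least-squares estimate of the category scores b_jk in Eq. (3)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Hypothetical toy data: 2 items with 2 and 3 categories, 6 samples.
n_categories = [2, 3]
samples = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
y = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0])

X = response_matrix(samples, n_categories)
b = fit_scores(X, y)
y_hat = X @ b  # fitted susceptibility of each sample
```

In this toy case the response is exactly additive in the two items, so the fitted values reproduce `y`; with field data a nonzero residual would remain.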

Indexes and categories in the statistical model
There are many factors that affect debris flow formation and development. From the perspective of the source material of the debris flows, the main influence factors are catchment area, loose material position, and loose material reserves. The antecedent precipitation and $H_{1p}$ rainfall intensity are the main generating conditions of debris flows. Aside from this, the catchment morphology, longitudinal gradient, average gradient of slope on both sides of the gully, and valley orientation are the main factors affecting the development of debris flows. Therefore, these nine indexes (listed in Table 3) are selected in this study to assess the susceptibility of debris flows. Each factor is classified into certain categories according to the values shown in Table 4.

Sample quantification
A total of 70 debris flow gullies in southwestern China are selected as the sample to evaluate the performance of the statistical model. Detailed information about these debris flow gullies is given in Table 5. The values of the samples are assigned according to Eq. (1), and the response from each category is obtained. The sample data can then be transformed into a "0-1" response matrix.

Table 2 (excerpt). Soil characteristics recorded for the gully at 101°52′11″ E.
[…] mixed with gravel containing a small amount of boulder. The particle sizes of the gravel, breccia, and boulder are 2-3, 10, and 40 cm, respectively. The soil content in this layer is up to 70 %. The lower layer is mainly composed of gravel and sand, and the particle size is relatively uniform, generally 5-8 cm. The roundness of the particles is good, and the content of fine particles is low.

Table 3. Indexes selected for the susceptibility analysis.
x1: Catchment area
x2: Longitudinal gradient
x3: Average gradient of slope on both sides of gully (°)
x4: Catchment morphology
x5: Valley orientation
x6: Loose material reserves (10⁴ m³ km⁻²)
x7: Main loose material position
x8: Antecedent precipitation
x9: $H_{1p}$ rainfall intensity (mm)
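The quantification of a single sample can be illustrated as follows: each quantitative index is first graded into a category, and each category is then written as a 0-1 indicator. This is a sketch only. The grading bounds `AREA_BOUNDS_KM2` are hypothetical placeholders and are not the criteria of Table 4; the three precipitation classes are those named in the text, and the catchment area of 10.09 km² is the value quoted for the Linong Gully.

```python
import numpy as np

# Hypothetical grading bounds for the catchment area (NOT the values of Table 4).
AREA_BOUNDS_KM2 = [1.0, 10.0, 50.0]  # three bounds define four categories
PRECIP_CLASSES = ["inadequate", "medium", "adequate"]  # classes named in the text

def grade(value, bounds):
    """Return the 0-based category index of a quantitative value."""
    return int(np.digitize(value, bounds))

def one_hot(category, n_categories):
    """Return the 0-1 indicator vector of one item for one sample."""
    row = np.zeros(n_categories, dtype=int)
    row[category] = 1
    return row

# One sample: a 10.09 km2 catchment with "medium" antecedent precipitation.
area_cat = grade(10.09, AREA_BOUNDS_KM2)
precip_cat = PRECIP_CLASSES.index("medium")

# The sample's row of the response matrix, restricted to these two items.
row = np.concatenate([one_hot(area_cat, 4), one_hot(precip_cat, 3)])
```

Stacking such rows for all samples and all nine indexes gives the response matrix used in the regression.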

Statistical model based on Hayashi's quantification theory
When the quantitative theory and regression analysis take the binary-state variables 0 and 1, the model can be rewritten as a linear regression expression, Eq. (4). Based on Eq. (4) and the matrix regression calculation, the contribution values of each item are obtained, as shown in Table 6. Substituting the numerical values into Eq. (4) yields the susceptibility prediction model of debris flow, Eq. (5).

Table 4. Grading criteria of the evaluation indexes in the prediction model of debris flow susceptibility.
5 Validation and discussion

Fitting degree analysis
$R^2$ is the fitting degree, which is widely used to evaluate the accuracy of prediction models. As shown in Table 7, the fitting degree of the proposed model is 71.8 %, which indicates that the model reproduces the susceptibility of debris flows in southwestern China reasonably well.
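For reference, the fitting degree can be computed from observed and fitted susceptibilities as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes this standard definition of the coefficient of determination; the function name `r_squared` and the numbers are hypothetical and are not the data behind Table 7.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and fitted susceptibility values.
y_obs = [1, 2, 3, 2, 3]
y_fit = [1.2, 1.9, 2.7, 2.1, 3.1]
r2 = r_squared(y_obs, y_fit)
```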

Self-test coincidence rate
The values of each index are used in the established model to calculate the predicted values of the susceptibility based on Eq. (5), and then the predicted values are compared with the actual susceptibility. In this study, the self-test coincidence rate is defined as the ratio of the predicted result to the actual susceptibility. As shown in Fig. 4, the predicted values of debris flow susceptibility are graded. For the calculated results listed in Table 8, the prediction accuracy for the low- […]

Sample table columns: No., x1, x2, x3, x4, x5, x6, x7, x8, x9, Susceptibility.

Table 6. Score values of each index after normalization.
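One way to read the self-test is as the share of samples whose graded prediction matches the observed grade. The sketch below follows that reading, which is an assumption rather than the study's stated procedure. Of the two grade boundaries, 2.5 is the high-susceptibility threshold quoted later in the text, while 1.5 and the three-grade scheme (1 low, 2 medium, 3 high) are hypothetical, as are all scores and grades in the example.

```python
import numpy as np

# 2.5 is the high-susceptibility threshold quoted in the text;
# 1.5 is a hypothetical lower boundary used only for illustration.
GRADE_BOUNDS = [1.5, 2.5]

def to_grade(score):
    """Map a predicted score to a grade: 1 low, 2 medium, 3 high (assumed scheme)."""
    return int(np.digitize(score, GRADE_BOUNDS)) + 1

def coincidence_rate(pred_scores, actual_grades):
    """Share of samples whose graded prediction equals the actual grade."""
    predicted = [to_grade(s) for s in pred_scores]
    hits = sum(p == a for p, a in zip(predicted, actual_grades))
    return hits / len(actual_grades)

# Hypothetical predicted scores and field-assessed grades.
pred_scores = [1.2, 2.4, 2.8, 1.9]
actual_grades = [1, 3, 3, 2]
rate = coincidence_rate(pred_scores, actual_grades)
```

A score just below a boundary, such as 2.4 here, is graded one level lower than its field grade and counts as a miss.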

Residual error analysis
Residual error is the difference between an observed value and the value predicted by the regression model. As shown in Fig. 5, the residual error of the model mainly fluctuates within ±0.45, which indicates that the regression line fits the field values well, and the residual frequency approximately follows a normal distribution.
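A simple residual check of this kind can be sketched as follows: compute the residuals, then report their mean, their spread, and the share that falls within a band such as ±0.45. The function name `residual_summary` and the data are hypothetical; only the band width is taken from the text.

```python
import numpy as np

def residual_summary(y, y_hat, band=0.45):
    """Summarize residuals (observed minus predicted)."""
    res = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return {
        "mean": float(res.mean()),              # near 0 for an unbiased fit
        "std": float(res.std(ddof=1)),          # sample standard deviation
        "share_within_band": float(np.mean(np.abs(res) <= band)),
    }

# Hypothetical observed and predicted susceptibility values.
summary = residual_summary([1, 2, 3, 2, 3], [1.2, 1.9, 2.4, 2.1, 3.1])
```

A histogram of the residuals, compared with a normal curve of the same mean and standard deviation, would complete the check described above.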

Verification of proposed model
The Kaka basin is located on the upper part of the Dadu River, southeast of the Qinghai-Tibet Plateau. The valley is deep and the river runs from north to south. The regional topography is characterized by high altitudes in the east and low altitudes in the west. The terrain is composed of high mountains with elevations of 2000 m. There are three layers of wide valley mesas, and the uplift of mountains and river erosion is significant in this area. The river elevation in the Kaka basin is approximately 1800 m, the river width is 140-185 m, and the slope angle is approximately 45-60°.
The main faults are denoted as F1, F5, F5-1, F6, and F7 in Fig. 6. They strike northwest and form a 40-60° angle with the river. A series of debris flow gullies have developed in the basin. A total of 10 typical debris flow gullies upstream of the Dadu River are selected as samples for the model validation (as shown in Fig. 7, and listed in Table 9). The accuracy of the established model is verified through comparison with field investigation results. Table 9 provides the relevant basic data for the samples. Each secondary index is transformed into a 0-1 mode, and all the samples are adopted to construct a 9 × 26 matrix. Table 10 shows the predicted susceptibility from the proposed model and the actual susceptibility obtained by the field investigation. The comparison shows that the accuracy rate of the model is 90 %, and only the prediction result of the Linong Gully deviates from the actual susceptibility. Therefore, a detailed field investigation was carried out to analyze the debris flow susceptibility in the Linong Gully. Figure 8 shows the catchment of the Linong Gully. The total area of the catchment is about 10.09 km², and the total amount of loose material is about 4.04 × 10⁶ m³. The soil material, as shown in Fig. 9, is mainly composed of block and crushed stone with particle sizes generally of 10-40 cm. In the calculation process, the catchment area is quite large and the loose material per catchment area is relatively small, as shown in Fig. 8. Based on the data, the predicted susceptibility of the Linong Gully is 2.421, which is very close to the high susceptibility threshold value of 2.5. Therefore, although there is a minor deviation, it can still be concluded that the proposed model performs well in predicting debris flow susceptibility in southwestern China.
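The figures quoted in this validation can be checked with a short calculation. It assumes that the accuracy rate is the share of the 10 gullies predicted correctly and that the loose material reserve index is the total loose material divided by the catchment area:

$$
\begin{aligned}
\text{accuracy rate} &= \frac{10 - 1}{10} = 90\ \%, \\
\text{loose material per unit area (Linong Gully)} &= \frac{4.04 \times 10^{6}\ \text{m}^3}{10.09\ \text{km}^2} \approx 40.0 \times 10^{4}\ \text{m}^3\ \text{km}^{-2}, \\
\text{gap to the high-susceptibility threshold} &= 2.5 - 2.421 = 0.079.
\end{aligned}
$$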

Conclusions
Debris flows occur frequently in southwestern China and have resulted in severe damage to dwellings and lifelines. Based on Hayashi's Type I quantification theory, an initiation susceptibility model of debris flows in southwestern China was proposed in this work. The following conclusions can be drawn.
1. According to the topography and geomorphology characteristics in southwestern China, the following nine indexes were used as evaluation factors of debris flow initiation susceptibility: the catchment area, longitudinal gradient, average gradient of the slope on both sides of the gully, catchment morphology, valley orientation, loose material reserves, location of the main loose material, antecedent precipitation, and rainfall intensity.
2. A total of 70 typical debris flow gullies distributed along the Brahmaputra River, Nujiang River, Yalong River,