Application of GA–SVM method with parameter optimization for landslide development prediction

Prediction of the landslide development process is a long-standing focus of landslide research. Many methods for predicting landslide displacement series have been proposed. The support vector machine (SVM) has proved to be an algorithm with good performance. However, its performance depends strongly on the proper selection of the model parameters (C and γ). In this study, we present an application of a combined genetic algorithm and support vector machine (GA–SVM) method with parameter optimization to landslide displacement rate prediction. We selected a typical large-scale landslide in a hydro-electrical engineering area of southwest China as a case study. On the basis of an analysis of the basic characteristics and monitoring data of the landslide, a single-factor GA–SVM model and a multi-factor GA–SVM model of the landslide were built. The models were then compared with single-factor and multi-factor SVM models of the landslide. The results show that all four models have high prediction accuracy, but the GA–SVM models are slightly more accurate than the SVM models, and the multi-factor models are slightly more accurate than the single-factor models. The multi-factor GA–SVM model is the most accurate, with the smallest root mean square error (RMSE) of 0.0009 and the highest relation index (RI) of 0.9992.


Introduction
Prediction of the landslide development process is a critical task in landslide research (Sornette et al., 2004; Helmstetter et al., 2004; Corominas et al., 2005; Gao, 2007). Accurate prediction can provide a timely scientific guide for landslide early warning, forecasting and engineering control. However, it is not easy to accurately predict the evolution behavior of landslides. This is mainly because of geometrical complexity, the nonlinearity of the displacement-time relationships and the large number of interacting factors, which are hardly taken into account by prediction models (Crosta and Agliardi, 2002).
The most common method of predicting the development process is to build suitable models according to the development mechanism and monitoring data of landslides. So far, many models have been put forward (Saito, 1965; Voight, 1989; Crosta and Agliardi, 2002; Lu and Rosenbaum, 2003; Feng et al., 2004; Helmstetter et al., 2004; Neaupane and Achet, 2004; Sornette et al., 2004; Randall, 2007; Mufundirwa et al., 2010). They can be roughly classified into four categories: deterministic physical models, statistical models, nonlinear models and numerical simulation models (Li et al., 2012). Among them, the nonlinear models are considered to have the greatest potential for coping with difficult and complicated problems. In particular, artificial intelligence methods represented by neural networks (NNs) have been widely used in landslide prediction recently (Lu and Rosenbaum, 2003; Neaupane and Achet, 2004). However, some problems have appeared in practical applications of NN methods because of their imperfect theory: they are suitable only for large data sets, are prone to local minima and have weak generalization ability. Therefore, a better method for landslide development prediction is needed.
X. Z. Li and J. M. Kong: Application of GA-SVM method. Published by Copernicus Publications on behalf of the European Geosciences Union.

The support vector machine (SVM) is a new machine learning method originally developed by Vapnik and his co-workers. The method is based on the Vapnik-Chervonenkis (VC) dimension theory of statistical learning theory (SLT) and the structural risk minimization principle (Cortes and Vapnik, 1995). It seeks an optimal compromise between the complexity (the learning accuracy for certain training samples) and the learning capacity (the predicting ability for any other samples) of models according to limited samples, in order to obtain the best generalization ability (Cortes and Vapnik, 1995; Cristianini and Shawe-Taylor, 2000). It is well suited to problems with small samples, nonlinearity and high dimensions. Hence, the method has been widely used in the fields of classification and regression (Oliveira et al., 2004; Khan et al., 2006). Recently, a few researchers have begun to apply the method in landslide and slope research. For example, Yao et al. (2008) and Ballabio and Sterlacchini (2012) applied it to landslide susceptibility mapping and assessment and obtained good results. Samui (2008) used it to predict the safety status and safety factors of slopes, and indicated that the SVM model gives better results than artificial neural networks (ANNs) for safety factor prediction of slopes.
Although SVM theory has been widely used in some fields, its application results do not always reach the level expected from the theory. According to the related references (Cherkassky and Ma, 2004; Lessmann et al., 2005; Min and Lee, 2005), the selection of kernel functions and parameters is one of the main factors affecting the application results. At present, the parameters of the SVM model are selected manually by experience, and there is a lack of mature theoretical guidance. The genetic algorithm (GA) is a global optimization algorithm with good robustness, which was first suggested by John Holland in 1975 (Goldberg and Holland, 1988). GA can be used to automatically determine some parameters of SVMs (Lessmann et al., 2005; Pourbasheer et al., 2009). Hence, we present an application of GA-SVM in landslide development prediction.
The paper is organized as follows: Sect. 1 starts with literature associated with landslide development prediction and the features and applications of related prediction methods. Section 2 introduces the SVM, GA and GA-SVM methods. Section 3 presents a typical large-scale landslide case. Application results are described in Sect. 4. Discussion and conclusions are presented in Sect. 5.

SVM for regression
As a detailed description of SVM theory can be found in various references (e.g., Cortes and Vapnik, 1995; Cristianini and Shawe-Taylor, 2000), here we only introduce some key points of SVM for regression (SVMR).
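SVMR has two types: linear regression and nonlinear regression. For linear regression, first consider the problem of using a linear regression function

$$f(x) = \omega \cdot x + b \tag{1}$$

to fit the data $\{x_i, y_i\}$, $i = 1, 2, \ldots, n$, $x_i \in R^n$, $y_i \in R$, where $\omega$ is an adjustable weight vector, $b$ is a scalar threshold, $x$ is the input and $y$ is the output, $R^n$ is the n-dimensional vector space and $R$ is the one-dimensional vector space. In order to find a function $f(x)$ that is as flat as possible and deviates from the actual output $y$ by at most $\varepsilon$, the smallest $\omega$ needs to be found. It can be obtained by minimizing the Euclidean norm $\|\omega\|^2$ (Smola and Scholkopf, 2004; Samui, 2008). This can be written as a convex optimization problem as follows:

$$\text{Minimize: } \frac{1}{2}\|\omega\|^2$$
$$\text{Subject to: } \begin{cases} y_i - \omega \cdot x_i - b \le \varepsilon \\ \omega \cdot x_i + b - y_i \le \varepsilon \end{cases} \tag{2}$$

Considering the existence of some permissible error, slack variables $\xi_i$ and $\xi_i^*$ are introduced into the above optimization problem. Equation (2) becomes

$$\text{Minimize: } \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)$$
$$\text{Subject to: } \begin{cases} y_i - \omega \cdot x_i - b \le \varepsilon + \xi_i \\ \omega \cdot x_i + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \tag{3}$$

where the constant $C > 0$ sets the degree of penalty for samples whose error exceeds $\varepsilon$ and is called the penalty factor.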
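The roles of ε, the slack variables and the penalty factor C in Eq. (3) can be illustrated numerically. The following Python sketch is illustrative only: the function names, the one-dimensional toy data and the chosen values of ω, b, ε and C are assumptions, not values from this study.

```python
def slack_variables(y, f_x, eps):
    """Return (xi, xi_star): how far a sample lies above / below the eps-tube."""
    xi = max(0.0, y - f_x - eps)        # sample above the tube
    xi_star = max(0.0, f_x - y - eps)   # sample below the tube
    return xi, xi_star


def primal_objective(w, b, xs, ys, eps, C):
    """0.5 * w^2 + C * sum(xi + xi_star) for a 1-D linear model f(x) = w * x + b."""
    total_slack = 0.0
    for x, y in zip(xs, ys):
        xi, xi_star = slack_variables(y, w * x + b, eps)
        total_slack += xi + xi_star
    return 0.5 * w * w + C * total_slack


# Toy data (assumed): only the third point lies outside the tube, by about 0.4.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.05, 2.5]
objective = primal_objective(w=1.0, b=0.0, xs=xs, ys=ys, eps=0.1, C=10.0)
```

Samples inside the ε-tube contribute nothing to the objective, so only the third point is penalized here; a larger C makes such violations more expensive relative to the flatness term.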
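The dual problem of Eq. (3) can be obtained by using the Lagrange multiplier method:

$$\text{Maximize: } -\frac{1}{2}\sum_{i,j=1}^{n}(a_i - a_i^*)(a_j - a_j^*)(x_i \cdot x_j) - \varepsilon\sum_{i=1}^{n}(a_i + a_i^*) + \sum_{i=1}^{n} y_i(a_i - a_i^*)$$
$$\text{Subject to: } \sum_{i=1}^{n}(a_i - a_i^*) = 0, \quad 0 \le a_i, a_i^* \le C \tag{4}$$

where $a_i$ and $a_i^*$ are Lagrange multipliers. Solving the above optimization problem, the regression function of the SVM is given by

$$f(x) = \sum_{i=1}^{k}(a_i - a_i^*)(x_i \cdot x) + b \tag{5}$$

where $k$ is the number of support vectors, and the samples $(x_i, y_i)$ corresponding to $a_i - a_i^* \ne 0$ are support vectors.

For the nonlinear problem, the original problem can be mapped into a high-dimensional feature space by a nonlinear transformation (Cristianini and Shawe-Taylor, 2000). In the feature space, the inner product operation of the linear problem can be replaced by kernel functions, i.e., $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. Therefore, Eqs. (4) and (5) can be written as

$$\text{Maximize: } -\frac{1}{2}\sum_{i,j=1}^{n}(a_i - a_i^*)(a_j - a_j^*)K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(a_i + a_i^*) + \sum_{i=1}^{n} y_i(a_i - a_i^*)$$
$$\text{Subject to: } \sum_{i=1}^{n}(a_i - a_i^*) = 0, \quad 0 \le a_i, a_i^* \le C \tag{6}$$

$$f(x) = \sum_{i=1}^{k}(a_i - a_i^*)K(x_i, x) + b \tag{7}$$

where $K(x_i, x)$ is a kernel function that measures the similarity or distance between the input vector $x$ and the stored training vector $x_i$ (Feng et al., 2004). The meanings of the other parameters are the same as above. At present, four basic kernel functions are widely used (Cristianini and Shawe-Taylor, 2000):
- linear: $K(x_i, x_j) = x_i \cdot x_j$
- polynomial: $K(x_i, x_j) = (\gamma\, x_i \cdot x_j + r)^d$, $\gamma > 0$
- radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$
- sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma\, x_i \cdot x_j + r)$

Here, $\gamma$, $r$, and $d$ are kernel parameters. In this study, we mainly used the RBF as the kernel function of the SVM model for landslide prediction, because it has strong nonlinear mapping ability.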
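To make the kernel definitions concrete, the following Python sketch implements the four basic kernels in the parameterization commonly used by LIBSVM-style toolboxes (γ, r and d as kernel parameters). That parameterization, the function names and the sample vectors are assumptions for illustration; the sketch also shows how γ controls how quickly the RBF similarity decays with distance.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma, r, d):
    return (gamma * dot(u, v) + r) ** d


def rbf_kernel(u, v, gamma):
    # Similarity decays with the squared Euclidean distance, scaled by gamma.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma, r):
    return math.tanh(gamma * dot(u, v) + r)


# Two hypothetical input vectors; their squared distance is 1 + 4 = 5.
x_i, x_j = [1.0, 2.0], [2.0, 0.0]
k_small_gamma = rbf_kernel(x_i, x_j, gamma=0.1)   # exp(-0.5): still fairly similar
k_large_gamma = rbf_kernel(x_i, x_j, gamma=10.0)  # exp(-50): practically zero
```

A small γ gives a wide, smooth kernel, while a large γ makes each training sample influence only its immediate neighborhood, which is why γ has to be tuned together with C.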

Genetic algorithm (GA)
GA, an adaptive optimization method with global search capability, was devised by simulating the genetic evolution mechanism of organisms in the natural environment (Whitley, 1994). The method simulates the reproduction, crossover and mutation phenomena that occur in natural selection and heredity. Starting from any initial population, a group of new, better-adapted individuals can be generated by random selection, crossover and mutation operations. By continued evolution from generation to generation, a best-adapted individual (the optimal solution of the optimization problem) is eventually obtained. GA has the advantages of global optimality, implicit parallelism, high stability and wide usability. The method has been widely used in computer science, engineering management and social science (Lessmann et al., 2005; Pourbasheer et al., 2009; Choudhry and Garg, 2009). In this study, we mainly use GA to search for the parameters (C and γ) of the SVM model for landslide development prediction.
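The selection, crossover and mutation cycle described above can be sketched in a few lines of Python. This is a minimal real-coded GA under stated assumptions, not the procedure used in this study: tournament selection, arithmetic crossover, uniform mutation, one-individual elitism, the rates and the quadratic toy objective are all illustrative choices.

```python
import random


def run_ga(fitness, bounds, pop_size=20, generations=100,
           crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimise `fitness` inside the box `bounds`; return (best, best_value, history)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best, best_value, history = None, None, []

    def tournament(scored):
        # Selection: the fitter of two randomly drawn individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population), key=lambda t: t[0])
        if best_value is None or scored[0][0] < best_value:
            best_value, best = scored[0][0], list(scored[0][1])
        history.append(best_value)

        next_population = [list(best)]  # elitism: keep the best individual found so far
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover: weighted blend of the parents
                child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)  # mutation: resample one gene
                child[i] = min(max(child[i], lo), hi)  # keep the gene inside its bounds
            next_population.append(child)
        population = next_population
    return best, best_value, history


def toy_fitness(params):
    # Hypothetical error surface standing in for a model's prediction error.
    return (params[0] - 3.0) ** 2 + (params[1] - 1.0) ** 2


best, best_value, history = run_ga(toy_fitness, bounds=[(0.0, 100.0), (0.0, 100.0)])
```

Because the best individual is carried over unchanged, the recorded best fitness never gets worse from one generation to the next, which is the behavior a fitness curve of such a search would display.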

GA-SVM model
In order to build an effective SVM model, the parameters (C and γ) of the model need to be chosen properly in advance (Lin, 2001). The parameter C determines the trade-off between minimizing the training error and minimizing the complexity of the SVM model. With a bigger C value, the predictive accuracy on the training samples is higher. However, this may cause an over-training problem. The parameter γ of the RBF kernel function defines a nonlinear mapping from the input space to the high-dimensional feature space. The value of γ affects the shape of the RBF function. Hence, the parameters (C and γ) have a powerful influence on the efficiency and generalization performance of the SVM model. At present, the choice of the parameters lacks mature theoretical guidance and depends mainly on experience. A grid-search technique was presented by Lin (2001). However, the grid algorithm is time consuming and does not perform very well (Gu et al., 2011).
According to related research in different fields, GA has proved to be a better choice for determining the parameters (Lessmann et al., 2005; Pourbasheer et al., 2009). It can reduce the blindness of manual selection and improve the predictive performance of the SVM model. Therefore, we choose GA to search for the optimal parameters of the SVM model for landslide prediction in this study. The basic flowchart of the GA-SVM method can be seen in Fig. 1.
The algorithm can be realized by a parameter optimization procedure designed by Y. Li of Beijing Normal University based on the libsvm-mat toolbox, which was developed by Lin of National Taiwan University (Chang and Lin, 2001).
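The core of a GA-SVM coupling is the fitness function that scores one candidate pair (C, γ). The following Python sketch uses scikit-learn rather than the libsvm-mat procedure named above; the use of 3-fold cross-validated MSE as the fitness, the fixed ε = 0.01, the synthetic data and the helper name `svr_cv_mse` are all assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR


def svr_cv_mse(C, gamma, X, y, folds=3):
    """Cross-validated mean square error of an RBF-kernel SVR for one (C, gamma) pair."""
    model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=0.01)
    scores = cross_val_score(model, X, y, cv=folds, scoring="neg_mean_squared_error")
    return float(-scores.mean())


# Synthetic stand-in data (assumed), scaled to [0, 1] like normalised index values.
rng = np.random.default_rng(0)
X = rng.random((60, 1))
y = 0.5 + 0.4 * np.sin(2.0 * np.pi * X[:, 0])

# Score a few random candidates inside the search box; a GA would evolve such
# candidates over many generations instead of drawing them only once.
# The lower bound is slightly above 0 because the SVR requires C > 0.
candidates = [(rng.uniform(0.01, 100.0), rng.uniform(0.01, 100.0)) for _ in range(10)]
scored = [(svr_cv_mse(C, g, X, y), C, g) for C, g in candidates]
best_mse, best_C, best_gamma = min(scored)
```

Any GA loop that minimizes a function over a two-dimensional box can call `svr_cv_mse` as its fitness, so the search strategy and the model evaluation stay decoupled.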

Landslide case study
Here, we selected a typical large-scale landslide in southwest China as a case.

Basic characteristics of the landslide
The landslide is located on the left bank at the head of the reservoir of a hydro-electrical power station in southwest China, about 600 m away from the axis of the reservoir dam. The landslide height is 500-700 m, and its average width and volume are about 700 m and 5 million m³, respectively. The original slope in the landslide area is a monoclinic dip slope. The slope direction is 210-215°. Taking the 1400 m elevation as a boundary, the terrain below it is gentle, with an average gradient of 22-25°, while the terrain above it is steeper, with a gradient of 35-45°. The landslide has three free faces, and there are several gullies and multi-level gentle slopes on the uneven slope face (Fig. 2).
The upper part of the landslide body is composed of basalt with blocky structure; its lower part is composed of layered sedimentary rock.The landslide body can be divided into three zones from upstream to downstream, according to lithology and material composition characteristics and the continuity of the slip surface.Zone I, an ancient landslide area, is located in the upstream side of the landslide, with a trench and valley landform.Zone II, a creep area of rock that is the main deformation area of the landslide, is located in the middle of the landslide, with a ridge landform.Zone III, a shallow-surface landslide area, is located in the downstream side of the landslide, with a ridge landform (Fig. 2).Our research focus is on Zone II.

Monitoring data of the landslide
In order to ascertain the basic characteristics of the landslide and evaluate its stability and development tendency, an overall monitoring system was gradually put into practice in 1992 and started to operate in April 1998. The system, based on the geological and geomorphological features of the landslide, uses a variety of landslide monitoring techniques and instruments with different precision to comprehensively monitor the landslide from the surface to the underground. The monitoring instruments include a TCA 2003 automatic total station, SINCO sliding and fixed inclinometers, an Ni002A level, and MD4281 rock mass deformation measuring instruments. The monitoring items include precise geodetic survey, drilling monitoring, footrill monitoring, meteorological observations and an engineering geological survey. On the landslide body of Zone II, 110 monitoring points were set up, including 11 monitoring points of surface displacement, 4 drilling monitoring points for observing deep displacement of the landslide and groundwater temperature and level, 49 monitoring points for vertical and horizontal displacements of two footrill soleplates, 7 monitoring points for groundwater flow and temperature in the footrills, and 39 crack monitoring points for the surface and buildings. With its large scale, high accuracy and many monitoring items, the system met the leading industry standard at that time. In order to ensure sufficient accuracy, the monitoring instruments have been regularly serviced and renewed, and intensive observations were made after reservoir impoundment. So far, we have accumulated a large amount of detailed monitoring data for the landslide. The long-term and continuous monitoring data provide a good basis for studying in detail the deformation law and mechanism of the landslide.
In this study, we choose the footrill monitoring data of the creep body in Zone II from April 1998 to December 2005 to analyze in depth the relationships between the landslide displacement rate and rain, reservoir water and groundwater. The displacement rate is calculated on the basis of the monitored displacement values. For comparison, the index values are normalized by the min-max normalization method. The method performs a linear transformation on the original data. Suppose that $\min_A$ and $\max_A$ are the minimum and maximum values for attribute $A$. The method maps a value $v$ of $A$ to $v'$ in the range [0, 1] by computing

$$v' = \frac{v - \min_A}{\max_A - \min_A} \tag{8}$$

The analysis results of the landslide after applying the above transformation are shown in Figs. 3-5.
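The min-max transformation of Eq. (8) can be sketched in Python as follows. The function name and the sample rainfall values are hypothetical; the guard against a constant series is an added assumption, since Eq. (8) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Map each value linearly to [0, 1] using the series minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly rainfall values in mm (not monitoring data from this study).
rainfall_mm = [12.0, 48.0, 150.0, 81.0]
normalized = min_max_normalize(rainfall_mm)
```

After this step the smallest value maps to 0 and the largest to 1, so factors measured in different units (rainfall, water level, groundwater flow) can be plotted and modeled on a common scale.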
Figure 3 shows that there is a good relationship between the landslide displacement rate and rainfall, and the peaks of the displacement rate generally lag behind the rainfall peaks. As can be seen from Fig. 4, the impact of reservoir water level changes on the landslide mainly manifests in the early stages of storing water. The displacement rate of the landslide increased significantly after the reservoir started to store water in 1998. Afterwards, the impact of the reservoir water level on the landslide gradually decreased. The displacement rate showed a gradually decreasing trend as the water level fluctuated. Figure 5 shows that there is a significant relationship between groundwater flow and displacement rate. They were consistent and reached peak levels at almost the same time.
The above analysis results and the engineering geological survey results indicate that the development process of the landslide is affected by rain and reservoir water as well as by other factors. The deformation is mainly controlled by rainfall conditions, although changes in the reservoir water level also had a powerful effect on the landslide in the early stages of storing water.

Application results
In this section, we analyze the development of the landslide, based on the above-mentioned GA-SVM method and the monitoring data analysis results. We built a single-factor GA-SVM model and a multi-factor GA-SVM model for the landslide.

Single-factor GA-SVM prediction result
Firstly, we take the average monthly displacement rate of the landslide from April 1998 to December 2005 (93 data points) as the factor for building the model. The earliest 62 data points were chosen as training samples, and the other 31 were used as test samples. We built a single-factor SVM model for the landslide development prediction, and determined the parameters (C and γ) of the model by GA. The GA had a generation number of 100 and a population size of 20. The search range of both the C and γ parameters is [0, 100]. The GA search process for the optimal parameters can be seen in Fig. 6. We obtained a best C parameter of 7.9155 and a best γ parameter of 0.13504. The model with the best parameters has the smallest mean square error (MSE). The prediction result of the GA-SVM model with the optimal parameters is shown in Fig. 7.
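The sample counts above can be checked with a short Python sketch that enumerates the months and splits them chronologically. It assumes exactly one data point per calendar month with no gaps; the helper name `month_labels` and the resulting calendar dates of the split are inferences from that assumption, not statements from the monitoring records.

```python
def month_labels(start_year, start_month, end_year, end_month):
    """List 'YYYY-MM' labels for every month in the closed interval."""
    labels = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


months = month_labels(1998, 4, 2005, 12)   # April 1998 to December 2005
n_train = 62                                # earliest points are used for training
train_months, test_months = months[:n_train], months[n_train:]
```

Splitting by time order rather than at random keeps every test sample later than every training sample, so the test error reflects genuine forecasting rather than interpolation.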
Figure 7 shows that the monitoring data are in good agreement with the prediction result of the single-factor GA-SVM model.

Multi-factor GA-SVM prediction result
Secondly, we take the average monthly displacement rate, average monthly reservoir water level, monthly rainfall and average monthly groundwater flow of the landslide from April 1998 to December 2005 as the main factors for building a model. Similarly, the earliest 62 data points of the four factors were chosen as training samples, and the other 31 data points of the four factors were used as test samples. We built a multi-factor SVM model for the landslide development prediction, and determined the parameters (C and γ) of the model by GA. The parameters of the GA and the search range of the C and γ parameters are identical to those of the single-factor GA-SVM model. The process of obtaining the parameters and the prediction results of this model can be seen in Figs. 8 and 9.

Comparison of GA-SVM and SVM prediction results
In order to evaluate the prediction performance of the above GA-SVM models, we also built single-factor and multi-factor SVM models of the landslide by using the same training samples as for the GA-SVM models, and obtained the model parameters (C and γ) by using the grid-search method (Figs. 10 and 11).
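For comparison with the GA search, a grid search simply evaluates every parameter pair on a fixed lattice. The Python sketch below uses a base-2 exponential grid of the kind often suggested for LIBSVM (C from 2^-5 to 2^15, γ from 2^-15 to 2^3, in steps of 2 in the exponent); the grid used in this study is not stated, so these ranges, the function names and the toy objective are assumptions.

```python
import itertools
import math


def grid_search(objective, c_exponents, gamma_exponents):
    """Evaluate objective(C, gamma) on a base-2 grid; return (best_value, C, gamma)."""
    best = None
    for ec, eg in itertools.product(c_exponents, gamma_exponents):
        C, gamma = 2.0 ** ec, 2.0 ** eg
        value = objective(C, gamma)
        if best is None or value < best[0]:
            best = (value, C, gamma)
    return best


def toy_objective(C, gamma):
    # Hypothetical error surface with its minimum at C = 8, gamma = 0.125.
    return (math.log2(C) - 3.0) ** 2 + (math.log2(gamma) + 3.0) ** 2


# 11 values of C times 10 values of gamma = 110 model evaluations.
best_value, best_C, best_gamma = grid_search(
    toy_objective, c_exponents=range(-5, 16, 2), gamma_exponents=range(-15, 4, 2)
)
```

The cost grows with the product of the grid sizes, and the result can be no finer than the grid spacing, which are the two practical limitations that motivate replacing the lattice with a GA search.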
The prediction accuracy of the SVM and GA-SVM models can be evaluated by two indexes: the root mean square error (RMSE) and the relation index (RI). Generally, the smaller the RMSE and the higher the RI, the higher the accuracy of the model. Both are calculated following Li et al. (2012); the RMSE is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left[X^{(0)}(k) - \hat{X}^{(0)}(k)\right]^2}$$

where $X^{(0)}(k)$ is the observed value and $\hat{X}^{(0)}(k)$ is the predicted value of the models, and $n$ and $\bar{X}^{(0)}$ are the size and average value of the data sequence $X^{(0)}(k)$, the average being used in the RI. The prediction accuracy indexes of the models are shown in Table 1. As can be seen from Table 1, the prediction models have very high accuracies, with the RI values relating the predicted and monitored values reaching 0.99. The accuracies of the GA-SVM models are slightly higher than those of the SVM models, and the accuracies of the multi-factor models are slightly higher than those of the single-factor models. Among the models, the multi-factor GA-SVM model has the highest accuracy, with the smallest RMSE of 0.0009 and the highest RI of 0.9992.
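The two accuracy indexes can be sketched in Python as follows. The RMSE follows its standard definition. The RI is written here in an ASSUMED form, the square root of one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean; this form should be checked against Li et al. (2012) before use. The function names and the sample series are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relation_index(observed, predicted):
    """ASSUMED form of the RI: sqrt(1 - SSE / SST); verify against Li et al. (2012)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    return math.sqrt(max(0.0, 1.0 - sse / sst))


# Hypothetical normalized displacement rates (not data from this study).
observed = [0.10, 0.20, 0.40, 0.30]
predicted = [0.12, 0.18, 0.41, 0.29]
error = rmse(observed, predicted)
ri = relation_index(observed, predicted)
```

Under this assumed form, a perfect prediction gives an RMSE of 0 and an RI of 1, consistent with the statement that a smaller RMSE and a higher RI indicate a more accurate model.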

Discussion and conclusions
SVM is a new machine learning method with good performance in solving small-sample, nonlinear and high-dimensional problems. The disadvantage of the method mainly lies in its complicated theory. To support wide application, researchers have developed toolboxes for the SVM method, such as the libsvm toolbox (Chang and Lin, 2001). Despite the above advantages, the generalization performance of SVM models strongly depends on the right choice of the kernel functions and the parameters (C and γ) (Cherkassky and Ma, 2004; Lessmann et al., 2005). Hence, it is vitally important to determine them reasonably. GA is an adaptive optimization method with global search capability. In order to avoid blindness in the parameter selection of the SVM model, we select GA to automatically search for the parameters of the model for landslide prediction.
In this study, we took a complicated large-scale landslide in a hydro-electrical engineering area of southwest China as a case study. The landslide is located in the upstream reach of a hydropower station. Its development process is affected by many factors, such as rain, reservoir water, groundwater and human activity as well as the natural features of the landslide body. Moreover, the factors interrelate and interact with each other. We present an application of the GA-SVM method with parameter optimization to landslide displacement rate prediction. GA and SVM are combined by using GA to automatically search for the parameters of the single-factor and multi-factor SVM models of the landslide.
In addition, we built single-factor and multi-factor traditional SVM models for landslide prediction. The comparison shows that the accuracies of the GA-SVM models are slightly higher than those of the SVM models, and the accuracies of the multi-factor models are slightly higher than those of the single-factor models for landslide prediction. Among the models, the multi-factor GA-SVM model has the highest accuracy, with the smallest RMSE of 0.0009 and the highest RI of 0.9992.
The application results indicate that SVM and GA-SVM models have good prediction performance for the landslide development tendency, and that GA is an effective way to select the parameters of SVM models. Given the complexity of landslides and the diversity and randomness of the factors that influence them, the application of SVM and GA-SVM methods in landslide development prediction has significant potential.

Fig. 2. The whole view of the landslide.

Fig. 3. The relationship between the displacement rate and rainfall for the landslide.

Fig. 4. The relationship between the displacement rate and reservoir water level for the landslide.

Fig. 5. The relationship between the displacement rate and groundwater flow for the landslide.

Fig. 6. The fitness curve of the GA search for the optimal parameters of the single-factor SVM.

Fig. 7. The curves of the monitored and predicted values of the displacement rate of the landslide with time.

Figure 9 also shows that the monitoring data are in good agreement with the prediction result of the multi-factor GA-SVM model.

Fig. 8. The fitness curve of the GA search for the optimal parameters of the multi-factor SVM.

Fig. 9. The curves of the monitored and predicted values of the displacement rate of the landslide with time.

Fig. 10.The 3-D contour map of the parameter selection of the single-factor SVM model by using grid-search method.

Fig. 11.The 3-D contour map of the parameter selection of the multi-factor SVM model by using grid-search method.

Table 1. The accuracy comparison of the SVM and GA-SVM models for the landslide.