Machine learning modelling for predicting soil liquefaction susceptibility

This study describes two machine learning techniques applied to predict the liquefaction susceptibility of soil based on standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second uses the Support Vector Machine (SVM), which is firmly based on statistical learning theory, as a classification technique. ANN and SVM models have been developed to predict liquefaction susceptibility using the corrected SPT value [(N1)60] and the cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models so that they require only two parameters [(N1)60 and peak ground acceleration (amax/g)] for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.


Introduction
Liquefaction is a phenomenon whereby a granular material transforms from a solid state to a liquefied state as a consequence of an increase in pore water pressure. The effective stress of the soil reduces, causing a loss of bearing capacity. Three types of damage occur during liquefaction. First, lateral ground spreading and dam embankment failures are particular types of landslide that can be classified as liquefaction (Keefer, 1984). Second, sand blows and ground cracks are the surface manifestations of liquefaction in soil. Third, building settlement and/or severe tilting are hazardous consequences of liquefaction. The damage attributed to earthquake-induced liquefaction has cost society hundreds of millions of US dollars (Seed and Idriss, 1982). Therefore, the assessment of the liquefaction potential due to an earthquake at a site is an imperative task in earthquake geotechnical engineering. The liquefaction susceptibility of soil depends on both earthquake parameters and soil parameters. One of the most important earthquake parameters is the maximum epicentral distance (Papadopoulos and Lefkopoulos, 1993). Kramer (1996) has described the different soil parameters, such as fraction finer than 0.005 mm, liquid limit, natural water content, liquidity index, gradation, particle shape, initial state of the soil, etc. A procedure based on the standard penetration test (SPT) and cyclic stress ratio (CSR) was developed by Seed and his colleagues (1967, 1971, 1983, 1984), based on the use of peak ground acceleration (PGA = amax/g), to assess the liquefaction potential of soil, and is now in standard use around the world. Liao et al. (1988) and Cetin (2000) used a probabilistic framework to model the variability and uncertainty inherent to the problem of liquefaction. Goh (1994) successfully applied an Artificial Neural Network (ANN) to the determination of liquefaction susceptibility of soil. However, ANN models have some limitations, such as the black box approach, arriving at local minima, slow convergence speed and overfitting problems (Park and Rilett, 1999; Kecman, 2001).
Correspondence to: P. Samui (pijush.phd@gmail.com)
In this paper, two machine learning techniques (ANN and support vector machine, SVM) have been adopted to predict the liquefaction susceptibility of soil based on standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The epicentre of the earthquake was at 23.87° N, 120.75° E (Juang et al., 2002). This earthquake caused a lot of damage, in particular large-scale liquefaction in Central Taiwan (Juang et al., 2002). Several SPT tests were conducted after the earthquake at different sites (Mingjian Shiang, Taiping City, Wufeng Shiang, Yuanlin Jen, Taichung Harbour, Chiuanshing, Chanbing Industrial Park, Nantou City, etc.) and the results have been published (Hwang and Yang, 2001). The SPT test also gives information about fines content, depth of water table, clay size content, D50, SPT energy ratio, etc. Information (liquefaction, soil profile, etc.) about the Chi-Chi earthquake has been given by different researchers (Lin et al., 2001; Lee et al., 2001, 2001; Wang et al., 2002, 2003, 2004; Sokolov et al., 2002; Shou and Wang, 2003; Yuan et al., 2004; Chu et al., 2004; Ku et al., 2004; Chua et al., 2004). ANN has been used with multilayer perceptrons (MLPs) that are trained with the Levenberg-Marquardt backpropagation algorithm. The Support Vector Machine (SVM), based on statistical learning theory, was developed by Vapnik (1995). This study employs SVM as a classification technique. Two sets of analyses were carried out: the first uses the two input parameters corrected SPT value [(N1)60] and CSR, and the second uses (N1)60 and amax/g. The developed models have been tested on different case histories available globally (Goh, 1994). A comparative study has also been carried out between the developed ANN and SVM models.
P. Samui and T. G. Sitharam: Liquefaction susceptibility: machine learning modelling

Methodology
In this paper, two models (ANN and SVM) have been adopted for predicting liquefaction susceptibility. Brief descriptions of the two models developed for this study are given below.

ANN model
In this study, MLPs trained with the Levenberg-Marquardt backpropagation algorithm have been used (Hagan and Menhaj, 1994). MLPs are perhaps the best-known type of feed forward network. An MLP generally has three layers: an input layer, an output layer and an intermediate or hidden layer. In the backpropagation training process, the network error is back propagated into each neuron in the hidden layer, and then continued into the neurons in the input layer. The modification of the connection weights and biases depends on the distribution of error at each neuron. The global network error is reduced by continuous modification of the connection weights and biases. An error goal is set before the network training, and if the network error during training becomes less than the error goal, the training is stopped. The Levenberg-Marquardt backpropagation algorithm is a variation of Newton's method and is well suited to ANN training. The theory and implementation of Levenberg-Marquardt backpropagation have been given by More (1977).
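The training loop described above (damped Newton-type updates, stopping once the error falls below a goal) can be illustrated on a toy problem. The sketch below is not the authors' MATLAB network: it fits a single logsig unit with two parameters to synthetic data, and the damping schedule (dividing or multiplying `mu` by 10), the error goal and all names are assumptions made for illustration.

```python
import math

def logsig(z):
    """Log-sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sse(params, xs, ts):
    """Sum of squared errors of one logsig unit with parameters (w, b)."""
    w, b = params
    return sum((logsig(w * x + b) - t) ** 2 for x, t in zip(xs, ts))

def lm_step(params, xs, ts, mu):
    """One Levenberg-Marquardt update: delta = -(J^T J + mu I)^-1 J^T e."""
    w, b = params
    a11 = a12 = a22 = g1 = g2 = 0.0
    for x, t in zip(xs, ts):
        s = logsig(w * x + b)
        e = s - t                  # residual for this sample
        dw = s * (1.0 - s) * x     # d(residual)/dw
        db = s * (1.0 - s)         # d(residual)/db
        a11 += dw * dw
        a12 += dw * db
        a22 += db * db
        g1 += dw * e
        g2 += db * e
    a11 += mu                      # damping added to the diagonal
    a22 += mu
    det = a11 * a22 - a12 * a12
    d1 = (-g1 * a22 + g2 * a12) / det
    d2 = (-g2 * a11 + g1 * a12) / det
    return (w + d1, b + d2)

def train(xs, ts, params=(0.0, 0.0), mu=0.01, goal=1e-10, max_epochs=200):
    """Accept a step only if it lowers the error; stop at the error goal."""
    err = sse(params, xs, ts)
    for _ in range(max_epochs):
        if err < goal:
            break
        trial = lm_step(params, xs, ts, mu)
        trial_err = sse(trial, xs, ts)
        if trial_err < err:
            params, err, mu = trial, trial_err, mu / 10.0
        else:
            mu *= 10.0             # reject the step and damp more strongly
    return params, err

# Synthetic, noise-free targets generated with w = 2, b = -1 (hypothetical).
xs = [0.5 * i for i in range(-6, 7)]
ts = [logsig(2.0 * x - 1.0) for x in xs]
initial_err = sse((0.0, 0.0), xs, ts)
fitted, final_err = train(xs, ts)
```

With a small `mu` the update approaches a Gauss-Newton step, and with a large `mu` it approaches a short gradient-descent step, which is why the method is usually described as interpolating between the two.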
The main scope of this study is to implement the ANN backpropagation methodology in the prediction of liquefaction susceptibility, based on the actual SPT field data from the Chi-Chi, Taiwan earthquake, by developing two models (MODEL I and MODEL II). This study uses the database collected by Hwang and Yang (2001). Out of the total 288 datasets, 164 are for sites that liquefied and 124 are for sites that did not liquefy after the earthquake. The liquefaction susceptibility of a soil mass during an earthquake depends on both seismic and soil parameters. So, in MODEL I, the input parameters are the corrected SPT value [(N1)60] and the cyclic stress ratio (CSR). To use these data for classification, a value of −1 is assigned to the liquefied sites and a value of 1 is assigned to the non-liquefied sites, which makes this a two-class classification problem. So, the output of the model will be either 1 or −1. The data are normalized against their maximum values (Sincero, 2003). In carrying out the formulation, the data have been divided into two subsets, a training dataset and a testing dataset (Shahin et al., 2000). CSR has been used as an input parameter in MODEL I. CSR has been calculated from the following formula (Seed and Idriss, 1971):

$$\mathrm{CSR} = 0.65 \left(\frac{a_{\max}}{g}\right) \left(\frac{\sigma_v}{\sigma'_v}\right) \frac{r_d}{\mathrm{MSF}} \qquad (1)$$

where $\sigma_v$ is the total overburden stress, $\sigma'_v$ is the effective overburden stress, $r_d$ is the stress reduction factor and MSF is the magnitude scaling factor. Because of the difficulty and cost of obtaining high-quality undisturbed samples, it is very difficult to get reliable values of $\sigma_v$ and $\sigma'_v$. So, it is a very difficult task to determine the CSR value accurately.
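As a worked illustration of the CSR input, the sketch below assumes the usual simplified-procedure form, CSR = 0.65 (amax/g)(σv/σ'v) rd / MSF, built from the symbols named above. The function name and the input values are hypothetical and are not taken from the Chi-Chi database.

```python
def cyclic_stress_ratio(a_max_g, sigma_v, sigma_v_eff, r_d, msf=1.0):
    """CSR = 0.65 * (a_max/g) * (sigma_v / sigma'_v) * r_d / MSF.

    a_max_g      peak ground acceleration as a fraction of g
    sigma_v      total overburden stress (any consistent unit)
    sigma_v_eff  effective overburden stress (same unit as sigma_v)
    r_d          stress reduction factor
    msf          magnitude scaling factor (1.0 assumed as a default here)
    """
    if sigma_v_eff <= 0 or msf <= 0:
        raise ValueError("effective stress and MSF must be positive")
    return 0.65 * a_max_g * (sigma_v / sigma_v_eff) * r_d / msf

# Hypothetical layer: a_max/g = 0.3, sigma_v = 100 kPa, sigma'_v = 60 kPa.
csr = cyclic_stress_ratio(0.3, 100.0, 60.0, r_d=0.95, msf=1.0)
```

Because the stress ratio σv/σ'v enters directly, an error in either stress propagates proportionally into CSR, which is the practical difficulty noted in the text.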
The purpose of developing MODEL II is to predict liquefaction based on (N1)60 and amax/g, so the input variables of MODEL II are (N1)60 and amax/g. MODEL II uses the same training dataset, testing dataset and normalization technique as MODEL I. MODEL II has also been verified against an additional 85 case histories available globally (Goh, 1994), which were not part of the training or testing datasets used to develop the model. Both programmes (MODEL I and MODEL II) are constructed using the neural network toolbox in MATLAB (Demuth and Beale, 1999).
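The data preparation shared by both models (normalization against the maximum value and the −1/+1 class labels) can be sketched as follows. The three records are hypothetical placeholders, not entries from the database of Hwang and Yang (2001), and the helper names are assumptions.

```python
def normalize_by_max(rows):
    """Divide every column by its maximum value (assumes positive data)."""
    maxima = [max(column) for column in zip(*rows)]
    scaled = [[value / m for value, m in zip(row, maxima)] for row in rows]
    return scaled, maxima

def encode_label(liquefied):
    """-1 for a liquefied site, +1 for a non-liquefied site."""
    return -1 if liquefied else 1

# Hypothetical MODEL II records: ((N1)60, a_max/g, liquefied?)
records = [(5.0, 0.40, True), (12.0, 0.20, False), (25.0, 0.30, False)]
inputs, maxima = normalize_by_max([[n, a] for n, a, _ in records])
labels = [encode_label(liq) for _, _, liq in records]
```

The maxima should be stored with the model, since new cases (for example the global case histories) need to be scaled by the same values before prediction.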

SVM model
SVM originated from the statistical learning theory pioneered by Boser et al. (1992). Since SVM is a relatively new technique, a brief explanation of how it works is given below. More details can be found in many publications (Boser et al., 1992; Cortes and Vapnik, 1995; Gualtieri et al., 1999; Vapnik, 1998). Consider a training dataset consisting of k training samples represented by $(x_1, y_1), \ldots, (x_k, y_k)$, where $x_i \in R^N$ is an N-dimensional data vector and each sample belongs to one of two classes labelled $y_i \in \{+1, -1\}$. In this study, x = [CSR, (N1)60] for MODEL I and x = [amax/g, (N1)60] for MODEL II. To use the SPT data for classification, a value of −1 is assigned to the liquefied sites and a value of 1 is assigned to the non-liquefied sites, which makes this a two-class classification problem. The equation of a hyperplane that does the separation is

$$f(x) = w \cdot x + b = 0, \quad w \in R^N, \; b \in R \qquad (2)$$

where x is an input vector, w is an adjustable weight vector, b is a bias, $R^N$ is the N-dimensional real vector space and R is the one-dimensional real vector space. For the linearly separable case, a separating hyperplane can be defined for the two classes as

$$w \cdot x_i + b \ge 1 \;\; \text{for } y_i = +1, \qquad w \cdot x_i + b \le -1 \;\; \text{for } y_i = -1 \qquad (3)$$

Sometimes, because of noise or a mixture of classes in the training data, variables $\xi_i > 0$, called slack variables, are used to account for the effects of misclassification. So Eq. (3) can be written in the following way:

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i \qquad (4)$$

The optimal hyperplane is located where the margin between the two classes is maximized and the error is minimized. The support vectors of the two classes lie on two hyperplanes, which are parallel to the optimal hyperplane and are defined by $w \cdot x_i + b = \pm 1$. The margin between these planes is $2/\lVert w \rVert$. Maximization of this margin can be achieved by solving the following constrained optimization problem:

$$\text{Minimize: } \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{k} \xi_i \qquad \text{subject to: } y_i (w \cdot x_i + b) \ge 1 - \xi_i \qquad (5)$$

The constant $0 < C < \infty$ (called the capacity factor) is a parameter that defines the trade-off between the number of misclassifications in the training data and the maximization of the margin. The optimization problem (5) is solved using Lagrange multipliers (Vapnik, 1998). According to the Karush-Kuhn-Tucker (KKT) optimality condition (Fletcher, 1987), some of the multipliers will be zero. The training points with nonzero multipliers are called support vectors (see Fig. 1). In conceptual terms, the support vectors are those data points that lie closest to the optimal hyperplane and are, therefore, the most difficult to classify. The values of w and b are calculated from

$$w = \sum_{i=1}^{l} y_i \alpha_i x_i, \qquad b = -\frac{1}{2}\, w \cdot \left[ x_{+1} + x_{-1} \right]$$

where the sum runs over the l support vectors, and $x_{+1}$ and $x_{-1}$ are support vectors of class labels +1 (no liquefaction) and −1 (liquefaction), respectively. The classifier can then be constructed as

$$f(x) = \mathrm{sign}(w \cdot x + b)$$

where sign is the signum function. It gives +1 (no liquefaction) if its argument is greater than or equal to zero and −1 (liquefaction) if it is less than zero.
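The linear case can be checked by hand with two support vectors. The sketch below takes w as the multiplier-weighted sum of the support vectors and b as −½ w·(x₊₁ + x₋₁), as stated above; the two points and their multipliers are hypothetical values chosen so that the support vectors land exactly on the planes w·x + b = ±1.

```python
import math

def weight_vector(alphas, labels, vectors):
    """w = sum_i alpha_i * y_i * x_i over the support vectors."""
    dim = len(vectors[0])
    return [sum(a * y * x[j] for a, y, x in zip(alphas, labels, vectors))
            for j in range(dim)]

def bias(w, x_pos, x_neg):
    """b = -0.5 * w . (x_pos + x_neg), one support vector from each class."""
    return -0.5 * sum(wj * (p + n) for wj, p, n in zip(w, x_pos, x_neg))

def decision_value(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def classify(w, b, x):
    """+1 (no liquefaction) if w.x + b >= 0, else -1 (liquefaction)."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical support vectors and multipliers (note sum(alpha_i * y_i) = 0).
x_pos, x_neg = (2.0, 2.0), (0.0, 0.0)
alphas, labels = [0.25, 0.25], [1, -1]

w = weight_vector(alphas, labels, [x_pos, x_neg])   # (0.5, 0.5)
b = bias(w, x_pos, x_neg)                           # -1.0
margin = 2.0 / math.sqrt(sum(wj * wj for wj in w))  # distance between planes
```

Here the margin equals the distance between the two points, which is the expected result when a single pair of support vectors defines the hyperplane.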
Where a linear separating hyperplane is inappropriate, the SVM maps the input data into a high-dimensional feature space through some nonlinear mapping (Boser et al., 1992). This method easily converts a linear classification learning algorithm into a nonlinear one, by mapping the original observations into a higher-dimensional nonlinear space so that linear classification in the new space is equivalent to nonlinear classification in the original space. A kernel function has been introduced instead of the feature space mapping (Φ(x)) to reduce computational demand (Cortes and Vapnik, 1995; Cristianini and Shawe-Taylor, 2000). To obtain Eq. (6), the same procedure has been applied as in the linear case.
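A kernel classifier of this kind can be sketched as below. The decision function sign(Σ αᵢ yᵢ K(xᵢ, x) + b) is the standard kernel form and is assumed here rather than taken from the paper, as is the Gaussian parametrization exp(−‖x − z‖²/(2σ²)). The support vectors, multipliers and bias are hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian kernel, assumed form exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """sign(sum_i alpha_i y_i K(x_i, x) + b): -1 liquefaction, +1 none."""
    score = sum(a * y * rbf_kernel(sv, x, sigma)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1

# Hypothetical support vectors in normalized (CSR, (N1)60) space.
support_vectors = [(0.8, 0.2), (0.2, 0.8)]  # high CSR/low N, low CSR/high N
labels = [-1, 1]
alphas = [1.0, 1.0]

near_liquefied = svm_decision((0.7, 0.25), support_vectors, labels, alphas,
                              b=0.0, sigma=0.4)
near_stable = svm_decision((0.1, 0.9), support_vectors, labels, alphas,
                           b=0.0, sigma=0.4)
```

Only kernel evaluations against the support vectors are needed at prediction time, so the feature-space mapping Φ(x) never has to be computed explicitly.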
The radial basis function has been used as the kernel function in this study. For the SVM versions of MODEL I and MODEL II, the same training dataset, testing dataset, input variables and normalization technique have been used as in the ANN model. The application of SVM in this study requires the proper selection of the C value. The identification of the optimal value of C is largely a trial and error process. However, there are guidelines that can be used for selecting C. A large C assigns higher penalties to errors, so that the classifier is trained to minimize training error at the cost of lower generalization, while a small C assigns lower penalties to errors; this allows the margin to be maximized while tolerating some errors and thus gives a higher generalization ability. If C becomes infinitely large, the SVM does not allow any error and the result is a complex model, whereas when C goes to zero, the result tolerates a large number of errors and the model is less complex.
Nat. Hazards Earth Syst. Sci., 11, 1-9, 2011, www.nat-hazards-earth-syst-sci.net/11/1/2011/
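The trial-and-error search over C can be sketched with a very small solver. The paper does not state which solver was used; the code below substitutes a bias-free dual coordinate ascent, chosen only because it fits in a few lines, and runs it on eight hypothetical, well-separated points in normalized (CSR, (N1)60) space. The kernel form and all names are assumptions.

```python
import math

def rbf(x, z, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def train_svm(X, y, C, sigma, sweeps=500):
    """Maximize sum(alpha) - 0.5 * alpha^T Q alpha subject to 0 <= alpha <= C.

    Bias-free simplification: each alpha_i is updated in turn by a clipped
    Newton step on its own coordinate (not the solver used in the paper).
    """
    n = len(X)
    K = [[rbf(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            step = (1.0 - y[i] * f_i) / K[i][i]
            alpha[i] = min(C, max(0.0, alpha[i] + step))
    return alpha

def predict(x, X, y, alpha, sigma):
    score = sum(a * yi * rbf(xi, x, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1

# Hypothetical data: four liquefied (-1) and four non-liquefied (+1) cases.
X = [(0.8, 0.2), (0.7, 0.3), (0.9, 0.25), (0.75, 0.15),
     (0.2, 0.8), (0.3, 0.7), (0.25, 0.9), (0.15, 0.75)]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

results = {}
for C in (0.1, 1.0, 10.0, 100.0):
    alpha = train_svm(X, y, C, sigma=0.4)
    n_sv = sum(1 for a in alpha if a > 1e-6)   # points with nonzero multiplier
    correct = sum(1 for xi, yi in zip(X, y)
                  if predict(xi, X, y, alpha, 0.4) == yi)
    results[C] = (n_sv, 100.0 * correct / len(X))
```

In this toy setting a very small C pins every multiplier at its upper bound, so every training point counts as a support vector. On real data the same loop would be run against a held-out testing set rather than the training points.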

Results and discussion
For predicting liquefaction susceptibility, the two input variables (CSR and (N1)60) are used in the ANN model for MODEL I. Hence, the input layer has two neurons. The only output is 1 or −1 and, therefore, the output layer has only one neuron. The optimum backpropagation network obtained in the present study is a three-layer feed forward network. Figure 2 shows the final architecture of the ANN model with one hidden layer. In this study, the transfer function used in the hidden layer is logsig. The expression for logsig is given below:

$$\mathrm{logsig}(n) = \frac{1}{1 + e^{-n}}$$

The tansig transfer function has been used in the output layer. The expression for tansig is given below:

$$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1$$

The number of neurons in the hidden layer is determined by training several networks with different numbers of hidden neurons and comparing the predicted results with the desired output. Using too few hidden neurons could result in huge errors [...]. The performance for the training data is 94.55%. According to the results of network training, the network has successfully captured the relationship between the input parameters and the output. In order to evaluate the capabilities of the ANN model, the model is validated with new data that are not part of the training dataset. In this case, the performance of the ANN model is 88.37%. Figures 4 and 5 show the plots between CSR and (N1)60 for the training and testing datasets, respectively. These figures provide a design assessment chart that can be used to estimate the liquefaction resistance of soils.
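A forward pass through such a network (two inputs, a logsig hidden layer and one tansig output neuron) can be sketched as follows. The transfer functions follow their standard MATLAB definitions; the weights, the three hidden neurons and the rule of thresholding the continuous output at zero are illustrative assumptions, not values or rules taken from the trained models.

```python
import math

def logsig(n):
    """Log-sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    """Hyperbolic tangent sigmoid: output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def forward(x, W1, b1, W2, b2):
    """Input layer -> logsig hidden layer -> single tansig output neuron."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return tansig(sum(w * h for w, h in zip(W2, hidden)) + b2)

def to_class(output):
    """Assumed decision rule: threshold the continuous output at zero."""
    return 1 if output >= 0 else -1

# Hypothetical weights for a 2-3-1 network (inputs: normalized CSR, (N1)60).
W1 = [[4.0, -4.0], [-3.0, 3.0], [1.0, 1.0]]
b1 = [0.0, 0.0, -1.0]
W2 = [-2.0, 2.0, 0.5]
b2 = 0.0

out_a = forward([0.8, 0.2], W1, b1, W2, b2)  # high CSR, low (N1)60
out_b = forward([0.2, 0.8], W1, b1, W2, b2)  # low CSR, high (N1)60
```

Since tansig is bounded by −1 and 1, its range matches the two class labels, which makes it a natural choice for the output neuron of a ±1 classifier.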
In MODEL II, the input variables are amax/g and (N1)60, so the input layer has two neurons. The output of the model is 1 or −1; hence, the output layer has only one neuron. MODEL II uses a three-layer feed forward network with 5 neurons in the hidden layer, as shown in Fig. 6. The variation of MSE with epochs is shown in Fig. 3. For MODEL II, converged results were achieved at 295 epochs (see Fig. 3). The performance for the training and testing datasets is 94.05% and 87.20%, respectively. So, there is a marginal reduction in performance compared with MODEL I. Figures 7 and 8 show the plots between PGA and (N1)60 for the training and testing datasets, respectively. The user can use these figures for separating liquefiable and non-liquefiable soil. This study indicates that the two input parameters [PGA and (N1)60] are sufficient to determine the liquefaction susceptibility of the soil. There is no need to calculate the value of CSR. For global data, the performance of the ANN model is 70.58%. Figure 9 depicts the plot between PGA and (N1)60 for global data using the ANN model. The separation between liquefiable and non-liquefiable soil is much the same in all three figures (Figs. 7, 8 and 9).

Since there is no rule for selecting the C value of SVM, it is necessary to investigate the impact of C on testing performance (%) as well as on the number of support vectors for each kernel. The training and testing performance (%) of SVM has been determined using the following expression:

$$\text{Performance (\%)} = \frac{\text{number of cases predicted correctly}}{\text{total number of cases}} \times 100$$

Figure 10 depicts the effect of C on testing performance (%) and on the number of support vectors for MODEL I using the radial basis function. Figure 10 demonstrates that the testing performance (%) attains its maximum value at C = 10. Generally, it can be seen from Fig. 10 that the number of support vectors decreases when C < 120 and tends to flatten for C ≥ 120. For the best model, a high testing performance (%) as well as fewer support vectors is desirable. The design values of C and the width of the radial basis function (σ) are 30 and 0.4, respectively. The number of support vectors is 37. Training and testing performance of MODEL I are 96.04% and [...]

This study shows that SVM is a powerful computational tool for analysing the complex relationship between soil and seismic parameters in liquefaction analysis. As further field case records become available, the performance of the SVM can be improved. The developed SVM is simpler to apply than the method of Seed et al. (1971). Only minimal processing of the data is required, essentially to obtain values of (N1)60 for a given amax/g.
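The performance figures quoted in this section are read here as the percentage of cases classified correctly, an interpretation that is consistent with the values reported for the 85 global cases. A minimal sketch, with hypothetical predictions and observations:

```python
def performance_percent(predicted, observed):
    """Percentage of cases whose predicted class matches the observed class."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)

# Hypothetical labels: -1 = liquefaction, +1 = no liquefaction.
observed = [-1, -1, -1, 1, 1, 1, -1, 1]
predicted = [-1, -1, 1, 1, 1, 1, -1, -1]
score = performance_percent(predicted, observed)  # 6 of 8 cases match
```

The same function applies to the training, testing and global datasets, so the ANN and SVM models can be compared on an equal footing.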

Conclusions
ANN and SVM models have been developed for predicting the liquefaction susceptibility of soil based on SPT data. For the ANN model, the procedures for data division, data normalization, network architecture selection, choice of transfer function and the number of epochs are outlined.
For SVM, the effect of C on testing performance (%) and on the number of support vectors has been investigated. MODEL II showed clearly that only two parameters [(N1)60 and amax/g] are sufficient inputs for predicting the liquefaction susceptibility of a site with depth.
The performance of the developed models on the global dataset is encouraging. The user can use the developed models (SVM and ANN) as accurate and quick tools for determining the liquefaction susceptibility of soil without any manual work such as using tables or charts. Comparison between the SVM and ANN models indicates that SVM is a better model than ANN for predicting the liquefaction susceptibility of soil based on SPT data.
Edited by: M. E. Contadakis
Reviewed by: two anonymous referees

Fig. 10. Variation of testing performance (%) and number of support vectors with C values for MODEL I using radial basis function kernel.

Fig. 11. Plot between CSR and (N1)60 for MODEL I using radial basis function for training dataset.

Fig. 12. Plot between CSR and (N1)60 for MODEL I using radial basis function for testing dataset.

Fig. 13. Variation of testing performance (%) and number of support vectors with C values for MODEL II using radial basis function kernel.

Fig. 14. Plot between PGA and (N1)60 for MODEL II using radial basis function for training dataset.

Fig. 15. Plot between PGA and (N1)60 for MODEL II using radial basis function for testing dataset.

Fig. 16. Plot between PGA and (N1)60 for global data using radial basis function.
[...] (Goh, 1994) uncertainty arises from r_d, unit weight, water table depth, etc. Figures 14 and 15 show the plots between PGA and (N1)60 for the training and testing datasets, respectively. The developed MODEL II has been verified against 85 different case histories available globally (Goh, 1994), which were not used for either training or testing. In this case, the model performance is 74.12%. So, SVM can be used as a practical tool for the prediction of liquefaction susceptibility of soil based on PGA and (N1)60. Figure 16 depicts the plot between PGA and (N1)60 for global data using the SVM model. A comparative study between the developed ANN and SVM models is shown in Table 1. For the training dataset, the performance of the ANN and SVM models is comparable, but for the testing dataset and the global data, the SVM model outperforms the ANN model. The use of the structural risk minimization principle in defining the cost function gives the SVM more generalization capacity than the ANN, which uses the empirical risk minimization principle. The ANN model uses all training data for the final prediction, whereas the SVM model employs only the support vectors. Therefore, the developed SVM produces a sparse solution. Sparseness means that a significant number of the weights are zero (or effectively zero), which has the consequence of producing compact, computationally efficient models, which in addition are simple and, therefore, produce smooth functions.

Table 1. Comparison between ANN and SVM model.