Natural Hazards and Earth System Sciences
A genetic-algorithm approach for assessing the liquefaction potential of sandy soils

Abstract. Determining liquefaction potential requires a large number of parameters to be taken into account, which gives the liquefaction phenomenon a complex nonlinear structure. Conventional methods rely on simple statistical and empirical relations or charts and cannot characterise these complexities. Genetic algorithms are suited to solving problems of this type. A genetic algorithm-based model has been developed to determine the liquefaction potential from Cone Penetration Test (CPT) datasets derived from case studies of sandy soils. Software has been developed that uses genetic algorithms for parameter selection and for the assessment of liquefaction potential. Several estimation functions for a Liquefaction Index were then generated from the dataset and evaluated on the training and test data. The suggested formulation estimates the occurrence of liquefaction with significant accuracy. In addition, the parametric study of the Liquefaction Index curves shows good agreement with the physical behaviour. Only 7.8% of the cases were mis-estimated by the proposed method, which is quite low compared with another commonly used method.

Introduction
Soil liquefaction is a type of ground failure related to earthquakes. It takes place when the effective stress within the soil reaches zero as a result of an increase in pore water pressure during earthquake vibration (Youd, 1992). Soil liquefaction can cause major damage to buildings, roads, bridges, dams and lifeline systems, as it did in the earthquakes in Niigata (Japan, M_s = 7.5), Anchorage (Alaska, M_w = 9.2) (Seed and Idriss, 1971) and many other places.
In the last few decades, a large number of studies have investigated liquefaction phenomena (Yalcin et al., 2008; Cetin et al., 2004; Ulusay et al., 2000). NCEER (1996) and NCEER/NSF (National Center for Earthquake Engineering Research/National Science Foundation, 1998) worked towards a consensus on liquefaction assessment methods and parameters, and offered some modifications to existing methods (Youd et al., 2001). The most popular approaches use the standard penetration test (SPT) and the cone penetration test (CPT) to determine a factor of safety (Seed and Idriss, 1971; Tokimatsu and Yoshimi, 1983; Seed and DeAlba, 1986; Robertson and Wride, 1997, 1998; Youd and Idriss, 1997; Youd et al., 2001). Iwasaki et al. (1978, 1982) suggested a liquefaction potential index (LPI), which describes a range rather than a number, and it was modified by Sonmez (2003) and Sonmez and Gokceoglu (2005). The "Chinese criteria" are another method of expressing the liquefaction hazard to a determined extent (Seed et al., 1984, 1985; Finn et al., 1994; Andrews and Martin, 2000).
In situ test data are very commonly used to assess the liquefaction hazard in geotechnical engineering. The first method to use such data was proposed by Seed and Idriss (1971). It is based on the SPT and was modified by Seed et al. (1985) and Youd et al. (2001). The CPT has been employed for about three decades (Robertson and Campanella, 1985; Seed and DeAlba, 1986; Mitchell and Tseng, 1990; Stark and Olson, 1995; Olsen, 1997; Robertson and Wride, 1998). The pros and cons of the SPT and CPT are discussed throughout the literature (Lunne et al., 1997; Youd et al., 2001; Yuan, 2003). Nevertheless, these methods are widely used in practice and are easy to apply in many cases, especially for sandy soils.
Robertson and Wride (1998) developed an interaction diagram for liquefaction assessment based on the cyclic resistance ratio (CRR) and the corrected CPT tip resistance, q_c1N. It is suggested for earthquakes with M_w of 7.5 and for sands with a fines content (FC) of at most 5% and a median grain size, D_50, of 0.25-2.0 mm. To extend the method to soils with FC > 5%, Robertson and Wride (1998) also include a correction of q_c1N for higher fines contents.
Although existing methods utilize a limited number of parameters, liquefaction phenomena inherently involve many seismic and soil parameters. New modelling methods that do not rely on simple statistical and empirical relations or charts may improve the assessment of liquefaction phenomena. Genetic algorithms (GAs) are among the most suitable tools for capturing the complicated relations among the parameters. In this study, a new method is proposed for the liquefaction assessment of sandy soils. GAs were utilized to evolve the final formulation. The proposed method was validated by a parametric study and by comparison with the widely used method of Robertson and Wride (1998).

Genetic algorithms
GAs are stochastic optimization methods inspired by the theory of evolution. In the solution process, they simulate natural selection mechanisms, and they are used effectively in many engineering applications. Although they came into extensive use after Goldberg's famous book (1989), GAs were first introduced by Holland (1975). The procedures of GAs simulate the processes of reproduction, crossover and mutation to maintain improved solutions and to generate progressively better offspring, so that the solutions approach the optimum of the objective function (Tung et al., 2003). GAs have been shown to have advantages over classical optimization methods in complex engineering problems. Natural hazards and their estimation involve complex natural behaviour affected by several parameters. GAs have therefore been used effectively in previous studies for the evaluation of natural hazards (Iovine et al., 2005; D'Ambrosio et al., 2006) and in geotechnics (Simpson and Priest, 1993).
GAs start with a random initial set of solutions, which is called the population. Individuals in the population are called chromosomes, and each is a candidate solution of the problem. Chromosomes are usually sets of binary strings. By evolving the chromosomes through an iteration step, a new set of chromosomes, called a generation, is formed. Each generation is a combination of old and new chromosomes. This evolution process is carried out by three operations: crossover, mutation and selection.
Crossover is the operation of generating offspring chromosomes by combining, usually, two parent chromosomes. An offspring has features of both parents. First, two individuals are selected for crossover and a random cut-off point is chosen. Then each chromosome is cut at that point and the right-hand parts of the strings are swapped. This simplest crossover method is illustrated in Fig. 1.
The number of crossovers in each generation is determined by the crossover probability, which is defined before running the GA. Crossover probabilities of up to 80% give satisfactory outcomes in many applications (Coley, 1999).
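The single-point crossover described above can be sketched as follows. This is a minimal illustration rather than the GALIQ implementation: the function names, the representation of chromosomes as Python strings of "0" and "1", and the random pairing of parents are assumptions made for the example.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=random):
    """Cut both bit strings at one random point and swap the right-hand parts."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # The cut lies strictly inside the string, so each child mixes both parents.
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def crossover_population(population, crossover_prob=0.8, rng=random):
    """Pair chromosomes at random and cross each pair with probability crossover_prob.

    Random pairing is an assumption of this sketch; the paper does not say
    how parents are paired.
    """
    shuffled = population[:]
    rng.shuffle(shuffled)
    offspring = []
    for a, b in zip(shuffled[0::2], shuffled[1::2]):
        if rng.random() < crossover_prob:
            offspring.extend(single_point_crossover(a, b, rng))
    return offspring
```

With parents "00000000" and "11111111", the first child is always a run of zeros followed by a run of ones, which makes the position of the cut easy to see.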
Mutation is the operation of changing a randomly selected bit among all chromosomes from 1 to 0 or from 0 to 1. It is an essential operator of GAs because it prevents the premature loss of genetic information from the population, which is highly probable in small populations. In contrast to crossover, small mutation probabilities of about 1-2% are preferred to keep the population stable (Gen and Cheng, 1997).
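A bit-flip mutation of this kind might look like the sketch below. It assumes that the mutation probability is applied independently to every bit, which is one common way of implementing the operator; the paper does not state how GALIQ picks the bits, and the function name is hypothetical.

```python
import random


def mutate(chromosome, mutation_prob=0.01, rng=random):
    """Flip each bit of a '0'/'1' string independently with probability mutation_prob."""
    flipped = []
    for bit in chromosome:
        if rng.random() < mutation_prob:
            # Flip 0 -> 1 or 1 -> 0.
            flipped.append("1" if bit == "0" else "0")
        else:
            flipped.append(bit)
    return "".join(flipped)
```

With a probability of 1-2%, most chromosomes pass through unchanged, which is why mutation mainly acts as a safeguard against losing bit values that crossover alone cannot recreate.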
Nat. Hazards Earth Syst. Sci., 10, 685-698, 2010, www.nat-hazards-earth-syst-sci.net/10/685/2010/
Using the selection operator, the population, which has been expanded by mutations and crossovers, is reduced to its original size. Selection is based on the fitness values of the individuals. The fittest individuals have a greater chance of being selected for the next generation than weaker individuals. Elite individuals are those with the highest fitness. As a result of these procedures, new generations are expected to have greater fitness values than older generations. However, the best solution in a generation may not survive to the next one. An elitism strategy, which carries the elite individuals over unchanged, may therefore yield faster solutions. A small number of elites is usually preferred to prevent premature solutions (Gen and Cheng, 1997; Coley, 1999). The fitness value of a chromosome is calculated by a user-defined fitness function, which is a mathematical definition of the optimization problem. The fittest individual represents the optimum solution of the problem concerned.
Liquefaction assessment by GA approach

GA code
A software program named GALIQ (Genetic Algorithm LIQuefaction) has been developed in the Microsoft Visual C# .NET environment. A flow diagram of the code is illustrated in Fig. 2. The code starts with a randomly generated first population. The population is then subjected to crossovers and mutations, and the new population is selected as usual. End conditions are defined to stop the code: it either runs for a maximum of 3000 generations or stops after 500 generations without any improvement in the solution. The code tries to minimize errors to obtain a better estimation of the target parameters. In typical GA applications, the GA is programmed to optimize the coefficients of simple linear or quadratic estimation functions. GALIQ, however, has no predefined function whose coefficients are to be optimized. Instead, the terms and sub-functions are themselves parameters to be optimized by the GA code. After successive generations, the software determines which parameters are to be used in the formulation. GALIQ generates many LI estimation functions based upon Eq. (1), in which X_i are the function coefficients and exponents to be optimized by GALIQ, f_i are predefined GA functions, and t_i stands for the variable soil/earthquake parameters to be determined by GALIQ. Probable values of the X_i, f_i and t_i variables are shown in Fig. 3. The objective functions (F1, F2) shown in Eqs. (2) and (3) were used to generate Liquefaction Index (LI) estimation functions. The desired estimation values in the database were 1 (liquefaction) and 0 (no liquefaction). With F1, the estimations of the LI functions were targeted to get as close to 1 or 0 as possible. To accomplish this, the root mean square error (RMSE) was computed for each individual as the objective function.
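The two end conditions mentioned above (a cap of 3000 generations and a stop after 500 generations without improvement) can be sketched as a small driver loop. The `step` callback, which is assumed to advance the GA by one generation and return the best error found in it, is a hypothetical placeholder and not part of GALIQ.

```python
def run_until_converged(step, max_generations=3000, stall_limit=500):
    """Run step() once per generation until one of the two end conditions is met.

    step is a hypothetical callback that evolves the population by one
    generation and returns the lowest error in that generation.
    Returns the best error seen and the number of generations executed.
    """
    best_error = float("inf")
    stalled = 0
    generations = 0
    for generations in range(1, max_generations + 1):
        error = step()
        if error < best_error:
            # Improvement: remember it and reset the stall counter.
            best_error = error
            stalled = 0
        else:
            stalled += 1
            if stalled >= stall_limit:
                break
    return best_error, generations
```

Counting only strict decreases in the error as improvements is a choice made for this sketch; a tolerance could be added if very small changes should not reset the counter.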
With the second objective function (F2), the estimation does not necessarily have to equal 1 or 0. Liquefaction is expected to happen if LI is higher than 0.5. In this fitness function, only mis-estimated values are used to calculate the RMSE. In other words, correct estimations are not included in the RMSE even if they differ from 1 or 0. By focusing on incorrect estimates, this modified fitness function forces the LI function more effectively towards correct values.
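The difference between the two objective functions can be illustrated with a short sketch. The function names are hypothetical, and since Eqs. (2) and (3) are not reproduced here, the normalisation of F2 by the total number of cases is an assumption; only the 0.5 threshold and the restriction to mis-estimated cases come from the text.

```python
import math


def f1_rmse(li_values, labels):
    """F1-style objective: RMSE between LI estimates and the 0/1 targets, over all cases."""
    n = len(labels)
    return math.sqrt(sum((li - y) ** 2 for li, y in zip(li_values, labels)) / n)


def is_misestimated(li, label, threshold=0.5):
    """A case is mis-estimated when the thresholded LI disagrees with the observed label."""
    predicted = 1 if li > threshold else 0
    return predicted != label


def f2_modified_rmse(li_values, labels, threshold=0.5):
    """F2-style objective: only mis-estimated cases contribute to the squared error.

    Dividing by the total number of cases is an assumption of this sketch.
    """
    n = len(labels)
    squared = sum(
        (li - y) ** 2
        for li, y in zip(li_values, labels)
        if is_misestimated(li, y, threshold)
    )
    return math.sqrt(squared / n)
```

For LI estimates [0.9, 0.6, 0.4, 0.7] against labels [1, 1, 0, 0], only the last case is on the wrong side of 0.5, so it alone contributes to the F2-style error, whereas all four cases contribute to the F1-style error.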
The GA models developed with the F1 and F2 objective functions are given in Tables 1 and 2. The maximum generation number is 3000 and the elite ratio between successive generations is 1% in all solutions. That is, the 1% of individuals with the highest fitness values are transferred directly to the next generation without any selection process. The roulette wheel selection method is adopted because it favours the selection of individuals with high fitness values (Gen and Cheng, 1997). The selection is based on spinning a roulette wheel, on which each individual occupies a slice, and taking the individual on whose slice the wheel stops at random. Sixteen solutions were obtained for each fitness function by varying the population size, mutation ratio and crossover ratio. Table 2 summarizes the variations in these parameters.
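Elitism combined with roulette wheel selection can be sketched as below. This is an illustration under stated assumptions, not the GALIQ code: the function name is hypothetical, fitness values are assumed to be positive with larger meaning better (an error such as RMSE would first have to be converted, for example with 1 / (1 + RMSE)), slices are assumed proportional to fitness, and at least one elite is always kept.

```python
import random


def select_next_generation(population, fitness, size, elite_ratio=0.01, rng=random):
    """Reduce an expanded population to `size` individuals.

    The top elite_ratio share is copied unchanged; the remaining places are
    filled by roulette wheel draws (with replacement) in which the chance of
    being picked is proportional to fitness.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    # Keeping at least one elite is a choice made for this sketch.
    n_elite = max(1, int(round(elite_ratio * size)))
    survivors = ranked[:n_elite]
    weights = [fitness(individual) for individual in population]
    survivors += rng.choices(population, weights=weights, k=size - n_elite)
    return survivors
```

Because the draws are made with replacement, a very fit individual can appear several times in the next generation, which is the increased selection pressure that the roulette wheel is chosen for.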

Liquefaction data
A database has been constructed from the CPT and laboratory data of 242 case studies. The data consist of in situ case studies from different regions of the world collected by several researchers (Youd and Bennet, 1983; Arulanandan et al., 1986; Shibata and Teparaksa, 1988; Bennet, 1989, 1990; Tuttle et al., 1990; Kayen et al., 1992; Charlie et al., 1994; Mitchell et al., 1994; Suzuki et al., 1995; Stark and Olson, 1995; Boulanger et al., 1997; Toprak et al., 1999; Olson, 2001). The database includes an equal number of randomly selected liquefied and non-liquefied cases. Of the overall dataset, 200 cases were used for training and 42 cases for testing. The separation of the dataset into training and testing sets is based on random selection. The same datasets are used throughout the study. The upper and lower limits of the parameters in the dataset are given in Table 3. The training and testing data are given in Appendix A and B, respectively.
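A random 200/42 split that stays the same across runs can be sketched as follows. The function name and the use of a fixed seed are assumptions for illustration; the paper only states that the split was random and that the same sets were reused throughout.

```python
import random


def split_cases(cases, n_train=200, seed=0):
    """Shuffle the cases with a fixed seed and split them into training and test sets."""
    rng = random.Random(seed)  # a fixed seed keeps the split identical across runs
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_cases(range(242))
print(len(train), len(test))  # 200 42
```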

GA solutions
For the two series of runs, 32 different LI functions were developed. For the best two solutions of each series, the number of mis-estimations and the best fitness function values of F1 (RMSE) and F2 (modified RMSE) are given for the training and test data in Table 4. S2 has the best average performance. S1 performed poorly in terms of both the number of mis-estimations and the RMSE, mainly because of the inefficiency of the fitness function selected for that series.
The best LI function in terms of RMSE is S2M6, the formulation of which is given in Eq. (4). It has the minimum number of mis-estimations and the best RMSE for the training and overall datasets. S2M8 showed a similar performance in terms of RMSE; however, its number of mis-estimations is slightly higher than that of S2M6. Therefore, the S2M6 function is proposed in this study. LI values greater than 0.5 calculated by this formulation indicate a high probability of liquefaction, whereas smaller values stand for non-liquefaction cases.

$$\mathrm{LI} = -5.13 \cdot \mathrm{SSSR}_{7.5}^{4.39} + 2.29 \;\ldots \qquad (4)$$

In Fig. 4, the performance of Robertson and Wride's (1998) formulation is tested with the training dataset used in this study. Although the method gives reasonable results for liquefied cases, non-liquefied cases are in general badly estimated. In total, 39% of the cases were mis-estimated by the formulation. Such errors are on the safe side; however, they may increase the cost of liquefaction mitigation works. The share of cases mis-estimated by the suggested method (7.8%) is considerably lower than that of Robertson and Wride's (1998) method, which is widely used in the literature.

Parametric study
The S2M6 equation, which has the best performance among the genetic algorithm solutions, was used for the parametric study.
To run the parametric model, reference data representing the average soil conditions of the dataset were established. The reference parameters are listed in Table 5. The earthquake magnitude is taken as M_w = 7.5 to remove the magnitude correction factor in the SSSSR value.
The parametric study examined how variations in mean grain size (D_50), groundwater level (GWT), tip resistance (q_c) and maximum ground acceleration (a_max) affect the liquefaction index (LI). Figure 5 illustrates the results of equation S2M6. The figure demonstrates that if D_50 is greater than 0.2 mm, the LI rises with increasing acceleration values. However, the LI value falls below 0.5 if D_50 is smaller than 0.15 mm (Fig. 5a). In fact, LI values for soils with D_50 smaller than 0.2 mm are uncertain, as the LI does not increase for greater a_max values.
According to the proposed formulation, increasing clay and silt content reduces the LI and the liquefaction susceptibility. The LI values increase up to a D_50 value of 0.4 mm, which indicates a higher sand content in the soil.
The formulation allows the LI to be calculated for different levels at a specific borehole location, so many LI values can be calculated for one borehole. According to the proposed formulation, the GWT depth does not play a crucial role in the liquefaction susceptibility at a specified level until it exceeds a critical value. For example, the LI values in Fig. 5b are plotted for soil at a depth of 6.54 m below ground level, while the GWT depth varies. In this case, there is no noticeable change in the LI values for GWT depths between 0 and 3.6 m. The LI value then reduces dramatically for GWT depths greater than 3.6 m. That is, the LI value for GWT = 2 m is greater than that for GWT = 4 m. The study, which encompasses several cases at different depths, shows that the GWT has no effect on the LI if the ratio of the GWT depth to the depth of the soil level for which the LI is calculated is lower than 0.56. In contrast, the LI decreases radically when the ratio is higher than 0.56. As the ratio approaches 1.0, which means that the soil level where the LI is calculated is near the GWT, the LI tends to fall below 0.5.
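As a rough consistency check using only the figures quoted above for Fig. 5b (a calculation depth of 6.54 m and a change in behaviour at a GWT depth of about 3.6 m), the ratio for this single case comes out close to the general threshold of 0.56:

```latex
\frac{\text{GWT depth}}{\text{soil depth}} = \frac{3.6\ \mathrm{m}}{6.54\ \mathrm{m}} \approx 0.55
```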
Figure 5c illustrates the relation between the LI and tip resistance. As expected, the LI decreases with increasing tip resistance.
The parametric study shows no discrepancy between the behaviour of the formulation and the known physical behaviour of liquefaction. Although some studies mention liquefaction cases in clayey or silty soils (Ishihara, 1984, 1985, 1993), the liquefaction hazard certainly reduces with increasing clay or silt content (Wang, 1979), which is also the case for a_max levels of 0.5 g according to the proposed formulation. Groundwater is also an essential input for liquefaction phenomena. The formulation does not indicate definite liquefaction above the level of the GWT. Of course, it is not possible to claim that the formulation fully characterises the actual behaviour. However, it shows no important discrepancy and can be used for liquefaction assessment.

Results and conclusions
This study suggests a new method of computing the liquefaction index (LI) by a GA approach based on CPT data. The LI, which is computed from SSSSR, SSSR_7.5, D_50, a_max, r_d, σ_vo, σ'_vo, q_c, GWT and z, gives an index value that indicates whether liquefaction potential exists. An LI value lower than 0.5 stands for no liquefaction, and a value higher than 0.5 stands for liquefaction.
The mis-estimation ratio of the model is 7.5% for the training data and 9.5% for the test data. Robertson and Wride's (1998) method was selected as a benchmark for comparison because it is widely used for liquefaction estimation. The model proposed in this study provides better estimates. The parametric study of the developed model shows agreement with the expected soil behaviour.
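The training and test ratios can be cross-checked against the overall figure of 7.8% with a few lines of arithmetic. The case counts below are inferred from the reported percentages and the 200/42 split; they are an assumption of this check and are not stated in the text.

```python
train_cases, test_cases = 200, 42

# Counts implied by the reported ratios (inferred, not quoted from the paper).
train_missed = round(0.075 * train_cases)  # 15 cases
test_missed = round(0.095 * test_cases)    # 3.99, i.e. 4 cases

overall = (train_missed + test_missed) / (train_cases + test_cases)
print(f"{100 * overall:.2f}%")  # 7.85%
```

Under these inferred counts, 19 of the 242 cases are mis-estimated, about 7.85%, which agrees with the reported overall figure of 7.8% to within rounding.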
On the other hand, it should be noted that the method may be misleading if it is used outside the limits of the dataset. Another important point is that the GA software (GALIQ) was run to fit a function that returns either 1 or 0 from the inputs. Therefore, LI values less than 0.5 stand for no liquefaction (0) and the others stand for liquefaction (1). Any LI value less than 0.5 means no liquefaction, whether it is 0.4 or 0.1, and all values greater than 0.5 have the same meaning, i.e., liquefaction hazard. LI = 0.2 therefore does not actually imply safer conditions than LI = 0.4. The index may give misleading results if used for hazard categorization (such as high, medium or low hazard), as it only categorizes soils as liquefiable or non-liquefiable.
The LI calculation involves many parameters. Some of them (for example, a_max or z) are defined by the user to calculate the LI for a specific depth and a_max level. The others represent site characteristics. However, many testing techniques are required to determine all of the parameters. For instance, q_c can be determined by the CPT, but D_50 cannot. This will certainly increase the cost of the liquefaction assessment, as several different techniques have to be applied at the site to use the method.
Although the method has some difficulties, LI is a good measure for the assessment of liquefaction potential according to results of this study.

Appendix A
Training data set.

Fig. 3. Probable values of the X_i, f_i and t_i variables.

Table 1. Series of runs for the GA models.

Table 3. Minimum, maximum and average values of the parameters used in the dataset.

Table 4. Performance of the best two solutions in each series.

Table 5. Reference soil characteristics used in the parametric study.