Risk assessment of liquefaction-induced hazards using Bayesian networks based on standard penetration test data

Liquefaction-induced hazards such as sand boils, ground cracks, settlement, and lateral spreading are responsible for considerable damage to engineering structures during major earthquakes. Presently, there is no effective empirical approach that can assess different liquefaction-induced hazards in one model. This is because of the uncertainties and complexity of the factors related to seismic liquefaction and liquefaction-induced hazards. In this study, Bayesian networks (BNs) are used to integrate multiple factors related to seismic liquefaction, sand boils, ground cracks, settlement, and lateral spreading into a model based on standard penetration test data. The constructed BN model can assess four different liquefaction-induced hazards together. In a case study, the BN method outperforms an artificial neural network and Ishihara and Yoshimine’s simplified method in terms of Accuracy, Brier score, Recall, Precision, and area under the curve of receiver operating characteristic (AUC of ROC). This demonstrates that the BN method is a good alternative tool for the risk assessment of liquefaction-induced hazards. Furthermore, the performance of the BN model in estimating liquefaction-induced hazards in Japan’s Northeast Pacific Offshore Earthquake confirms its correctness and reliability compared with the liquefaction potential index approach. The proposed BN model can also predict whether the soil becomes liquefied after an earthquake and can deduce the chain reaction process of liquefaction-induced hazards and perform backward reasoning. The assessment results from the proposed model provide informative guidelines for decision-makers to detect the damage state of a field following liquefaction.


Introduction
The prediction of liquefaction potential and the assessment of liquefaction-induced hazards are two significant and closely related problems. The former aims to determine whether the soil becomes liquefied after an earthquake, whereas the latter must not only predict whether liquefaction-induced hazards occur after soil liquefaction but also assess the severity of the different hazards induced by liquefaction. Predicting the liquefaction potential of foundation soils is only the first step in assessing liquefaction hazards. This step has been well studied in recent decades, using simplified methods (Seed and Idriss 1971, 1982; Starks and Olsen 1995; Stokoe and Nazarian 1985) based on standard penetration test (SPT), cone penetration test (CPT), and shear wave velocity measurements, as well as laboratory testing, numerical methods, and empirical liquefaction models (Goh 1994; Pal 2006; Toprak et al. 1999) based on historical data. What matters more to engineers, however, is the effect of liquefaction-induced hazards on foundations or superstructures after seismic liquefaction, and relatively few studies have focused on this issue (Juang et al. 2005).
Field evidence of liquefaction-induced hazards in historical earthquakes mainly consists of sand boils, ground cracks, the settlement and tilting of structures, and lateral spreading failures. Several methods have been proposed to quantify these hazards, including numerical simulations, laboratory tests, and field testing. Although recent advances in physical model experiments and the computational modelling of liquefaction-induced ground deformation are quite promising, some critical problems remain unresolved. For instance, no physical or numerical model fully describes the complicated mechanical behaviour of soils, and it is expensive and difficult to obtain and test high-quality undisturbed samples of loose sandy soils. Therefore, empirical liquefaction models based on historical earthquake databases are best suited to providing a simple, reliable, and direct means of assessing liquefaction-induced hazards in geotechnical earthquake engineering (Zhang et al. 2002). Among empirical methods, the liquefaction potential index (LPI) has been used to characterize liquefaction-induced hazards worldwide (Iwasaki et al. 1982). Several subsequent approaches have built on the LPI: the damage severity index (DSI) (Juang et al. 2005), which evaluates the severity of liquefaction-induced ground damage at or near foundations; the Ishihara-inspired LPIISH, extended by Maurer et al. (2015), which was found to be consistent with observed surface effects and to reduce false-positive predictions relative to the LPI; and the liquefaction severity number (LSN) of Tonkin and Taylor (2013), which was developed from liquefaction damage observations in the Canterbury Earthquake Sequence to reflect the damaging effects of shallow liquefaction on residential land and foundations. In addition, generalized analytical or empirical techniques for estimating a single type of ground failure (e.g. settlement or lateral spreading) induced by liquefaction have been proposed in recent decades (Youd and Perkins 1987; Youd et al. 2002; Goh and Zhang 2014; Ishihara and Yoshimine 1992; Zhang et al. 2002; Wu and Seed 2004; Cetin et al. 2009; Juang et al. 2013). With the rapid development of computing and mathematical techniques, many artificial intelligence methods for assessing liquefaction-induced ground deformation have also been developed from historical data (Wang and Rahman 1999; Baziar and Ghorbani 2005; Javadi et al. 2006; Garcia et al. 2008; Rezania et al. 2011).
However, these methods cannot assess sand boils or ground cracks and can estimate only a single type of hazard, e.g. lateral spreading or settlement. Because no generic model can calculate or assess sand boils, ground cracks, lateral spreading, and settlement simultaneously and then evaluate the overall severity of the hazards induced by liquefaction after an earthquake, a framework is needed for assessing all types of liquefaction-induced hazards at a given site following an earthquake. The primary objective of this paper is to use Bayesian network (BN) methods to integrate soil liquefaction, the LPI, the four types of hazards induced by liquefaction (ground cracks, sand boils, lateral spreading, and settlement), and the severity of liquefaction-induced hazards (SLH, describing the overall situation of a site) into one model based on historical SPT data. This allows the chain reaction process of hazards, from an earthquake event to seismic liquefaction to liquefaction-induced hazards, to be deduced, thus extending existing simplified methods that assess only a single liquefaction-induced hazard. The BN model is trained and tested using two different real-world datasets. The results given by the BN model for the evaluation of liquefaction-induced hazards are compared with those from an artificial neural network (ANN) model to verify the effectiveness and robustness of the proposed approach.
The remainder of this paper is organized as follows. Section 2 explains why the BN method is used to assess the hazards induced by seismic liquefaction. The construction of a BN model for liquefaction-induced hazards is presented in Section 3. Section 4 describes the case study used to verify the effectiveness and robustness of the BN model against an ANN model and the simplified method of Ishihara and Yoshimine (1992), and defines the performance indexes used in the comparison. In Section 5, the advantages and results of the BN model are discussed in comparison with those of the ANN model and the LPI method. In Section 6, the BN model is applied to evaluate the hazards induced by liquefaction in the 2011 Tohoku earthquake in Japan. Section 7 discusses the results obtained in this study, and Section 8 presents our conclusions and ideas for future work.

Why use a Bayesian network?
The assessment of liquefaction-induced hazards is a complex engineering problem because of the heterogeneous nature of soils, the large number of factors involved, and the uncertainties associated with these factors. Existing methods were either developed statistically or can assess only one type of hazard, such as settlement or lateral spreading. Additionally, they do not consider the effects of uncertainties on model performance; purely data-driven approaches, in particular, ignore empirical or domain knowledge in the assessment of liquefaction-induced hazards.
Recent developments in BN technology, however, make it possible to combine empirical knowledge and historical data, offering new opportunities to develop better tools for complex probabilistic problems such as liquefaction-induced hazards. BNs are among the most effective theoretical models for knowledge representation and reasoning under uncertainty and highly non-linear relationships among variables (Pearl 1988). First, BNs offer a rational and coherent theory for handling various uncertainties (e.g. uncertainties in parameters, models, and domain knowledge) and complexities, which are described in terms of subjective beliefs or probabilities that reflect the interdependence between variables. Moreover, they can integrate different types of domain knowledge, multi-source information, and various quantitative and qualitative factors into a consistent system, and can accommodate multiple hazards and their interdependencies within a single model. In particular, BNs allow not only forward inference (from causes to effects) but also reverse inference (from effects to causes) with complete or even incomplete data, and they provide an efficient framework for the probabilistic updating and assessment of component performance when new evidence emerges.
In recent decades, BNs have been widely applied to risk analysis in engineering, for example for catastrophic risk (Li et al. 2010a; Li et al. 2010b; Li et al. 2012), earthquake damage risk (Bayraktarli et al. 2005; Bayraktarli and Faber 2011; Bensi et al. 2009, 2014), embankment dam risk (Zhang et al. 2011; Xu et al. 2011; Peng and Zhang 2012), landslide hazards (Song et al. 2012; Liang et al. 2012), and soil liquefaction (Bayraktarli 2006; Hu et al. 2015). An indication of this trend is that the number of relevant publications over the period 2001-2015 (obtained by querying 'BN' and 'risk analysis' in the Web of Science database) increased from 3 to 50 (Fig. 1). In the past five years in particular, BN technology has become popular with engineers and researchers for risk assessment, and BN techniques have proved to be a robust method for risk analysis. However, to the best of our knowledge, the application of BNs to assessing liquefaction-induced damage has not been reported.

Probabilistic reasoning of BNs
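BNs combine graph theory and statistics, representing dependencies between variables as directed arcs quantified by conditional probabilities. The inference algorithms are based on the Bayesian rule, the chain rule, and the conditional independence rule, as follows:

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$

where P(X) is the prior probability of hypothesis X, P(X|Y) is one's belief in X upon observing evidence Y, which is known as the posterior probability, P(Y|X) is the likelihood that Y is observed if X is true, and P(Y) is the probability of the evidence. Combining the chain rule with the conditional independence encoded by the network, the joint distribution of the variables $X_1, \ldots, X_n$ factorizes as

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right)$$

where $\mathrm{Pa}(X_i)$ is a set of values for the parents of $X_i$.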
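To make the two rules concrete, the following minimal Python sketch evaluates a hypothetical three-node chain (PGA → LP → SB) by brute-force enumeration: the joint probability is built with the chain rule, and both forward (causal) and backward (diagnostic) queries follow from Bayes' rule. The node set and all probabilities are illustrative assumptions, not the conditional probability tables of the model in Fig. 4.

```python
from itertools import product

# Hypothetical three-node chain PGA -> LP -> SB with binary states.
# The probabilities below are illustrative only, not learned from the dataset.
P_PGA_HIGH = 0.3                     # prior P(PGA = high)
P_LP_YES = {True: 0.8, False: 0.1}   # P(LP = yes | PGA high?)
P_SB_YES = {True: 0.7, False: 0.05}  # P(SB occurs | LP = yes?)
NAMES = ("pga", "lp", "sb")


def bern(p_true, value):
    """Probability of a binary value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(pga, lp, sb):
    """Chain rule with parent sets: P(PGA) * P(LP | PGA) * P(SB | LP)."""
    return bern(P_PGA_HIGH, pga) * bern(P_LP_YES[pga], lp) * bern(P_SB_YES[lp], sb)


def query(target, evidence):
    """P(target = True | evidence) by enumerating the joint (Bayes' rule)."""
    num = den = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        state = dict(zip(NAMES, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue  # skip states inconsistent with the evidence
        p = joint(*values)
        den += p          # accumulates P(evidence)
        if state[target]:
            num += p      # accumulates P(target = True, evidence)
    return num / den


print(round(query("sb", {}), 4))             # forward: P(SB)
print(round(query("pga", {"sb": True}), 3))  # backward: P(PGA = high | SB)
```

With these numbers the forward query gives P(SB) = 0.2515 and the backward query gives P(PGA = high | SB) ≈ 0.68, so observing a sand boil raises the belief in high PGA above its prior of 0.3. Full enumeration grows exponentially with the number of nodes, which is why practical BN software relies on more efficient exact or approximate inference algorithms.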
A generic BN model for liquefaction-induced hazards (Fig. 2) is constructed with domain knowledge. The existing BN model of liquefaction potential (Fig. 3) considered 12 factors: the magnitude of the earthquake (ME), epicentral distance (ED), duration of the earthquake (DE), peak ground acceleration (PGA), fines content (FC), soil type (ST), average particle size (D50), SPT number (SPTN), vertical effective stress (σv'), groundwater table (GT), depth of soil deposit (DSD), and the thickness of the soil layer (TSL). In terms of seismic parameters, the liquefaction potential increases with ME, DE, and PGA and with decreasing ED. In terms of soil parameters, the liquefaction resistance of the soil is strongly related to FC: as FC increases up to 30%, the liquefaction strength decreases, but when FC exceeds 30%, the liquefaction strength increases with FC, and when FC > 50% (silt and sandy silt), the soil rarely liquefies. In addition, the FC value determines the soil type.
Normally, pure clay and silt do not liquefy, whereas poorly graded sand and silty sand liquefy easily. The larger the average particle size and the larger the SPT number, the smaller the probability of soil liquefaction. In terms of field conditions, deeper soil deposits have greater vertical effective stress, which the build-up of pore water pressure has more difficulty overcoming, so liquefaction is less likely. In addition, a deep groundwater table and a thin soil layer can partly reduce the probability of soil liquefaction. Therefore, a state node (LPI) and output nodes (sand boils (SB), ground cracks (GC), lateral spreading (LS), settlement (S), and SLH) are added to the existing BN model of liquefaction potential (Fig. 3) on the basis of the generic BN model in Fig. 2. The resulting BN model for liquefaction-induced hazards (Fig. 4) was constructed according to the domain knowledge of the hazards in Table 1. The ground slope, which affects GC and LS, was not included in the model because the associated data were not collected in the present study.
Earthquake liquefaction-induced hazards form a chain reaction, originating with the earthquake event and proceeding to soil liquefaction and its associated hazards. Different input values result in different liquefaction states and degrees of liquefaction. The outputs of the earlier stage (e.g. liquefaction potential, LP) are used as input information for the later stage, resulting in different hazard events (e.g. sand boils, lateral spreading). The whole process can be described as follows. At the onset of an earthquake, the earthquake parameters, soil characteristics, and field conditions are treated as control variables, and their prior probabilities are obtained by parameter learning. The posterior probability of the output variable (e.g. LP) is then inferred to estimate whether the corresponding event could be triggered. If the event occurs, its conditional probability is replaced by the posterior probability, which serves as evidence for the next stage. Finally, the posterior probability of the later event (e.g. LPI or a hazard) is calculated from the updated probability of the earlier event to estimate its grade. This process is repeated until the grades of all hazard events have been identified.
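The staged procedure can be sketched for a hypothetical chain LP → LPI → S: each stage propagates the distribution of the earlier node through its conditional probability table, and an observed grade replaces the predicted distribution before the next stage is evaluated. The states, the CPT values, and the function name `staged_assessment` are illustrative assumptions, and this simple forward propagation is valid here only because the evidence is entered upstream of the queried node.

```python
# Hypothetical chain LP -> LPI -> S; every CPT row sums to 1.
LPI_STATES = ["none", "slight", "moderate", "serious"]
S_STATES = ["none", "small", "big"]
P_LPI_GIVEN_LP = [                 # rows: LP = no, LP = yes
    [0.90, 0.06, 0.03, 0.01],
    [0.05, 0.25, 0.35, 0.35],
]
P_S_GIVEN_LPI = [                  # rows: LPI = none, slight, moderate, serious
    [0.90, 0.08, 0.02],
    [0.50, 0.40, 0.10],
    [0.30, 0.45, 0.25],
    [0.10, 0.40, 0.50],
]


def propagate(parent_dist, cpt_rows):
    """Child marginal: sum over parent states of P(parent) * P(child | parent)."""
    return [sum(p * row[k] for p, row in zip(parent_dist, cpt_rows))
            for k in range(len(cpt_rows[0]))]


def staged_assessment(p_lp_yes, lpi_evidence=None):
    """Stage 1 predicts LPI from LP; stage 2 predicts S from LPI.

    An observed LPI grade replaces the predicted LPI distribution
    (probability 1 on that grade) before stage 2 is evaluated.
    """
    p_lpi = propagate([1.0 - p_lp_yes, p_lp_yes], P_LPI_GIVEN_LP)
    if lpi_evidence is not None:
        p_lpi = [1.0 if s == lpi_evidence else 0.0 for s in LPI_STATES]
    p_s = propagate(p_lpi, P_S_GIVEN_LPI)
    return dict(zip(LPI_STATES, p_lpi)), dict(zip(S_STATES, p_s))


_, predicted = staged_assessment(p_lp_yes=1.0)                        # LP = yes observed
_, updated = staged_assessment(p_lp_yes=1.0, lpi_evidence="serious")  # LPI also observed
print(predicted["big"], updated["big"])
```

In this toy example, entering LPI = 'serious' as evidence raises P(S = 'big') from 0.2885 to 0.5, mirroring how each confirmed event sharpens the assessment of the next stage.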
4 Case study

Dataset
In this study, the dataset consists of 442 SPT borings from post-earthquake in-situ tests at liquefied (245 SPT borings) and non-liquefied (197 SPT borings) sites in Taiwan, Japan, and the USA. Of these, 332 SPT borings (184 liquefied and 148 non-liquefied sites) were used to train the BN model, and the remaining 110 SPT borings were used to test its effectiveness and robustness. Because the data are incomplete (e.g. approximately 15.2% of the D50 values, 29.4% of the vertical effective stress values, and 38.9% of the soil layer thickness values are missing), an expectation-maximization (EM) algorithm (Lauritzen 1995) [...] Daly City (California, USA) earthquake (Mw = 5.3) and the 1987 Whittier Narrows (USA) earthquake (Mw = 5.9) were taken from Cetin et al. (2000). Data from the 2011 Tohoku earthquake in Japan (Mw = 9.0) were provided by the Research Center for the Management of Disasters and the Environment at Tokushima University, Japan. The observed liquefaction effects induced by these earthquakes include sand boils, settlement, ground cracks, and lateral spreading (Fig. 5), resulting in the destruction of cropland, blocking of channels, and severe damage to or collapse of many buildings, highways, bridges, harbour facilities, and other infrastructure components.
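How EM copes with missing values can be illustrated with a hypothetical two-node fragment D50 → LP in which some D50 values are missing. In the E-step, each incomplete record is spread over the D50 states in proportion to P(D50) P(LP | D50); the M-step then re-estimates the parameters from these expected counts. This is a minimal sketch of the general technique under assumed three-grade D50 states and invented records, not the implementation used in this study.

```python
D50_STATES = ("small", "medium", "big")   # hypothetical discretisation of D50


def em_learn(records, n_iter=200, tol=1e-9):
    """EM estimates of P(D50) and P(LP = yes | D50).

    records: list of (d50_state or None, lp_bool); None marks a missing D50.
    """
    k = len(D50_STATES)
    prior = [1.0 / k] * k                  # initial P(D50)
    p_yes = [0.5] * k                      # initial P(LP = yes | D50)
    for _ in range(n_iter):
        n_d, n_yes = [0.0] * k, [0.0] * k  # expected counts
        for d50, lp in records:
            if d50 is not None:            # complete record: one-hot count
                resp = [1.0 if s == d50 else 0.0 for s in D50_STATES]
            else:                          # E-step: P(D50 | LP) for a missing D50
                like = [prior[j] * (p_yes[j] if lp else 1.0 - p_yes[j]) for j in range(k)]
                total = sum(like)
                resp = [w / total for w in like]
            for j in range(k):
                n_d[j] += resp[j]
                if lp:
                    n_yes[j] += resp[j]
        # M-step: maximum-likelihood estimates from the expected counts
        new_prior = [c / len(records) for c in n_d]
        new_p_yes = [n_yes[j] / n_d[j] if n_d[j] > 0 else p_yes[j] for j in range(k)]
        shift = max(abs(a - b) for a, b in zip(new_prior + new_p_yes, prior + p_yes))
        prior, p_yes = new_prior, new_p_yes
        if shift < tol:
            break
    return dict(zip(D50_STATES, prior)), dict(zip(D50_STATES, p_yes))


data = [("small", True), ("small", False), ("medium", True), ("big", False),
        (None, True), (None, False)]
print(em_learn(data))
```

When no value is missing, the loop reduces to ordinary frequency counting after one pass; the incomplete records only contribute fractional counts, so no boring has to be discarded.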
The grading standards for liquefaction and liquefaction-induced hazards, based on domain knowledge, are presented in Table 2. For example, LPI is divided into four grades according to Iwasaki et al. (1982): non-liquefaction (LPI = 0), slight liquefaction (0 < LPI ≤ 5), moderate liquefaction (5 < LPI ≤ 15), and serious liquefaction (LPI > 15). SLH is divided into four grades according to engineering disaster experience, as described in Table 3. Based on these SLH descriptions, a statistical summary of the liquefaction-induced hazard data is presented in Fig. 6. It shows that (1) liquefaction does not necessarily induce hazards, but liquefaction-induced hazards occur only where liquefaction occurs; (2) LPI is not a good index of the severity of liquefaction-induced hazards, because the efficacy of the LPI framework and the accuracy of the derived liquefaction hazards are uncertain; e.g. serious liquefaction according to LPI occurs in the absence of SLH (Fig. 6 (1)), and slight liquefaction according to LPI occurs where severe SLH are observed (Fig. 6 (4)), although as a general rule, the larger the LPI, the more severe the corresponding liquefaction-induced hazards; (3) SB, S, GC, and LS are macroscopic manifestations of liquefaction-induced hazards, and the larger the values of these indexes, the more severe the SLH tends to be; and (4) the classifications of the four types of hazards in Fig. 6 largely accord with the descriptions of the field ground damage status in Table 3.
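The LPI grades above rest on the index of Iwasaki et al. (1982), LPI = ∫ F(z) w(z) dz over the top 20 m, with F = 1 − FL where FL < 1 (otherwise 0) and the depth weight w(z) = 10 − 0.5z (z in metres). The sketch below assumes each layer has a constant FL, integrates w(z) exactly over each layer, and maps the result to the four grades of Table 2; the soil profile in the example is hypothetical.

```python
def lpi(layers, z_max=20.0):
    """Liquefaction potential index after Iwasaki et al. (1982).

    layers: iterable of (top_m, bottom_m, fl), each with a constant factor of
    safety FL. Severity F = 1 - FL when FL < 1 (else 0); weight w(z) = 10 - 0.5 z.
    """
    total = 0.0
    for top, bottom, fl in layers:
        a, b = max(top, 0.0), min(bottom, z_max)   # clip the layer to 0-20 m
        if b <= a or fl >= 1.0:
            continue                                # outside range or not liquefied
        w_integral = 10.0 * (b - a) - 0.25 * (b * b - a * a)  # exact integral of w(z)
        total += (1.0 - fl) * w_integral
    return total


def lpi_grade(value):
    """Map an LPI value to the four grades used in this study."""
    if value == 0:
        return "non-liquefaction"
    if value <= 5:
        return "slight"
    if value <= 15:
        return "moderate"
    return "serious"


layers = [(2.0, 4.0, 1.2), (4.0, 7.0, 0.8), (7.0, 9.0, 0.95)]  # hypothetical profile
value = lpi(layers)
print(round(value, 2), lpi_grade(value))
```

For this profile the index is about 4.95, i.e. 'slight', even though two layers have FL < 1; this illustrates how LPI condenses a whole soil column into one number, which is why it cannot by itself distinguish the different types of ground damage.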
Fig. 7 shows the ratios of all influencing factors for sites with severe SLH. Most severely damaged sites experienced large or super earthquakes with long-duration shaking, were close to the epicentre, and had sufficiently high PGA. As for soil characteristics, pure sand or silty sand with moderate FC and moderate D50 values resulted in severe damage, unlike most sites with gravelly soil or sandy silt. The damage phenomena indicate that, even though gravel and sandy silt are not easily liquefied, severe damage can be expected if the earthquake is strong enough to cause liquefaction. A small SPTN means that the sandy soil is loose, so settlement and lateral spreading are more likely after liquefaction, because loose sand is more easily compressed and flows more readily during seismic liquefaction. As for field conditions, zones where the sandy soil layer is shallow (low effective stress) and the groundwater table is near the ground surface are likely to suffer severe damage. These trends agree well with practical engineering experience. The ratios for three variables (D50, σv', and the thickness of the soil layer) do not sum to 1 because of the missing data mentioned at the start of this section.

Performance indexes
In this section, several performance indexes are introduced to comprehensively evaluate the performance of the two probabilistic models for liquefaction-induced hazards: Accuracy, Precision, Recall, the area under the curve of the receiver operating characteristic (AUC of ROC), and the Brier score. These indexes are described as follows.
Accuracy is the proportion of correctly classified instances over all classes. This metric is widely used to measure the overall performance of a classifier. For instance, an Accuracy of 0.9 indicates that 90% of the data are correctly classified. However, it does not mean that the accuracy for each class is 90%; the accuracy of one class may be high, whereas that of the others may be very low. The Brier score (Brier 1950) measures the quality of probabilistic forecasts for discrete events.
Suppose that on each of n occasions, an event can occur in only one of r possible classes. On the ith occasion, the forecast probabilities that the event will occur in classes 1, 2, 3, …, r are $f_{i1}, f_{i2}, \ldots, f_{ir}$, respectively. The Brier score (B) is then defined by

$$B = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{r}\left(f_{ij} - E_{ij}\right)^{2}, \qquad \sum_{j=1}^{r} f_{ij} = 1, \quad i = 1, 2, 3, \ldots, n$$

where $E_{ij}$ takes a value of 1 or 0 according to whether the event occurred in class j or not. For instance, in the case study described in this paper, 110 SPT borings are used for testing (n = 110), SLH has four classes (none, minor, medium, and severe; r = 4), and a probability or confidence statement ($f_{ij}$) is given for each SPT boring instance. The Brier score ranges from 0 to 2, where B = 0 denotes a perfect prediction and B = 2 denotes the worst possible prediction.
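Only Accuracy and the Brier score are defined explicitly above; for completeness, the following sketch computes all five indexes from plain Python lists. It assumes one-vs-rest definitions for per-class Precision and Recall and the rank-based (Mann-Whitney) estimate of the one-vs-rest AUC of ROC; the function names and the small example are hypothetical, and libraries such as scikit-learn provide equivalent routines.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted class equals the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, cls):
    """One-vs-rest Precision and Recall for class `cls` (0.0 when undefined)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    predicted = sum(p == cls for p in y_pred)
    actual = sum(t == cls for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


def auc(scores, labels):
    """One-vs-rest AUC of ROC via the Mann-Whitney statistic (ties count 1/2).

    scores: predicted probability of the class; labels: 1 if the instance
    belongs to the class, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(forecasts, outcomes, classes):
    """Multi-class Brier score B = (1/n) sum_i sum_j (f_ij - E_ij)^2, range 0-2.

    forecasts: one dict {class: probability} per occasion; outcomes: observed class.
    """
    total = 0.0
    for f, observed in zip(forecasts, outcomes):
        total += sum((f[c] - (1.0 if c == observed else 0.0)) ** 2 for c in classes)
    return total / len(outcomes)


classes = ["none", "minor", "medium", "severe"]   # the four SLH grades
y_true = ["none", "minor", "severe", "severe"]
y_pred = ["none", "medium", "severe", "minor"]
print(accuracy(y_true, y_pred))                      # 0.5
print(precision_recall(y_true, y_pred, "severe"))    # (1.0, 0.5)
```

A uniform forecast over the four SLH grades scores B = 0.75 for every boring, which gives a simple reference point for judging the Brier scores reported in Table 4.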

Comparison of predictive results
Table 4 compares the predictive results of the BN model (Fig. 4) and the ANN model using the same parameters. In terms of Accuracy, the BN model scores higher than the ANN model for all hazard types except LS, as well as for SLH; in terms of the Brier score (lower is better), the BN model scores lower than the ANN model except for LS and SLH. These results indicate that the overall performance of the BN model is better than that of the ANN model. For each type of hazard induced by liquefaction and for SLH, the Recall, Precision, and AUC of ROC scores obtained by the BN model for each class are generally higher than those of the ANN, which also suggests that the BN model outperforms the ANN model. Therefore, the proposed BN approach performs better than the ANN, and its performance is acceptable for monitoring and forecasting seismic liquefaction-induced hazards. In addition, in terms of computation time, the BN model (using the EM algorithm) converges faster than the ANN model (containing 20 hidden layers and using a radial basis function), requiring 36 iterations (about 19.8 CPU s) to reach a stable state.
Furthermore, there are no effective simplified methods for estimating ground cracks and sand boils, and the simplified methods for calculating lateral spreading (Bartlett and Youd 1995; Wang and Rahman 1999; Goh et al. 2014) require the free-face ratio or ground slope, which were not included in the data collected for this study. Therefore, ground cracks, sand boils, and lateral spreading cannot be estimated here by simplified methods. Settlement, however, can be calculated by the simplified method proposed by Ishihara and Yoshimine (1992), hereafter referred to as the I&Y method. Table 4 clearly indicates that the predictive results of data-driven methods such as BN and ANN are better than those of the I&Y method, and the simplified approach gives a single specific value (Fig. 8) rather than an interval or a probability. In addition, the simplified method is constructed using only the relationships among the relative density, the factor of safety against liquefaction (FL), and the volumetric strain (εv). FL is obtained by combining the earthquake intensity and SPTN through empirical formulas or coefficients, which may introduce calculation errors that lead to considerable prediction errors; for the 'small' settlement class in Table 4, for example, the precision of the simplified method is only 0.069. By contrast, the data-driven methods integrate multiple factors of liquefaction-induced hazards into one model and thus provide better predictive performance than simplified methods.
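The deterministic character of the I&Y estimate comes from its final step: the settlement is the sum, over the liquefied layers, of the volumetric strain times the layer thickness. The chart lookup that yields εv from FL and relative density is not reproduced here; the sketch assumes the strains have already been read off, and the function name and input values are hypothetical.

```python
def settlement_from_strains(layers):
    """Settlement (m) = sum of eps_v * thickness over liquefied layers.

    layers: iterable of (thickness_m, eps_v_percent). eps_v is assumed to have
    been read from the I&Y chart (FL, relative density); values are hypothetical.
    """
    return sum(thickness * eps_v / 100.0 for thickness, eps_v in layers)


print(settlement_from_strains([(2.0, 3.0), (1.5, 2.0)]))  # about 0.09 m
```

Any error in FL propagates directly into εv and hence into this single number, with no accompanying probability, which is the limitation contrasted with the BN output above.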

Causal reasoning using the BN model
Based on the developed BN model, the probabilities of the liquefaction-induced hazards were inferred through causal reasoning. The third column of Table 5 lists the posterior probabilities of all grades of LP, LPI, and the induced hazards. When the input variables (earthquake parameters, soil characteristics, and field conditions) are unknown, the probabilities of all grades of each output variable are similar, except for LS and SB, whose data are severely imbalanced across grades. However, when a site is determined to be liquefied and the probability of LP = 'yes' becomes 100%, the fourth column shows that the probabilities of LPI = 'none' and of the 'none' grade of every hazard decrease to some extent, while the probabilities of the other LPI grades and hazard grades increase significantly.
Furthermore, if the site is seriously liquefied, the probabilities of LP = 'yes' and LPI = 'serious' become 100%, as seen in the fifth column of Table 5. The probabilities of all grades except 'none' continue to increase for all hazards: GC occurs with 66.1% probability, serious sand boils with 69.6%, big LS with 9.5%, big settlement with 49.8%, and severe SLH with 64.1%. This shows that liquefaction-induced hazards are much more severe at seriously liquefied sites. If macro-liquefaction phenomena such as GC and serious SB are also observed, the probabilities of the 'big' grades of the other hazards increase slightly further, as seen in the sixth column of Table 5. These predictive results are close to the actual situation. According to the above deduction process, the BN model can therefore calculate the posterior probability of LP from the conditional probabilities of the input variables to estimate whether a site is liquefied. If it is, this posterior probability is used as input information for predicting the next variable, and such reasoning yields predictions for all liquefaction-induced hazards. In addition, when the values of all input variables, such as the earthquake parameters, soil characteristics, and field conditions, are determined in advance, the predictive performance for all hazards improves significantly. For instance, consider a site that experienced a long-duration super earthquake. Surveys show that the SLH is severe, with big settlement, no lateral spreading, serious sand boils, and ground cracks. The input variables of the site indicate that the epicentral distance is near, the PGA is high, the soil is sand with some fine particles, the D50 value is medium, the SPT number shows that the sand is loose, σv' is small, the groundwater table is shallow, and both the depth and the thickness of the sand layer are moderate. The inferred probability of LP is 99.9%, LPI is identified as serious with 43.8% probability, and GC has a 51.4% probability of not occurring, which does not match the survey results. From the same input information, SB is identified as 'many' with 76.5% probability, LS as 'none' with 85.0% probability, settlement as 'big' with 53.1% probability, and SLH as 'severe' with 52.6% probability. When the site is then confirmed to be a seriously liquefied area, the probabilities of LP = 'yes' and LPI = 'serious' are set to 100%, and the probabilities of all hazards change accordingly: GC occurs with 100% probability, which matches the survey results; LS is identified as 'none' with 100% probability (an increase of 15 percentage points); settlement is identified as 'big' with 100% probability (an increase of 46.9 percentage points); and SLH is identified as 'severe' with 100% probability (an increase of 47.4 percentage points).

Diagnostic reasoning using the BN model
To identify situations that are more likely to result in severe damage, the most probable explanations of LP (Yes), LPI (Serious), GC (Yes), SB (Many), LS (Big), and S (Big) are inferred using the diagnostic reasoning capabilities of the BN model; the results are presented in Table 6. Loose silty sand (medium D50) containing a moderate amount of fine particles, deposited at shallow depth (small σv') at a site with a low groundwater level, is more likely to liquefy following a super earthquake of moderate duration at a moderate epicentral distance. The most probable explanations for GC and SB = 'many' are the same as those for LP under serious or moderate soil liquefaction, whereas those for LS = 'big' and S = 'big' differ slightly from those for LP in terms of PGA and soil type. This is because LS = S = 'big' requires greater seismic intensity than sand boils and ground cracks do, and clean sand flows more easily and undergoes greater compression after liquefaction than sand containing fine particles. In addition, LS = S = 'big' is often accompanied by many sand boils, whereas ground cracks may or may not occur. These results agree with the analysis in Fig. 7. Moreover, if the soil characteristics, field conditions, and hazards are known, the earthquake intensity (magnitude, duration, PGA, and epicentral distance) that resulted in the liquefaction-induced hazards can be estimated using the backward inference ability of the BN method, which provides a reference for seismic design.
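For a small network, the most probable explanation can be found by enumerating every configuration of the candidate causes and keeping the one with the highest joint probability with the evidence. The sketch below assumes a hypothetical fragment in which ED and GT are the only parents of LP, with invented probabilities; it illustrates the diagnostic query itself, not the values in Table 6.

```python
from itertools import product

# Hypothetical fragment: epicentral distance (ED) and groundwater table (GT)
# are the only parents of LP. All numbers are invented for illustration.
P_ED = {"near": 0.4, "far": 0.6}
P_GT = {"shallow": 0.5, "deep": 0.5}
P_LP_YES = {                       # P(LP = yes | ED, GT)
    ("near", "shallow"): 0.9,
    ("near", "deep"): 0.5,
    ("far", "shallow"): 0.4,
    ("far", "deep"): 0.1,
}


def most_probable_explanation(lp_observed=True):
    """Return the (ED, GT) configuration maximising P(ED, GT | LP) and its posterior."""
    best, best_joint, evidence_prob = None, -1.0, 0.0
    for ed, gt in product(P_ED, P_GT):
        p_lp = P_LP_YES[(ed, gt)] if lp_observed else 1.0 - P_LP_YES[(ed, gt)]
        joint = P_ED[ed] * P_GT[gt] * p_lp   # P(ED, GT, LP = observed)
        evidence_prob += joint               # accumulates P(LP = observed)
        if joint > best_joint:
            best, best_joint = (ed, gt), joint
    return best, best_joint / evidence_prob


print(most_probable_explanation(True))
```

Here the most probable explanation of LP = yes is (near, shallow), with posterior 0.18/0.43 ≈ 0.42, while for LP = no it is (far, deep). The same argmax query, run over all input nodes of the full model, produces explanations of the kind listed in Table 6.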

Sensitivity analysis of liquefaction-induced hazards
Sensitivity analysis determines how much each factor affects the target variable. In this section, sensitivity is assessed using mutual information, a measure of the mutual dependence between two variables. The mutual information for each liquefaction-induced hazard was computed separately in the BN model; the results are presented in Table 7. The thickness of the soil layer is the most sensitive variable for GC, and the relatively important factors are the depth of the soil layer, D50, and the duration of the earthquake. For SB, the groundwater table is the most sensitive variable, and the relatively important factors are the thickness of the soil layer, SPTN, the duration of the earthquake, PGA, the depth of the soil layer, and σv'. For S, PGA is the most sensitive variable, and the relatively important factors are SPTN, the duration of the earthquake, and the depth of the soil layer. For LS, PGA is again the most sensitive variable, and the relatively important factors are D50, the thickness of the soil layer, the depth of the soil layer, and the soil type. These results are highly consistent with the domain knowledge in Table 1. Comparing the most sensitive and relatively important factors for the four types of liquefaction-induced hazards and SLH, the duration of the earthquake, PGA, SPTN, the depth of the soil deposit, and the thickness of the soil layer are more important than the other factors, because each appears among the key factors for three or more of these items. Among these five factors, SPTN combined with the earthquake intensity (described by the duration of the earthquake and PGA) determines the degree of soil liquefaction. The depth of the soil deposit and the thickness of the soil layer, combined with the relative density (determined from SPTN) and the degree of soil liquefaction, determine the soil volumetric strain. Consequently, liquefaction-induced hazards such as settlement and lateral spreading can be estimated.
Therefore, to mitigate seismic liquefaction-induced hazards, attention should be paid not only to the relative density of the sandy soil but also to the depth of the sandy soil deposit and the thickness of the sandy soil layer, which are crucial factors.
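Mutual information between a factor X and a hazard Y is I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))], which is zero when the two are independent. The sketch computes it from a joint probability table and ranks factors by it; the two tables are hypothetical, and BN software may obtain the same quantity from the compiled network rather than from explicit tables.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table joint[x][y] summing to 1."""
    px = [sum(row) for row in joint]            # marginal P(X)
    py = [sum(col) for col in zip(*joint)]      # marginal P(Y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                          # 0 * log 0 is taken as 0
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Hypothetical joint tables P(factor state, hazard state) for two binary factors.
tables = {
    "PGA": [[0.30, 0.10], [0.10, 0.50]],
    "ED": [[0.22, 0.18], [0.28, 0.32]],
}
ranking = sorted(tables, key=lambda k: mutual_information(tables[k]), reverse=True)
print(ranking)
```

Ranking by mutual information captures any form of statistical dependence, not only monotonic trends, which suits the discretized, non-linear relationships handled by the BN.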

Application of the BN model
The BN model described above was applied to assess the liquefaction-induced hazards in Japan's Northeast Pacific Offshore Earthquake (the Tohoku earthquake) of 11 March 2011. The study regions are Ibaraki, Chiba, Saitama, and Kanagawa prefectures and Tokyo, which contain 196 investigation sites. The assessment results for SLH are shown in Fig. 9, in which blue circles denote little to no liquefaction-induced hazards, green circles minor hazards, yellow circles medium hazards, orange circles severe hazards, and red circles prediction errors. At the 196 sites, the prediction accuracies of the four types of liquefaction-induced hazards and of SLH are 99.50% (lateral spreading), 81.63% (sand boils), 80.61% (settlement), 89.8% (ground cracks), and 84.1% (SLH). In addition, the prediction accuracies for the four levels of SLH (little to none, minor, medium, and severe) are 79.83%, 84.62%, 81.25%, and 79.83%, respectively, which demonstrates the general validity of the BN model. The prediction accuracies of the LPI approach (Iwasaki et al. 1982) for the four levels of SLH were 36.96%, 8.82%, 68%, and 42.22%, respectively, which are much worse than those of the BN model.
In this earthquake, the areas with greater losses and more liquefaction sites were Ibaraki prefecture and Tokyo city, which are closer to the sea than the other regions. These two regions contain 78 sites with different degrees of hazards, including approximately 50 sites where medium or severe hazards occurred. As described in Table 3, sites with medium or severe hazards experienced sand boils, ground cracks, lateral spreading, and settlement, resulting in foundation failure. These foundation failures in turn damaged buildings and caused bridges to collapse. The BN model for liquefaction-induced hazards therefore not only assesses the extent of lateral spreading and settlement, the quantity of sand boils, and the likelihood of ground cracks, but also accurately predicts the severity of the hazards induced by liquefaction. Based on engineering experience of foundation damage and structural collapse, these predictions can then be used to assess qualitatively the damage that may occur to buildings and other structures. These results provide engineering guidelines for preventing and mitigating structural damage following natural disasters.

Discussion
This paper described a probability model for liquefaction-induced hazards using BN technology. As a means of probabilistic inference, BNs offer several specific advantages over other methods in the evaluation of catastrophes and provide a sound platform for integrating different kinds of hazards and their interdependencies into a consistent system (Li et al. 2010b). However, existing empirical methods for estimating hazards induced by seismic liquefaction can assess only a single type of ground failure and cannot predict ground cracks or sand boils (e.g. the empirical formulas of Youd et al. (1987, 2002) and the multiple linear regression (MLR) model of Goh and Zhang (2014) for estimating lateral spreading, and the simplified procedures for estimating settlement proposed by Ishihara and Yoshimine (1992), Zhang et al. (2002), Wu and Seed (2004), and Juang et al. (2013)). The LPI approach can quantify the liquefaction severity of a site by providing a single value for the entire soil column instead of a safety factor for each layer. However, calibrating LPI to determine liquefaction severity is difficult, and the efficacy of the LPI framework and the accuracy of the liquefaction-induced hazards derived from it are uncertain (Maurer et al. 2014). At some sites with a high LPI (LPI > 15), settlement and ground cracks may not occur, whereas at some sites with a low LPI (LPI < 5), serious, long-duration sand boils and wide-scale lateral spreading with severe subsidence do occur. Thus, the observed SLH are largely inconsistent with the predictions of the LPI approach, as demonstrated by Fig. 6 and the prediction results in Section 6. In fact, LPI reflects only the degree of liquefaction at a site and cannot capture the actual ground damage. Because the relation between LPI and the types of liquefaction-induced hazards has not been examined systematically, a qualitative relation between them may still exist to some extent.
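For reference, LPI as defined by Iwasaki et al. (1982) integrates over the top 20 m of the soil column: LPI = ∫ F(z) w(z) dz from z = 0 to 20 m, where F = 1 - FL when the factor of safety FL < 1 and F = 0 otherwise, and w(z) = 10 - 0.5z. The sketch below evaluates this integral exactly for a layered profile, assuming FL is constant within each layer, and applies the severity classes commonly cited with the index (LPI = 0 very low, 0 < LPI ≤ 5 low, 5 < LPI ≤ 15 high, LPI > 15 very high). The profile and function names are hypothetical. The output is a single number for the whole column, with no indication of which type of ground failure will occur.

```python
def layer_weight(top, bottom):
    """Integral of w(z) = 10 - 0.5 z over one layer, clipped to 0-20 m depth."""
    a, b = max(top, 0.0), min(bottom, 20.0)
    if b <= a:
        return 0.0
    return 10.0 * (b - a) - 0.25 * (b ** 2 - a ** 2)


def lpi(layers):
    """Liquefaction potential index for layers given as (top_m, bottom_m, FL)."""
    total = 0.0
    for top, bottom, fl in layers:
        severity = 1.0 - fl if fl < 1.0 else 0.0    # F(z) for the layer
        total += severity * layer_weight(top, bottom)
    return total


def lpi_category(value):
    """Severity classes commonly cited with Iwasaki et al. (1982)."""
    if value == 0:
        return "very low"
    if value <= 5:
        return "low"
    if value <= 15:
        return "high"
    return "very high"


# Hypothetical profile: (top depth m, bottom depth m, factor of safety FL).
profile = [(0.0, 2.0, 1.5), (2.0, 6.0, 0.6), (6.0, 10.0, 0.9), (10.0, 20.0, 1.2)]
value = lpi(profile)
print(round(value, 2), lpi_category(value))
```

Only the two layers with FL < 1 contribute, and the shallower one is weighted more heavily because w(z) decreases linearly with depth.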
Although both the BN and ANN methods use supervised learning, the BN method is a generative model, whereas the ANN method is a discriminative model. The BN method therefore obtains the joint probability distribution of the parameters, which enables it to describe the distribution of the data in statistical terms on the basis of a strong probabilistic theory. This yields a more objective interpretation and faster computation than discriminative models such as the ANN method. As the sample size increases, the BN method converges rapidly to the true model. When the data contain hidden parameters, the BN method can still develop a robust model, whereas the ANN method cannot (Correa et al. 2009). In the BN model, each node denotes a random variable with a physical meaning, and a link between two nodes denotes a causal relationship; in contrast, the nodes in the ANN model are not random variables and have no physical meaning, and the links between nodes simply denote weighted functional relationships without an explicit causal or logical interpretation. This makes the results of the ANN model difficult to explain. In addition to predicting the different hazards induced by liquefaction, the constructed BN model can predict the liquefaction potential: the Accuracy of liquefaction potential prediction on the test data in this study was 0.80. With ANN technology, a new model would have to be trained on the training data to predict liquefaction potential, whereas the BN model can make this prediction directly without retraining. In particular, the BN method can reason forward, assessing the hazards induced by liquefaction from given earthquake parameters, soil parameters, and field conditions, and backward, inferring the likely soil properties and field conditions once the hazards are known after an earthquake; the ANN method offers only forward reasoning.
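The forward and backward reasoning described above can be illustrated with exact inference by enumeration on a toy network. The sketch below is not the BN of this study: its four binary nodes (PGA, groundwater table GWT, liquefaction Liq, and sand boils SB), their states, and all probabilities are hypothetical. Because the network defines the full joint distribution, the same `query` function answers a forward question (sand boils given earthquake and site conditions) and a backward one (groundwater condition given observed sand boils). Enumeration is used only for clarity; its cost grows exponentially with the number of nodes.

```python
from itertools import product

# Hypothetical prior and conditional probability tables (illustrative only).
P_PGA = {"high": 0.3, "low": 0.7}
P_GWT = {"shallow": 0.4, "deep": 0.6}
P_LIQ_YES = {                            # P(Liq = yes | PGA, GWT)
    ("high", "shallow"): 0.85, ("high", "deep"): 0.40,
    ("low", "shallow"): 0.20, ("low", "deep"): 0.05,
}
P_SB_MANY = {"yes": 0.70, "no": 0.05}    # P(SB = many | Liq)

DOMAINS = {
    "pga": ["high", "low"],
    "gwt": ["shallow", "deep"],
    "liq": ["yes", "no"],
    "sb": ["many", "few"],
}


def joint(pga, gwt, liq, sb):
    """Joint probability via the chain rule for PGA, GWT -> Liq -> SB."""
    p_liq = P_LIQ_YES[(pga, gwt)]
    p_liq = p_liq if liq == "yes" else 1.0 - p_liq
    p_sb = P_SB_MANY[liq] if sb == "many" else 1.0 - P_SB_MANY[liq]
    return P_PGA[pga] * P_GWT[gwt] * p_liq * p_sb


def query(target, evidence):
    """Posterior P(target | evidence) by summing the joint over all states."""
    names = list(DOMAINS)
    scores = {state: 0.0 for state in DOMAINS[target]}
    for states in product(*(DOMAINS[n] for n in names)):
        assignment = dict(zip(names, states))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[target]] += joint(**assignment)
    total = sum(scores.values())
    return {state: s / total for state, s in scores.items()}


# Forward reasoning: earthquake and site conditions -> sand boils.
print(query("sb", {"pga": "high", "gwt": "shallow"}))
# Backward reasoning: observed sand boils -> groundwater table.
print(query("gwt", {"sb": "many"}))
```

In the backward query, observing many sand boils raises the probability of a shallow groundwater table above its prior of 0.4, which is the kind of diagnostic inference a purely forward ANN mapping does not provide.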

Conclusion and future work
Given the uncertainty and complexity of liquefaction-induced hazards, this paper described a generic BN model for estimating the risk of different hazards induced by seismic liquefaction based on historical disaster data.This model provides a platform for integrating a variety of information sources from different fields and combines the different hazards induced by liquefaction into a single model.
The findings reported in this paper are as follows: (1) Compared with ANN technology on several performance indexes, the BN model achieves better Accuracy and a better Brier score for overall performance, and gives better Recall, Precision, and AUC of ROC for each damage state (e.g. sand boils, settlement). The BN model also requires less computation time than the ANN method. This shows that the BN method is suitable for the risk assessment of liquefaction-induced hazards influenced by multiple complex factors. Compared with the simplified I&Y method for estimating settlement, the data-driven methods (BN and ANN) were found to be superior. Furthermore, the performance of the BN model in estimating liquefaction-induced hazards in Japan's Northeast Pacific Offshore Earthquake demonstrates its correctness and reliability compared with the LPI approach.
(2) The BN model can deduce the chain-reaction process of liquefaction-induced hazards and can reason both forward, from the input variables (earthquake parameters, soil characteristics, and field conditions) to soil liquefaction and then to the different hazard events, and backward, from soil liquefaction and the hazard events to the input variables. In addition, the most probable explanations for LP, Serious LPI, GC, many SB, Big LS, and Big S in the BN model were determined. This analysis showed that loose silty sand or sandy soil (medium D50) with a moderate fines content, deposited at shallow depth (small σv') on a site with a low groundwater level, is more likely to liquefy and suffer the resulting hazards in the event of a super earthquake of moderate duration and epicentral distance.
(3) A sensitivity analysis of the various liquefaction-induced hazards indicates that the most sensitive factors are hazard-specific.The duration of the earthquake, PGA, SPTN, depth of soil deposit, and the thickness of the soil layer are more important than other factors; these factors contribute to the soil volumetric strain.
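Finding (2) refers to most probable explanations (MPEs). An MPE is the joint assignment of all unobserved variables that maximizes their joint probability together with the evidence, which is not necessarily the combination of individually most likely states. The sketch below finds the MPE by brute force in a hypothetical three-node chain (soil density, liquefaction, settlement) with invented probabilities; it illustrates the concept only and does not reproduce the MPEs determined for the BN model.

```python
from itertools import product

# Hypothetical chain Density -> Liq -> Settlement with invented probabilities.
P_DENSITY = {"loose": 0.35, "dense": 0.65}
P_LIQ_YES = {"loose": 0.60, "dense": 0.10}      # P(Liq = yes | Density)
P_BIG_S = {"yes": 0.75, "no": 0.10}             # P(Settlement = big | Liq)


def joint(density, liq, settlement):
    """Joint probability of one full assignment via the chain rule."""
    p_liq = P_LIQ_YES[density] if liq == "yes" else 1.0 - P_LIQ_YES[density]
    p_s = P_BIG_S[liq] if settlement == "big" else 1.0 - P_BIG_S[liq]
    return P_DENSITY[density] * p_liq * p_s


def most_probable_explanation(settlement):
    """Jointly most probable (density, liq) states given the observed settlement."""
    candidates = product(P_DENSITY, ["yes", "no"])
    return max(candidates, key=lambda c: joint(c[0], c[1], settlement))


print(most_probable_explanation("big"))
print(most_probable_explanation("small"))
```

Even though dense soil is more likely a priori, observing big settlement makes "loose, liquefied" the most probable explanation in this toy network.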
Because liquefaction may cause no damage, little damage, or severe damage to the ground surface or infrastructure, the BN model constructed in this study offers an important means of accurately assessing the severity of hazards after seismic liquefaction. The model results indicate which sites should be prioritised, rather than dealing with all sites at which liquefaction has occurred, thus reducing the costs of disaster response. In future work, more historical data will be collected to update the conditional probability tables and improve the BN model, especially data containing instances of small and medium lateral spreading, which are lacking in the present study. Additionally, utility and decision nodes will be added to the BN model to test how different actions lead to different hazards and different expected utilities of loss. The results may provide significant information for decision-making on earthquake resistance and hazard reduction.
Evaluations of predictive capability based on Accuracy alone can be misleading when a dataset exhibits class imbalance. Indexes such as Precision, Recall, and AUC of ROC should be used to further measure the performance of a model or classifier for each class. Recall is the probability of detecting a class: the proportion of correctly predicted positive instances among all actual positive instances. A classifier with a higher Recall for a class detects more of that class's positive instances. Precision is the proportion of true positives among the instances predicted as positive for a single class, but it does not measure how many of the actual positive instances the classifier detects. A classifier with high Precision but low Recall is less useful because it misses many positive instances, which matters particularly in risk assessment, where safety and warning are major concerns. A good classifier should detect most positive instances with relatively high accuracy; that is, it should have high Recall and acceptable Precision. The ROC curve plots the false positive rate (the proportion of all negatives that nevertheless yield positive test outcomes) on the x-axis against the true positive rate, or Recall, on the y-axis; under strong class imbalance it can present an overly optimistic view of an algorithm's performance. The AUC of ROC is the area under the ROC curve, a single scalar value summarising a classifier's expected performance. The AUC of ROC is 0.5 for random guessing and 1.0 for a perfect classifier, so useful classifiers lie between 0.5 and 1, with values closer to 1.0 indicating better discrimination between classes. Therefore, the larger the AUC of ROC value, the better the prediction performance of the classifier.
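The metrics discussed above can be written out in a few lines. The following minimal sketch computes Accuracy, Precision, Recall, the Brier score, and AUC of ROC for one class treated as positive (one-vs-rest), computing AUC as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The labels and probabilities are hypothetical, and the binary Brier score shown here may differ from the multi-class form used in the case study. The example shows why Accuracy alone misleads under class imbalance: a classifier that never predicts the positive class reaches the same Accuracy (0.8) as one that detects half of the positives.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and Recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def brier_score(y_true, prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, prob)) / len(y_true)


def roc_auc(y_true, score):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, score) if t == 1]
    neg = [s for t, s in zip(y_true, score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 2 positive sites (e.g. severe) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
prob = [0.9, 0.4, 0.3, 0.2, 0.1, 0.1, 0.5, 0.2, 0.1, 0.05]
y_pred = [1 if p >= 0.5 else 0 for p in prob]
always_negative = [0] * len(y_true)

print("always negative:", accuracy(y_true, always_negative),
      precision_recall(y_true, always_negative))
print("probabilistic:", accuracy(y_true, y_pred), precision_recall(y_true, y_pred),
      round(brier_score(y_true, prob), 4), roc_auc(y_true, prob))
```

Both classifiers have an Accuracy of 0.8, but only the second has non-zero Recall, and its AUC shows that its probabilities rank most positives above negatives.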

Figure 1. Increasing application of BN in risk analysis (update of Weber et al. 2012).

Figure 7. Ratios of all influence factors for the severe status of SLH.

Figure 8. Soil profile and estimate of settlement.

Figure 9. Assessment results of the severity of hazards induced by seismic liquefaction in the northeast area of Japan in the 2011 Tohoku earthquake.

2 Construction of a BN model for liquefaction-induced hazards
The expectation-maximization (EM) algorithm was used to learn the conditional probability tables of the BN model from the 332 SPT records. EM was chosen because it is more robust than other algorithms and is suitable for datasets with many missing values. Briefly, EM is an iterative algorithm for finding maximum likelihood or maximum a posteriori estimates of parameters. Starting from an initial Bayesian net, it repeatedly obtains a better one by performing an expectation (E) step followed by a maximization (M) step until the algorithm converges. In the E step, regular Bayesian net inference is performed with the current Bayesian net to compute the expected values of all the missing data; the M step then finds the maximum likelihood Bayesian net given the extended data (i.e. the original data plus the expected values of the missing data). Data from the 1999 Chi-Chi earthquake were obtained from http://www.ces.clemson.edu/chichi/TW-LIQ/In-situ-Test.htm and http://peer.berkeley.edu/lifelines/research_projects/3A02/. Special 'small magnitude' data from the 1957
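The E and M steps described above can be sketched on the smallest possible network. The example below is a minimal illustration, not the software used in this study: it runs EM on a hypothetical two-node binary network A → B in which some records have A missing. The E step replaces each missing A by its posterior probability under the current parameters, and the M step re-estimates P(A = 1) and P(B = 1 | A) from the resulting expected counts. The data, initial values, and function name are assumptions.

```python
def em_two_node(data, theta_a=0.5, theta_b=(0.3, 0.7), max_iter=500, tol=1e-10):
    """EM estimates for a binary network A -> B when some A values are missing.

    data: list of (a, b) pairs, a in {0, 1, None} (None = missing), b in {0, 1}.
    theta_a = P(A = 1); theta_b[a] = P(B = 1 | A = a).
    Assumes the expected counts of A = 0 and A = 1 stay positive
    (e.g. the data contain observed records of both A states).
    """
    theta_b = list(theta_b)
    n = len(data)
    for _ in range(max_iter):
        # E step: expected value of A for every record under current parameters.
        weights = []
        for a, b in data:
            if a is not None:
                weights.append(float(a))
            else:
                like1 = theta_a * (theta_b[1] if b else 1.0 - theta_b[1])
                like0 = (1.0 - theta_a) * (theta_b[0] if b else 1.0 - theta_b[0])
                weights.append(like1 / (like1 + like0))
        # M step: maximum likelihood estimates from the expected counts.
        n1 = sum(weights)
        n0 = n - n1
        new_a = n1 / n
        new_b1 = sum(w * b for w, (_, b) in zip(weights, data)) / n1
        new_b0 = sum((1.0 - w) * b for w, (_, b) in zip(weights, data)) / n0
        change = max(abs(new_a - theta_a),
                     abs(new_b0 - theta_b[0]), abs(new_b1 - theta_b[1]))
        theta_a, theta_b = new_a, [new_b0, new_b1]
        if change < tol:
            break
    return theta_a, theta_b


# Hypothetical records: A = liquefied (1) or not (0), B = sand boils observed (1) or not (0).
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (None, 1), (None, 0)]
p_a, p_b = em_two_node(records)
print(round(p_a, 3), [round(p, 3) for p in p_b])
```

Each EM iteration cannot decrease the likelihood of the observed data, so the loop stops once the parameters stop changing; as with EM in general, the result may be a local maximum that depends on the initial values.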

Table 1. Factors of liquefaction and its induced hazards and empirical modelling

Table 2. Grading standard for liquefaction and liquefaction-induced hazards.

Table 3. Description of the severity of liquefaction-induced hazards.
Medium: There is a medium sand boil phenomenon of short duration, small gushing quantity, and small scale; surface subsidence, which is less than 3% of the sand layer thickness, can cause structural damage; tiny cracks occur in the ground; there is no lateral spreading.
Severe: Serious liquefaction. There is a serious sand boil phenomenon of long duration, large gushing quantity, and wide scale; the ground surface cracks extensively; lateral spreading and severe subsidence impair the serviceability of structures. Surface subsidence is more than 3% of the sand layer thickness.

Table 5. Posterior probabilities of partial output variables.

Table 7. Sensitivity analysis of seismic liquefaction-induced hazards.