Logistic regression applied to natural hazards: rare event logistic regression with replications

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods, and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.


Introduction
Natural hazards and risks are increasing, especially in developing countries (Alcántara-Ayala et al., 2006). Landslides particularly affect human occupation and socioeconomic development in mountainous areas of developing countries. Given the current population growth with increasing occupation of steep uplands, landslide risks are expected to increase in the future. Understanding the causal and controlling factors of landsliding is therefore important. It is known that extreme rainfall, rapid snowmelt and seismic activity are the primary triggers of landslides (Brunetti et al., 2010; Tatard et al., 2010). Prediction of the timing of future landslide occurrence is rare, as landslide records often do not contain detailed information on the date of occurrence (Baum and Godt, 2010; Larsen and Torres-Sánchez, 1998).
Prediction of the areas that are particularly sensitive to landsliding through the development of stochastic or process-based susceptibility models has been the goal of extensive research (Brenning, 2005; Dai et al., 2002; Guzzetti et al., 2006; Komac, 2006). Various techniques have been used in the past to analyse landslide controlling factors (see overviews of Dai et al., 2002; Guzzetti et al., 1999; Huabin et al., 2005). Process-based susceptibility models often focus on the rheological parameters of the sliding mass, while stochastic models are mainly based on the biophysical site conditions. Frequently used stochastic techniques are discriminant analysis and regression techniques (Atkinson et al., 1998; Guzzetti et al., 2006). Logistic regression is particularly interesting for landslide susceptibility analysis as it models the relationship between a dichotomous variable (presence/absence of landslide) and a set of independent biophysical site variables (controlling factors). This technique allows the probability of landslide occurrence and its significance to be evaluated, and has been widely used for landslide susceptibility mapping (e.g. Atkinson and Massari, 1998; Ayalew and Yamagishi, 2005; Dai and Lee, 2002, 2003; Dai et al., 2001; Vanacker et al., 2003).
Logistic regression techniques have to be adapted to the specificities of landslide analysis, as landslides (like many other natural hazards) can be considered to be rare events (Demoulin and Chung, 2007). Rare events have low occurrence frequencies (Maalouf and Trafalis, 2011), with the number of events in the dataset tens to thousands of times smaller than the number of non-events. King and Zeng (2001a, b) have shown that rare events are difficult to predict, as the standard application of logistic regression techniques can sharply underestimate the probability of rare events. Rare event logistic regression was proposed by King and Zeng (2001a, b) to correct this bias by (i) an endogenous stratified sampling of the dataset, (ii) a prior correction of the intercept and (iii) a correction of the probabilities to include the estimation uncertainty.
In this paper, we first evaluate the use of probabilistic approaches to detect landslide controlling factors. Then, we build some concepts from probabilistic theory into rare event logistic regression analysis. This technique, called rare event logistic regression with replications, overcomes some of the limitations of previous developments and offers a robust variable selection. We apply this technique here to the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.

Landslide occurrence in Llavircay, as a case study
The Llavircay catchment was selected for the development of the rare event logistic regression technique with replications. The study area of 24 km² is located in the tropical Andes (Fig. 1), and is subject to a warm and humid tropical climate (Winckell et al., 1997). The mean annual precipitation is about 1330 mm, the average temperature is 10 °C and the atmospheric humidity is high, 87 % on average (Acotecnic, 2006; INAMHI, 2008). The elevation varies from 2017 m to 3736 m and slopes reach up to 55°. With a mean slope angle of 26°, the topography can be considered as very rough. About one third of the area has slope angles that are above the mean angle of internal friction (estimated at 30° according to Basabe, 1998). Landslides and creep are abundantly present in the area. Inventories of mass movements created from aerial photo interpretation and field campaigns revealed 206 landslides (reactivation excluded) between 1973 and 2010. They are mainly earth slides (translational slides) and earth slumps (rotational slides) according to the classification of Varnes (Summerfield, 1991).
Land cover change was documented using four sets of archived aerial photographs for the time period 1963-1995, complemented with a field survey in 2010. Because of significant differences in quality and scale, between and within the aerial photographs, the land cover classification was performed manually using a WILD stereoscope. Six land cover classes were identified: (i) dense forest; (ii) degraded forest, as a result of selective logging (Sierra and Stallings, 1998); (iii) bushes, as a result of natural regeneration, the so-called matorral in Ecuador; (iv) pasture with sporadic trees; (v) pasture; and (vi) subpáramo and páramo, corresponding to the natural shrub and grassland found at high altitudes in the Andes (Luteyn, 1999). More details on the land cover classification are given in Vanacker et al. (2000). Based on the time series of land cover data from 1963-2010, land cover change trajectories were created (Fig. 1). Land cover changed rapidly in this area, with half of the primary forest disappearing since 1963. In 2010, about half of the catchment was covered by trees, a quarter by páramo, subpáramo and bushes, and a quarter by pastures.

Database creation
Potential anthropogenic and biophysical explanatory variables of landslide occurrence were selected based on the literature and data availability. The following explanatory variables were included in the analyses: slope, distance to watercourse, distance to path, curvature and different trajectories of land cover change. The first three variables are quantitative, while the last two are qualitative variables composed of three and five classes, respectively (Table 1). All our data are mapped using the grid cell as mapping unit, as is now very common with the use of GIS (Guzzetti et al., 1999). The GIS grid data were transformed into a matrix format: an attribute table in which the rows correspond to the 20 m resolution pixels of the catchment and the columns to the 11 potential explanatory variables. A similar attribute table was made for the landslide inventories. In order to avoid auto-correlation, we represented every landslide by one grid cell (pixel) located in the centre of the shear plane. For the logistic regressions, one matrix was established that combines the matrix of GIS grid data with the spatial information on the observed landslide occurrence in the catchment. For all grid cells, the value of a dichotomous dependent variable landslide indicates the presence (landslide = 1) or absence (landslide = 0) of a landslide. The matrices were imported into R for the probabilistic and statistical analyses.

Probabilistic approach based on Monte Carlo methods
Probabilistic approaches are useful in landslide analyses, as they allow the probabilities of sliding to be determined for different biophysical and anthropogenic site conditions. The basic principle behind these analyses is the hypothesis that landslides have specific site characteristics that differ from the overall environmental setting of the area. A nonparametric statistical test that is commonly used to compare differences between groups is the Wilcoxon rank-sum test, also called the Mann-Whitney U test (Crawley, 2005). Such a test works with ordinal data and with groups of more or less similar size. In our case, some explanatory variables, such as land cover trajectories, are nominal. Moreover, the group "event" (presence of landslide) is much smaller than the group "non-event" (absence of landslide). A comparison of the two groups based on their distribution is thus not appropriate. We therefore apply a probabilistic approach based on Monte Carlo methods (Sawilowsky, 2003; Vanacker et al., 2001). The main idea is to test whether significant associations exist between explanatory variables and the location of landslides by comparing our landslide sample to a bundle of randomly selected samples in the study area. The probability of having a sample with a given distribution of observations over each class of a qualitative variable (or with a given median value for a quantitative variable) is given by its exceedance probability. So, if we randomly select enough samples to approximate the reality in the catchment (according to Monte Carlo methods, Sawilowsky, 2003), a plot of their exceedance probability against any potential explanatory variable gives a curve that represents the distribution of randomly selected samples in the study area (Fig. 2). Thus, for every variable, the exceedance probability of the landslide sample can be derived from the plot and compared with a given significance level. If the exceedance probability lies outside the probabilities for the confidence interval, we can conclude that the distribution of landslides over this explanatory variable is not random (Vanacker et al., 2001).
The first stage is to create the exceedance probability curve of the explanatory variable analysed. For a given year Y and an explanatory variable X, a simulation is composed of k samples of N randomly selected points. k is the number of samples needed to obtain a stable empirical probability distribution of the population, and equals 1000 according to our sensitivity analysis for the Llavircay case study (Fig. 3). N is the number of landslides observed in year Y. Samples are considered to be independent, as each sample contains less than 0.002 % of the entire population. Note that the explanatory variable X can be quantitative (e.g. slope) or qualitative (e.g. land cover trajectory). Code was written in R to automate the simulations, and consists of the following steps: (1) import the matrix with the anthropogenic and biophysical explanatory variables for all grid cells in the catchment, (2) randomly select N points from this matrix, (3) calculate the median value of the explanatory variable X for the N points (if the variable is quantitative) or the frequency of each class (if the variable is qualitative), (4) repeat steps 2 and 3 k times, (5) summarise in a table the k median values (or class frequencies) obtained in step 3 and rank them in ascending order, (6) calculate the exceedance probability of the k median values or class frequencies as $P(X > X_j) = 1 - F_X(X_j)$, where $F_X(X_j)$ is the cumulative distribution function of X, k is the number of samples created for the simulation, $X_j$ is a given sample of the population, and j is the rank number after ordering the randomly selected samples, (7) plot the exceedance probabilities calculated in step 6 against the ranked sampled values calculated in step 5 (an example of such a plot can be seen in Fig. 2).
The second stage is to test whether landslides are randomly distributed over the explanatory variable X. We derive the exceedance probability of the N landslides observed in year Y by transferring the median value (or class frequency) of the landslide inventory for variable X onto the plot created in step 7. We compare this exceedance probability with a given significance level. As is common in the literature, the significance level (the threshold against which p-values are compared) is fixed here at 5 %. If the exceedance probability lies outside the probabilities for the confidence interval, we can conclude that the distribution of landslides over this explanatory variable is not random (Fig. 2). We can thus assume that the explanatory variable X could be a controlling factor of landslide occurrence. This procedure is repeated for every explanatory variable X and every year Y.
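The two stages can be sketched as follows for a quantitative variable. This is a minimal illustration in Python, not the authors' R code: the function names, the synthetic slope values, the empirical form 1 - j/k for the exceedance probability and the two-sided reading of the 5 % level are all assumptions made for the example.

```python
import random
import statistics


def simulate_medians(population, n, k, rng):
    """Steps 2-5: draw k random samples of n cells and return their sorted medians."""
    medians = [statistics.median(rng.sample(population, n)) for _ in range(k)]
    return sorted(medians)


def exceedance_probability(sorted_medians, observed):
    """Share of simulated medians strictly greater than the observed median.

    Assumption: empirical form 1 - j/k, with j the number of simulated
    medians less than or equal to the observed value.
    """
    k = len(sorted_medians)
    j = sum(1 for m in sorted_medians if m <= observed)
    return 1.0 - j / k


rng = random.Random(42)
# Hypothetical catchment: slope (degrees) for 20 000 grid cells.
slopes = [rng.uniform(0.0, 55.0) for _ in range(20000)]
# Hypothetical inventory: 35 landslide cells, all located on very steep slopes.
landslide_slopes = rng.sample([s for s in slopes if s > 40.0], 35)

medians = simulate_medians(slopes, n=35, k=1000, rng=rng)
p_exc = exceedance_probability(medians, statistics.median(landslide_slopes))
# Assumed two-sided check at the 5 % level: is the inventory in either 2.5 % tail?
is_non_random = p_exc < 0.025 or p_exc > 0.975
print(p_exc, is_non_random)
```

For a qualitative variable, the median in `simulate_medians` would be replaced by the frequency of the class of interest in each sample; the rest of the procedure is unchanged.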

Ordinary rare event logistic regression
Logistic regression is commonly used to analyse the dependency of a dichotomous variable, here landslide presence/absence, on a set of explanatory variables (Atkinson and Massari, 1998; Vanacker et al., 2003). The ordinary logistic model can be written as Eq. (2) (Kleinbaum and Klein, 2010):
$$p_i = \frac{1}{1 + e^{-\left(\alpha + \sum_{i=1}^{m} \beta_i X_i\right)}} \qquad (2)$$
where $p_i$ denotes, in our case, the probability of an event as a function of m independent variables X, and i ranges from 1 to m. The terms α and β are unknown parameters that are estimated from the data by the maximum likelihood method. This equation is often linearized by a logit transformation, i.e. the natural logarithm of the odds, which is the ratio of the probability of an event to the probability of a non-event. The logit form of the model can be expressed as Eq. (3) (Kleinbaum and Klein, 2010):
$$\mathrm{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \sum_{i=1}^{m} \beta_i X_i \qquad (3)$$
In the case of natural hazards, the total number of grid cells affected by an event (such as landslide occurrence) is often much smaller than the total number of grid cells in the study area. It is common that less than 1 % of the study area is affected by a natural hazard event. King and Zeng (2001a, b) have shown that ordinary logistic regression strongly underestimates the probability of occurrence of rare events. They developed for political science an adapted version of the logistic regression technique, so-called "rare event logistic regression", that includes three correction measures for rare event data. They first recommended the use of a choice-based (or case-control) sampling design based on endogenous stratified sampling (Ramalho, 2002). It consists of taking all the events (1s) and a random selection of the non-events (0s). The ratio of events to non-events is often set at one to ten (Beguería, 2006a). The use of choice-based sampling designs might significantly bias the estimation of the intercept term α. Therefore, a prior correction is needed to avoid sampling bias (King and Zeng, 2001a). The corrected intercept term, $\alpha_0$, is calculated from the intercept estimate, α, the fraction of 1s in the population, τ, and the fraction of 1s in the sample, γ, as in Eq. (4):
$$\alpha_0 = \alpha - \ln\left[\left(\frac{1 - \tau}{\tau}\right)\left(\frac{\gamma}{1 - \gamma}\right)\right] \qquad (4)$$
A further adaptation aims to correct for the underestimation of the probabilities when using the corrected intercept $\alpha_0$ in Eq. (3). A correction factor $C_i$ is thus added to the estimated probability $\tilde{p}_i$ (Eq. 5):
$$P(Y_i = 1) = \tilde{p}_i + C_i \qquad (5)$$
For each observation, $C_i$ can be calculated from Eq. (6) (King and Zeng, 2001a; Van Den Eeckhaut et al., 2006):
$$C_i = (0.5 - \tilde{p}_i)\,\tilde{p}_i\,(1 - \tilde{p}_i)\,X_0\,V(\beta)\,X_0' \qquad (6)$$
where $\tilde{p}_i$ is the event probability estimated using the bias-corrected coefficient $\alpha_0$, $X_0$ is a 1 × (m+1) vector of values for each explanatory variable, $X_0'$ is the transpose of $X_0$ and V(β) is the variance-covariance matrix.
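As a rough numerical illustration of the prior correction (Eq. 4) and of the probability correction (Eqs. 5 and 6), the sketch below follows the formulas of King and Zeng (2001a). The function names and all numbers (a population event fraction of 1 %, the coefficients and the variance-covariance matrix) are hypothetical and only serve the example; this is not the implementation used by the "relogit" function.

```python
import math


def prior_corrected_intercept(alpha, tau, gamma):
    """Prior correction of the intercept.

    alpha: intercept estimated on the stratified sample
    tau:   fraction of events (1s) in the population
    gamma: fraction of events (1s) in the sample
    """
    return alpha - math.log(((1.0 - tau) / tau) * (gamma / (1.0 - gamma)))


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def corrected_probability(x0, coefs, cov):
    """Return (p, p + C) for one observation.

    x0:    [1, x_1, ..., x_m], with a leading 1 for the intercept
    coefs: [alpha_0, beta_1, ..., beta_m], intercept already prior-corrected
    cov:   (m+1) x (m+1) variance-covariance matrix of the coefficients
    """
    p = logistic(sum(c * x for c, x in zip(coefs, x0)))
    size = len(x0)
    # Quadratic form x0 V x0'
    quad = sum(x0[i] * cov[i][j] * x0[j] for i in range(size) for j in range(size))
    c_i = (0.5 - p) * p * (1.0 - p) * quad
    return p, p + c_i


# Hypothetical setting: 1 % of cells are events, sampled at a 1:10 ratio.
alpha_0 = prior_corrected_intercept(alpha=-1.0, tau=0.01, gamma=1.0 / 11.0)
p_raw, p_corrected = corrected_probability(
    x0=[1.0, 30.0],                    # intercept term and a slope of 30 degrees
    coefs=[alpha_0, 0.05],             # hypothetical slope coefficient
    cov=[[0.04, 0.0], [0.0, 0.0001]],  # hypothetical variance-covariance matrix
)
print(alpha_0, p_raw, p_corrected)
```

With a 1:10 sampling ratio and a population event fraction of 1 %, the intercept is lowered by ln(9.9), and for probabilities below 0.5 the correction term is positive, which raises the estimated probability slightly.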
Rare event logistic regression was first applied in landslide susceptibility analysis by Van Den Eeckhaut et al. (2006). To our knowledge, this method has since been applied in natural hazard analyses only by Bai et al. (2011) and Vanwalleghem et al. (2008). We slightly modified the methodological description of Van Den Eeckhaut et al. (2006) and automated the statistical procedure entirely in R. For the endogenous stratified sampling, a ratio of events to non-events of 1:10 was used, following Beguería (2006a). To avoid multi-collinearity among the independent variables, we calculated the Variance Inflation Factor (VIF) and Tolerance (TOL). All explanatory variables with a VIF > 2 and TOL < 0.4 were excluded from the stepwise logistic regression (Allison, 2001). From this selection, only the explanatory variables that significantly explain the landslide distribution pattern (at a significance level of 0.05) were included in the rare event logistic regression. The "relogit" function from the R package Zelig (Imai et al., 2009) was used to implement the rare event logistic regression.

Rare event logistic regression with replications
Rare event logistic regression with replications combines the strengths of probabilistic and statistical methods. It is based on the statistical method of rare event logistic regression (King and Zeng, 2001a; Van Den Eeckhaut et al., 2006), but it includes probabilistic techniques to estimate the robustness of the regression estimates (Beguería, 2006a). The main idea is to average the results of 50 replications of an ordinary rare event logistic regression made with 50 different endogenous stratified samples. A similar methodological step was used by Van Den Eeckhaut et al. (2009) to improve the model reliability of a discriminant analysis. There is also a resemblance to the bootstrap aggregation (bagging) method (Breiman, 1996), even though, in our case, we do not resample with replacement using the obtained sample of the population as a basis.
In our approach, we create new sub-samples of non-events (0s) using the entire population as a basis. In this study, we select 50 sub-samples as a trade-off between model reliability and computational time (Andresen, 2009). This is in line with previous geomorphic studies (see for example Beguería, 2006a; Davis and Keller, 1997; Van Den Eeckhaut et al., 2009). In this case study, for each of the 50 endogenous stratified samples, 10N non-event points (0s) were randomly selected from the population (matrix of grid cells) and joined to the N events. The procedure was automated in R.
The first steps of this method are similar to the ordinary rare event logistic regression technique described above, and include the selection of explanatory variables based on collinearity criteria and significance level. The ordinary rare event logistic regression was repeated 50 times with the different samples. For the final results, only variables that are significant at the 0.05 level and present in at least 5 replications (10 %) are kept (following Beguería, 2006a). The final regression equation for landslide susceptibility is based on the explanatory variables that are robustly detected, and the regression parameter estimates are calculated as the average of the q parameter estimates from the repeated rare event logistic regressions (q being the number of replications in which the variable was significant).
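The repeated endogenous stratified sampling can be sketched as below. The function name, the seed and the synthetic landslide vector are assumptions for illustration; only the design (all N events plus 10N randomly drawn non-events, repeated 50 times from the full population) comes from the text.

```python
import random


def stratified_samples(landslide, n_replications=50, ratio=10, seed=0):
    """Yield one list of cell indices per replication.

    Each list holds all events plus ratio * N non-events drawn at random,
    without replacement, from the whole population of grid cells.
    """
    rng = random.Random(seed)
    events = [i for i, y in enumerate(landslide) if y == 1]
    non_events = [i for i, y in enumerate(landslide) if y == 0]
    for _ in range(n_replications):
        yield events + rng.sample(non_events, ratio * len(events))


# Hypothetical population: 5000 grid cells, of which 20 contain a landslide.
landslide = [1] * 20 + [0] * 4980
samples = list(stratified_samples(landslide))
print(len(samples), len(samples[0]))
```

Each replication would then be passed to a rare event logistic regression; the non-events differ between replications while the events stay the same.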
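The aggregation rule described in this paragraph can be sketched as follows. The data structure (one dictionary of coefficient and p-value per replication), the function name and the example values are hypothetical; the thresholds (significance at 0.05, presence in at least 10 % of the replications) are those given in the text.

```python
def aggregate_replications(results, alpha=0.05, min_share=0.10):
    """Average coefficients over the replications in which a variable is significant.

    results: one dict per replication, mapping variable name -> (beta, p_value)
    Returns variable name -> (share of replications, mean beta) for the
    variables that are significant in at least min_share of the replications.
    """
    n = len(results)
    significant = {}
    for replication in results:
        for name, (beta, p_value) in replication.items():
            if p_value < alpha:
                significant.setdefault(name, []).append(beta)
    return {
        name: (len(betas) / n, sum(betas) / len(betas))
        for name, betas in significant.items()
        if len(betas) / n >= min_share
    }


# Hypothetical output of 20 replications for three variables.
results = []
for r in range(20):
    results.append({
        "slope": (0.05, 0.001),                                 # always significant
        "pasture": (1.0, 0.01) if r < 4 else (0.2, 0.30),       # significant 4 times
        "curvature": (0.3, 0.04) if r == 0 else (0.3, 0.50),    # significant once
    })

summary = aggregate_replications(results)
print(summary)
```

In this example, the variable that is significant in only one of the 20 replications (5 %) is dropped, while a variable significant in 20 % of the replications is kept and its coefficient is averaged over those replications only (q = 4).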

Validation of the landslide susceptibility analyses
By definition, model validation assesses the accuracy and predictive power of a predictive model. It also allows the performance of various models to be compared (Beguería, 2006b). Multivariate statistical models are frequently used for landslide susceptibility analyses, and a classification threshold, or so-called cut-off value, is often selected to classify the landslide susceptibility and assess hazards. The selection of the cut-off value is not straightforward, and different methodologies exist (Beguería, 2006b; Greiner et al., 2000). Receiver operating characteristic plots (ROC plots) are an alternative way to evaluate model performance, as ROC plots contain information on the model accuracy for a range of possible threshold values (Beguería, 2006b). They are constructed from two statistical evaluation criteria that do not depend on the prevalence of events (1s) in the sample: (i) the sensitivity (true-positive fraction) and (ii) the specificity (true-negative fraction, i.e. one minus the false-positive fraction) (Beguería, 2006b). The area under the ROC curve (AUC) evaluates the model's performance independently of a particular threshold value (Beguería, 2006b), so it rapidly gives an overall idea of the model's goodness of fit.
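To make the construction of the ROC plot and the AUC concrete, the following sketch computes both from predicted probabilities and observed presence/absence. The function names and the eight example cells are hypothetical; the AUC is obtained with the trapezoidal rule, which is one common choice.

```python
def roc_points(scores, labels):
    """Return (false-positive fraction, true-positive fraction) for every threshold.

    The false-positive fraction equals 1 - specificity and the
    true-positive fraction equals the sensitivity.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and observed landslide presence (1) / absence (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
area = auc(roc_points(scores, labels))
print(area)  # 0.75 for this example
```

An AUC of 0.5 corresponds to a model that ranks events and non-events no better than chance, and an AUC of 1 to a model that ranks every event above every non-event.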

Probabilistic approach using Monte Carlo simulations
The results of the Monte Carlo simulations are shown in Table 2, which gives the exceedance probabilities of the 4 landslide inventories for the 11 explanatory variables. From Table 2, it is clear that the spatial distribution of the landslides is not random, and that a systematic association with morphological and anthropogenic factors occurs. Landslides are significantly associated with steep slopes (exceedance probability ≤ 0.003), and tend to cluster close to paths and watercourses (Table 2). In Llavircay, plan curvature is not significantly associated with the landslide pattern. Land cover change trajectories significantly control the landslide pattern (Table 2): landslides are significantly rare where tree cover is present (such as the trajectories no change (forest - páramo) and forest degradation), but are significantly overrepresented in pastures (conversion to pasture or no change (pasture)) and areas with strong inter-annual changes in vegetation cover (others).
Even though the sample size is sometimes small (n = 35 for the 1983 landslide inventory), this probabilistic approach is able to identify the explanatory variables that are significantly associated with the landslide pattern. In addition, this approach is widely applicable as it does not require any a priori distribution of the independent or dependent variables. The major drawback of this univariate probabilistic approach is the lack of information on the relative influence of the different explanatory variables on the landslide pattern. Moreover, it is not possible to account for multi-collinearity, which makes it difficult to use the results of the Monte Carlo simulations directly as an input for landslide susceptibility maps.

Ordinary rare event logistic regression
All explanatory variables were included in the logistic regression, as no multi-collinearity was detected based on the VIF and TOL values. Variables that are significant at 5 % were included in the rare event logistic regression. Results can be written in the form of Eq. (7), where $p_i$ is the probability of landslide occurrence, here based on the landslide inventory of 1995 as an example (Table 3, Trial 1). Most of the shortcomings of the probabilistic methods can be solved with the rare event logistic regression. This multivariate analysis can combine a wide range of explanatory variables into one statistical analysis. The coefficients of the logistic regression make it possible to predict the logit transformation of the probability of event presence and to create susceptibility maps (Van Den Eeckhaut et al., 2006; Vanwalleghem et al., 2008). Moreover, it is possible to determine the most important controlling factors by multiplying the maximum value of the variable in the dataset (MPV) by its regression coefficient (Vanwalleghem et al., 2008). This measure of parameter importance (MPI) indicates that distance to path and slope are the most important variables for predicting landslide occurrence in 1995 (Table 3, Trial 1).
The major drawback of the rare event logistic regression is the dependency of the results on the endogenous stratified sampling. In Table 3, we give the outcome of the rare event logistic regression for landslide prediction based on the landslide inventory of 1995 for three different random samples of non-events (Table 3, Trials 1 to 3). When comparing the regression coefficient estimates, their standard errors and significance levels for the explanatory variables, we observe clear differences between the three predictive models. This example shows that ordinary rare event logistic regression can be strongly sample-dependent, and does not always lead to a stable detection of the controlling variables (Table 3). This sample dependence was also highlighted by Demoulin and Chung (2007).
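The measure of parameter importance is a simple product, MPI = MPV × β, and can be illustrated as below. The coefficients and maximum values are invented for the example and are not those of Table 3; ranking by absolute MPI is an assumption that follows the use of absolute values in Fig. 4.

```python
def parameter_importance(coefficients, max_values):
    """MPI = maximum value of the variable in the dataset (MPV) times its coefficient."""
    return {name: max_values[name] * beta for name, beta in coefficients.items()}


# Hypothetical coefficients and maximum values (not taken from Table 3).
coefficients = {"slope": 0.04, "distance_to_path": -0.002}
max_values = {"slope": 55.0, "distance_to_path": 1500.0}

mpi = parameter_importance(coefficients, max_values)
ranking = sorted(mpi, key=lambda name: abs(mpi[name]), reverse=True)
print(mpi, ranking)
```

Because the MPI scales each coefficient by the range the variable can take, a variable with a small coefficient but a large maximum value (here the hypothetical distance to path) can rank above a variable with a larger coefficient.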

Rare event logistic regression with replications
The predictive models for landslide occurrence based on the landslide inventories of four different years are summarised in Table 4. These results are obtained using the rare event logistic regression technique with replications (here 50 replications). The column "count" indicates the percentage of replications in which the explanatory variable was significant and, thus, the number of values used for averaging the regression parameters (100 % = 50 replications). Table 4 clearly shows that many explanatory factors do not appear in every replication, even though they are highly significant. For example, the trajectory no change (pasture) is present in only 18 % of the replications in 1995, although the variable is significant at 0.01. With an ordinary rare event logistic regression, it is very likely that this variable would not have been included in the final regression model.
Across all studied years, six of the eleven potential explanatory variables were identified as being significant: slope, distance to watercourse, distance to path, conversion to pasture, no change (pasture) and the trajectory others (Table 4). Only three variables are systematically and significantly associated with the landslide pattern for all years: the two land cover trajectories that are directly linked with pastures and the slope gradient. The variables distance to path and the trajectory others are present in 3 out of 4 years. The variable distance to watercourse is only present in 2010. The number of significant explanatory variables increases with time, which might be due to the fact that land cover heterogeneity increases with time (Fig. 1). As both the number and the spatial distribution of landslides change with time (Fig. 1), it is not surprising to find changes in the explanatory variables with time (Table 4).
An analysis of the influence of each explanatory variable on the landslide probability, MPI (Vanwalleghem et al., 2008), indicates that the relative importance of the different controlling factors changes with time (Fig. 4). The most obvious change is observed for the topographical factor slope. For the early years 1973 and 1983, the slope gradient was detected as the most important controlling factor for landslide occurrence. In 2010, the slope gradient is only marginally important for explaining the spatial distribution of landslides within the catchment, while anthropogenic variables, such as the land cover trajectories that are linked with human disturbance, are ranked higher in terms of variable influence (Fig. 4). The fact that two explanatory variables (distance to watercourse and distance to path) have an exceptionally high variable influence in 2010 might be linked to data collection bias, as the 2010 landslide inventory is based on fieldwork (in contrast to the landslide inventories of 1973, 1983 and 1995, which are based on aerial photographs). This might explain why more landslides were observed in 2010 close to paths and watercourses, which are the two most important accessibility corridors in this remote area.

Validation of the landslide susceptibility analyses
For model validation, various methodologies exist to select the validation data (Chung and Fabbri, 2003). As the number of landslides in our database is small, we decided not to split our datasets into a calibration and a validation set. Instead, we used the landslide inventories from the closest time period to evaluate the performance of the predictive models. As the landslide controlling factors change only slightly through time, we hypothesise that the use of the landslide inventory of the closest time period has only a minor influence on the model evaluation.
In Fig. 5, we present the results of the evaluation of the four predictive landslide susceptibility models based on the landslide inventory of 1995. It includes the ordinary rare event logistic regressions based on three different random samples of non-events (see Table 3, Trials 1 to 3), and the rare event logistic regression with replications (see Table 4, year 1995). The AUC of the four models varies between 0.79 and 0.82 (Fig. 5), and we can consider that the predictive models are moderately accurate according to the arbitrary guideline of Swets (1988). As observed for the ordinary rare event logistic regression, the performance of the three predictive models is sample-dependent (Fig. 5). The ROC and AUC vary between the three replications of the ordinary rare event logistic regression. The performance of the rare event logistic regression with replications is not significantly better than that of the ordinary rare event logistic regression models, but a conceptual improvement is made in the identification of the landslide controlling factors.

Conclusions
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This specificity of natural hazards was only taken into account recently by adapting the ordinary logistic regression techniques for the analysis of rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling variables, as the results can be strongly sample-dependent.
In this study, we developed a modified version of the rare event logistic regression technique. Our so-called rare event logistic regression with replications builds some concepts from probabilistic theory into rare event logistic regression analysis. It is based on the statistical method of rare event logistic regression, but it includes Monte Carlo simulations to estimate the robustness of the regression estimates. The use of replications in the rare event logistic regressions avoids instability of the results due to sampling bias. Our results demonstrate that rare event logistic regression with replications has a modelling quality similar to that of the ordinary rare event logistic regression techniques, while providing a more robust selection of the factors that are significant for explaining the spatial variation in the occurrence of natural hazards. This new technique was developed here for landslide spatial pattern analyses, but the concept is widely applicable to statistical analyses of natural hazards.

Fig. 2. Plot of the exceedance probability against theoretical ranked randomly sampled values with a confidence interval of 95 %.

Fig. 4. Evolution of the most important controlling variables through time. For the variables marked with *, we took the absolute value of the measure of parameter importance (MPI).

Fig. 5. ROC plot and AUC of validation datasets (n = 19 250) for the different rare event logistic regressions (for details, see Table 3 and Table 4, year 1995).

Table 1. Set of anthropogenic and biophysical variables included in the probabilistic and statistical analyses.
Fig. 3. According to the Monte Carlo principle, a sufficient number of randomly selected samples is needed to approximate the true population correctly (the Llavircay catchment in this case). A simulation with 1000 samples provides a stable empirical approximation.

Table 2. Exceedance probabilities of the landslide inventories of 1973, 1983, 1995 and 2010 for the 11 potential explanatory variables. The values that are significant at 5 % are highlighted in bold.

Table 3. Rare event logistic regression with the landslide sample of 1995 for three trials, each with a different sample of non-events: logistic coefficient (β); standard error of β (S.E.); Wald statistic; variable significance (Pr(> |z|)); odds ratio; maximum value of the explanatory variable in the dataset (MPV); measure of parameter importance (MPI).

Table 4. Summary of the rare event logistic regression with replications showing, for every year, the percentage of replications in which the variable entered (count), the logistic coefficient (β) and its standard error (S.E.), the Wald statistic, the variable significance (Pr(> |z|)), the odds ratio, the maximum value of the explanatory variable in the dataset (MPV) and the measure of parameter importance (MPI).