Proportional loss functions for debris flow events

Quantitative risk assessments of debris flows and other hydrogeological hazards require the analyst to predict damage potentials. A common way to do so is by use of proportional loss functions. In this paper, we analyze a uniquely rich dataset of 132 buildings that were damaged in one of five large debris flow events in Switzerland. Using the double generalized linear model, we estimate proportional loss functions that may be used for various prediction purposes, including hazard mapping, landscape planning, and insurance pricing. Unlike earlier analyses, we control for confounding effects of building characteristics, site specifics, and process intensities, as well as for overdispersion in the data. Our results suggest that process intensity parameters are the most meaningful predictors of proportional loss sizes. Cross-validation tests suggest that the mean absolute prediction errors of our models are in the range of 11 %, underpinning the accuracy of the approach.


Introduction
Landslides are a major hydrogeological hazard in mountain ranges all over the globe. Debris flows are a particularly destructive form of fast-moving landslides. Over the past 25 years, debris flows have killed more than 200 people and caused more than €5 billion in economic damages throughout the European Alps (rough estimate based on Guzzetti et al., 2005; Fuchs, 2009; Hilker et al., 2009). In Switzerland alone, debris flows have caused direct economic damages of €340 million, which equals about 5 % of the total amount of losses from hydrological hazards during this period (Hilker et al., 2009).
It is therefore not surprising that considerable effort has been made to understand the physical processes of debris flows (Iverson, 1997; Major and Pierson, 1992; Rickenmann, 1999; Sosio and Crosta, 2009). Less emphasis has been given to the damages caused by debris flows. The quantification of potential losses is, however, a building block of any quantitative risk assessment (Bonachea et al., 2009; Bründl et al., 2009) and serves as the foundation for the planning of mitigation measures (Dai et al., 2002; Romang et al., 2003; Margreth and Romang, 2010).
As the size and force of a debris flow are a priori unknown, quantitative risk assessments are based on predictions of damages that could occur under different hazard scenarios. The term physical vulnerability is commonly used to refer to the proportional loss (also known as relative damage) that an element at risk faces as a result of a specific hazard impact (Fuchs, 2009; Uzielli et al., 2008; Hollenstein, 2005). This impact is determined by two factors. One is the intensity of the hazard, which may be defined as "a set of spatially distributed parameters describing the destructiveness" of the hazard process (Hungr, 1997). The other is the susceptibility of the element at risk, which describes the propensity of a building or other infrastructure to suffer damage from a specific hazard impact (Uzielli et al., 2008).
Empirical analyses that relate damages from debris flows and other landslide processes to intensity and susceptibility factors are still rare; see Totschnig and Fuchs (2013) for a recent review of the literature. Most existing studies use either descriptive methods (e.g., Bell and Glade, 2004) or ordinary least-squares (OLS) regression (e.g., Fuchs et al., 2007) to explain the proportional loss y = d/v, defined as the ratio of observed damage d to total value v.
While of practical value, there are some drawbacks with these conventional approaches. The descriptive approach does not make full use of the available information and allows only for qualitative damage predictions. OLS regression can incorporate intensity and susceptibility factors, but its applicability is limited by the fact that regression estimates are unconstrained, allowing for loss estimates to take on implausibly large or small values.
Since the proportional loss is by construction a relative measure, valid models must provide estimates of y restricted to the [0, 1] interval. One can constrain OLS to this range, but the resulting censored linear model bears some unattractive features (Kieschnick and McCullough, 2003; Fox, 2008). First, it is inherently unstable because it critically depends on the values at which minimal and maximal loss is predicted. Second, it becomes inexpedient as soon as more than one predictor variable is included. Third, the abrupt truncation of the linear slope of y is unrealistic, as one would expect a smooth rather than a kinked relationship between the proportional loss and the predictor variables. Finally, its error variance is most likely not constant (i.e., heteroscedastic), because regressions involving data from the unit interval typically display more variation around the mean than at the limits of the [0, 1] interval.
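To illustrate the boundedness problem, the following sketch (simulated data with hypothetical coefficients, not taken from our dataset) fits an unconstrained OLS line to proportional losses and shows that extrapolating beyond the observed intensity range can leave the unit interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated proportional losses driven by a single intensity predictor
intensity = rng.uniform(0.0, 5.0, size=100)
loss = np.clip(0.05 + 0.15 * intensity + rng.normal(0.0, 0.05, 100), 0.0, 1.0)

# unconstrained OLS fit of loss on intensity
X = np.column_stack([np.ones_like(intensity), intensity])
beta, *_ = np.linalg.lstsq(X, loss, rcond=None)

# extrapolating to a larger intensity yields a "proportional loss" above 1
pred = beta[0] + beta[1] * 8.0
```

Because the fitted line is unbounded, `pred` exceeds 1 for this simulated example, which is exactly the kind of implausible estimate a censored or link-based model is meant to rule out.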
Generalized linear models (GLMs) provide a natural framework for the estimation of proportional data (De Jong and Heller, 2008). Geoscientists have previously made use of this property to predict the occurrence probability of debris flow events (Ohlmacher and Davis, 2003; Sepulveda and Padilla, 2008; Griffiths et al., 2004). Yet apart from recent studies by Totschnig et al. (2011) and Totschnig and Fuchs (2013), GLMs have, to the best of our knowledge, not been used to analyze proportional loss data of debris flow events.
Moreover, there is a caveat to the application of standard GLMs such as logistic or probit regression to proportional loss data. GLMs typically assume that the error variance is constant across observations. This assumption is likely to be violated whenever damages are spatially or temporally correlated. Ignoring the non-constant nature of the error variances may result in a misrepresentation of the true dispersion and erroneous inferences about the parameters of interest (De Jong and Heller, 2008).
In this paper we use the double generalized linear model (DGLM) introduced by Smyth (1989) to explore a unique dataset comprising insurance values of 132 buildings. These buildings were damaged by five debris flow events, which occurred in Switzerland during the late 1980s and early 2000s. While the DGLM provides the same flexibility as standard GLMs, it adjusts for the overdispersion typically observed in proportional losses and hence lends itself to the statistical analysis of our dataset.
The remainder of the paper is structured as follows. Section 2 describes our empirical strategy, our dataset, and the statistical approach in more detail. Section 3 presents regression results and sensitivity tests that assess the prediction accuracy of the estimated proportional loss functions.
In Sect. 4, we sum up with some remarks on the usefulness of our approach for the practical management of debris flow hazards.

Conceptual framework
The conceptual framework of our empirical analysis of physical vulnerability toward debris flow events is illustrated in Fig. 1. Based on Dai et al. (2002), we identified three building blocks for the statistical analysis of debris flow risk: (1) the runout area, (2) the process intensity, and (3) the elements at risk. In the empirical analysis, we seek to explain observed proportional losses by predictor variables characterizing the three building blocks.
In the empirical analysis, we define the runout area by its roughness and its slope steepness. We characterize the process intensity by the impact angle (the angle between the tangent to the flow path of the debris flow at the point of impact and the plane tangent to the struck surface at that point), the flow depth and velocity, and their interaction, which is widely accepted as an intensity indicator (Kreibich et al., 2009). Flow depth and velocity are used not only as criteria for distinguishing intensity zones in hazard mapping of debris flows (Loat and Petrascheck, 1997) but also in risk analyses in Switzerland (Bründl et al., 2009). This was the main argument for preferring them to alternatives such as the velocity head, which we use for comparison purposes only (see Sect. 2.2). Elements at risk are described by their type (concrete, lightweight, or mixed construction), their purpose (agricultural, industrial, or residential use), and the presence or absence of object-specific protection measures such as reinforced walls or sheet pilings. Below, the data collection process is described in more detail.

Data
We focused on five major debris flow events that occurred in the Swiss Alps during the late 1980s and early 2000s. According to the size classification by Jakob (2005), these events all belonged to class 4 debris flows, with total volumes ranging from 10 000 to 75 000 m³. They caused damages with a total amount of €25 million on 455 buildings, corresponding to average damages of roughly €50 000 per affected building. Of these 455 buildings, 132 were eventually selected for analysis. Data selection was based on the availability of sufficiently accurate documentation of the runout area, the process intensity, and the elements at risk (including their total values and the observed damages). Damage observations were split approximately evenly across hazard sites. Figure 2 details the geographical distribution of the debris flow events and the analyzed damages.
In recent years debris flows in Switzerland have been registered in a central database (Hilker et al., 2009). However, the event descriptions in this database are insufficient for object-specific statistical analyses. Therefore, we had to improve the data quality (see Romang, 2004; Kimmerle, 2002).
We assessed type and purpose of each of the buildings based on field observations and interviews with affected property owners and other witnesses of the debris flow events. Absolute and relative damages were ascertained based on insurance reports of actuarial building values and the costs of repair or replacement. The process intensity of the analyzed debris flow events was reconstructed from back modeling with FLO-2D (O'Brien et al., 1993) and back calculations of the flow velocity, flow depth, and impact pressures (Hungr et al., 1984). Information about the runout topography, roughness, and inclination was collected directly in the field. Table 1 gives a descriptive summary of the compiled dataset. Data quality is accurate for all parameters that could be directly observed or measured. The intensity parameters could, however, not be measured directly, and their quantification is therefore fraught with some uncertainty. We reduced this uncertainty by combining back calculations, engineering rules of thumb, and eyewitness reports (Romang, 2004). In particular, we could in many cases reconstruct the maximum deposition height (H_D^max) based on marks left on exterior walls. According to the Bernoulli principle, H_D^max equals the velocity head V²/(2 · g) (with g = 9.81 m s⁻²). By solving the equation for V = √(2 · g · H_D^max) we obtained an estimate of the flow velocity, which we then compared to the flow velocity estimated based on the procedure described in Egli (2005). If the discrepancy between both estimates was larger than 1 m s⁻¹, we reassessed the flow velocity.
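The velocity-head cross-check can be written compactly. The sketch below (function and variable names are ours, not part of the original workflow) converts a maximum deposition height into a Bernoulli velocity estimate and flags observations where it deviates from an independently derived velocity by more than 1 m s⁻¹:

```python
import math

G = 9.81  # gravitational acceleration [m s^-2]

def velocity_from_deposition(h_max_d: float) -> float:
    """Invert the velocity head H_D^max = V^2 / (2 g) to obtain V [m/s]."""
    return math.sqrt(2.0 * G * h_max_d)

def needs_reassessment(h_max_d: float, v_independent: float, tol: float = 1.0) -> bool:
    """Flag cases where the Bernoulli estimate and an independently derived
    velocity (e.g., following Egli, 2005) disagree by more than tol [m/s]."""
    return abs(velocity_from_deposition(h_max_d) - v_independent) > tol
```

For example, a deposition mark at 1.25 m implies a flow velocity of about 5 m s⁻¹, so an independent estimate of 6.5 m s⁻¹ would trigger a reassessment while 5.0 m s⁻¹ would not.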
The outlined data validation process significantly improved the data quality. Yet we emphasize that in the aftermath of an event a clear distinction between debris flows and hyper-concentrated flows is not always possible because different hazard processes may coincide. For example, the runout area might first be hit by a debris flow surge and subsequently flooded and eroded again (Pierson, 2005). We are unable to differentiate between such transitional forms of flowing and flooding processes. For the purpose of our analysis, this limitation seems acceptable. Totschnig and Fuchs (2013) provide further support for this assumption.

Statistical approach
The statistical modeling of proportional loss functions in the standard GLM framework is based upon three components (Fox, 2008): (1) the loss function, η_i = X_i β, where X_i is a vector of predictor variables for object i and β is the corresponding vector of coefficients; (2) a probability distribution of proportional loss from the exponential family; and (3) an invertible link function g that maps the expected proportional loss E[y_i] = μ_i onto the predictors so that g(μ_i) = η_i.
Specific link functions have emerged for the analysis of proportional loss data (De Jong and Heller, 2008). The canonical link is the logit, g(μ_i) = log(μ_i/(1 − μ_i)), which for the observed proportional loss y_i = d_i/v_i equals log(d_i/(v_i − d_i)), where v_i and d_i denote the total value of and the damage to object i, both measured in units of k€. Since the log odds ratio approaches negative infinity as y_i approaches 0 and moves toward positive infinity as y_i approaches 1, loss predictions are restricted to the [0, 1] interval.
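As a minimal illustration (our notation), the logit link and its inverse map proportional losses between the unit interval and the real line, which is what keeps predictions bounded:

```python
import math

def logit(mu: float) -> float:
    """Map a proportional loss in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta: float) -> float:
    """Map a linear predictor back into (0, 1); predictions cannot escape the interval."""
    return 1.0 / (1.0 + math.exp(-eta))
```

However extreme the linear predictor becomes, `inv_logit` returns a value strictly between 0 and 1.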
The logit link gives rise to the standard logistic regression model, which can be evaluated by maximum likelihood methods. Applied to proportional data, the model assumes that the loss y_i observed at object i is binomially distributed: y_i ∼ B(v_i, d_i) for all i ∈ N, where N is the set of all observations. This implies that each object at risk can be divided into n_i = v_i/k€ equal units, which the model treats as if they were independent Bernoulli trials with probability μ_i. This distributional assumption becomes unrealistic when the damage potential is spatially or temporally correlated. A simple example is a debris flow hitting a residential building of total value v_i, but only the ground floor is affected. Now, assume that the first floor bears less (or more) valuables than the ground floor. In this case, the dispersion across the observed units of proportional loss is larger than warranted by the constant error variance assumption of the logistic regression model, Var[y_i] = φ · y_i (1 − y_i), where φ is a dispersion parameter common to all observations.

The DGLM generalizes the above error variance by inclusion of an observation-specific dispersion parameter φ_i, which is contingent on some predictor variables to account for systematic variations in the error variance (Smyth, 1989). The observation-specific dispersion parameter can be expressed as h(φ_i) = Z_i λ, where Z_i is a vector of predictors and λ is the corresponding coefficient vector. (Note that Z_i can be a perfect subset of X_i. Identification is warranted as long as λ comprises a freely estimable intercept.) The joint estimation of the dispersion submodel h(φ_i) and the loss submodel g(μ_i) increases the precision of the coefficient estimates of interest because observations with larger variation are penalized (i.e., they are down-weighted in the likelihood maximization).
Standard errors for the predictors of both the loss and the dispersion submodels are calculated from the inverse Fisher-information matrix and Wald tests can be used to infer the significance level of a specific predictor (Smyth, 1989). The interpretation of the coefficient estimates from the DGLM with logit link is therefore identical to the interpretation of those obtained from a standard logistic regression model.
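To make the estimation idea concrete, the following sketch implements a simplified alternating scheme in the spirit of, but not identical to, the joint maximization of Smyth (1989): a weighted IRLS fit of the logit-link loss submodel alternates with a log-link, gamma-variance fit of the dispersion submodel on squared Pearson residuals. All names and the convergence settings are our illustrative choices.

```python
import numpy as np

def fit_loss_submodel(X, y, phi, n_iter=25):
    """Weighted IRLS for the logit-link loss submodel, variance phi_i * mu(1 - mu)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = (1.0 / phi) * mu * (1.0 - mu)        # working weights
        z = eta + (y - mu) / (mu * (1.0 - mu))   # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def fit_dispersion_submodel(Z, d, n_iter=25):
    """IRLS for a log-link, gamma-variance model of the squared Pearson residuals d
    (for this link/variance pair the working weights are identically 1)."""
    lam = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ lam
        mu = np.exp(eta)
        z = eta + (d - mu) / mu                  # working response
        lam = np.linalg.solve(Z.T @ Z, Z.T @ z)
    return lam

def fit_dglm(X, Z, y, n_rounds=5):
    """Alternate between loss and dispersion submodels (both need an intercept)."""
    phi = np.ones(len(y))
    for _ in range(n_rounds):
        beta = fit_loss_submodel(X, y, phi)
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        d = (y - mu) ** 2 / (mu * (1.0 - mu))    # squared Pearson residuals
        lam = fit_dispersion_submodel(Z, d)
        phi = np.exp(Z @ lam)                    # observation-specific dispersion
    return beta, lam
```

The down-weighting described above happens through `1.0 / phi`: observations whose fitted dispersion is large contribute less to the loss-submodel estimates.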

A virtue of this class of models is the ease with which coefficient estimates can be converted to and interpreted as risk ratios (De Jong and Heller, 2008). Risk ratios are particularly insightful for comparing the vulnerability of different types of buildings. In general terms, the risk ratio for a binary characteristic of interest x′ (entering the loss function with coefficient β_x′) is defined as

RR[x′] = E[y | X′, x′ = 1] / E[y | X′, x′ = 0],   (1)

where E[y | X′, x′ = 1] and E[y | X′, x′ = 0] denote the expected proportional loss for buildings that do (x′ = 1) and do not (x′ = 0) have this characteristic, keeping all other predictors X′ fixed at sample means.
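Confidence intervals for such risk ratios are typically built on the log scale and transformed back. A small helper (ours; the risk ratio and standard error passed in below are illustrative inputs, not values from Table 3) shows the mechanics:

```python
import math

def risk_ratio_ci(rr: float, se_log_rr: float, z: float = 1.96):
    """Approximate 95 % confidence interval for a risk ratio whose log is
    (approximately) normally distributed with the given standard error."""
    log_rr = math.log(rr)
    return (math.exp(log_rr - z * se_log_rr),
            math.exp(log_rr + z * se_log_rr))
```

Because the interval is symmetric on the log scale, the back-transformed bounds are asymmetric around the point estimate, as is typical for reported risk ratios.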
A common assumption is that the log-transformed risk ratio is normally distributed: log RR[x′] ∼ N(μ, σ²). The mean is equal to the log of the expected risk ratio: μ = log(E[y | x′ = 1, X′] / E[y | x′ = 0, X′]). Using the delta method (Fox, 2008), the variance σ² can be approximated by Eq. (2), derived in Appendix A.

Results

Regression models
We present coefficient estimates for three specifications of the DGLM with logit link function. These specifications differ with respect to the predictor variables included in the loss and dispersion functions.

Model I
The first model specification explains observed proportional losses solely through intensity predictors. The dispersion function includes a global constant and site-specific constants that control for differences in dispersion across debris flow sites. The loss function includes the impact angle (A), flow depth (D), flow velocity (V), and their interaction (D · V), which we take as an intensity indicator. We use log transformations to reduce skew in the distribution of these predictors. Note that log D · log V is a non-linear transformation of log(D · V²), which others have proposed as a surrogate measure of impact force (Jakob et al., 2012; Kreibich et al., 2009). Regression results reported in Table 2 indicate that the expected loss is significantly determined by the four intensity predictors. The coefficient on the logged impact angle is positive, but the effect of a larger impact angle on the loss function is negligibly small. To see this, consider two impact angles A_1 and A_2 = 2 · A_1. Since the predictor is log-transformed, the effect of doubling the impact angle while keeping everything else constant is 0.224 · (log A_2 − log A_1) = 0.224 · log(A_2/A_1) ≈ 0.155. In other words, the loss function increases by only 0.155 units for every doubling of the impact angle.
Coefficients on the flow depth and velocity are both negative, but the coefficient on the interaction term is positive. This implies that proportional losses grow with increasing flow depth (∂y/∂D > 0) and velocity (∂y/∂V > 0), but the growth rate is marginally decreasing (∂²y/∂D² < 0, ∂²y/∂V² < 0). In Fig. 3, we visualize this relationship by plotting the proportional losses predicted from Model I for debris flow depths and velocities typically assumed in the design of local protection strategies (Egli, 2005). The reader is cautioned that D and V are positively correlated (Pearson's ρ = 0.26, P < 0.01), so that combinations of low (high) velocities and large (small) depths are unlikely to be observed in reality. Therefore, the interpretation of model predictions for such combinations is meaningless.
The significant intercept of the dispersion function underlines that there is overdispersion in our data. Moreover, the dispersion varies across hazard sites and is significantly lower for the Chummerbach event, suggesting more homogeneous damages at this hazard site.

Model II
The second model specification adds site-specific and object-specific attributes to the loss function. A likelihood-ratio test confirms that the model fit is significantly improved by the inclusion of the additional predictors (χ², 9 df: 15.8, P < 0.05). No differences are observed across the five hazard sites. With the exception of the insignificant coefficient on the impact angle, coefficient estimates on the intensity predictors are similar to those obtained with Model I. This suggests that the site-specific factors in Model II do not have a strong confounding relationship with the process intensity but help to explain variability in the data. Coefficient estimates on object-specific characteristics imply that proportional losses at industrial buildings are significantly larger than those observed for residential buildings, while proportional losses at agricultural buildings are statistically not different from those observed for residential buildings. Concrete buildings are significantly less vulnerable to debris flow damages, as are buildings that had local protection measures installed.
With Eq. (1) we can study how much the construction type, the usage purpose, or the implementation of local protection measures reduces or increases the expected proportional loss. As an illustrative example, consider two buildings of the same type and value, but one is protected by sheet pilings while the other is not. The risk ratio is RR[x′ = protection] = 0.62 (CI: 0.42, 0.89), meaning that we expect a ∼38 % smaller loss at the protected building. Similar risk ratios for other predictors of interest are reported in Table 3.

Model III
This specification includes all available information in the loss function. Additionally, we amend the dispersion function with predictors of the flow depth, flow velocity, and their interaction. In terms of statistical fit, the improvement over Model II is significant (χ², 10 df: 55.9, P < 0.001). This improvement is largely due to the improved capture of heterogeneity in the dispersion function. Inclusion of the intensity predictors into the dispersion function accounts for variation in dispersion at the building level rather than the hazard site level.
The site-specific constants indicate that losses observed at the Zälgbach site are significantly smaller than those observed at the other hazard sites. Once we control for event-specific factors, the other sites are statistically not different from each other. Information about the runout cover is not predictive. However, proportional losses increase with runout steepness.
Results of Model III support the finding that industrial buildings are more vulnerable to debris flow damages than residential buildings. Controlling for additional dispersion in the data, we now also find a significant effect on agricultural buildings. Concrete buildings and buildings where local protection measures were installed are significantly less vulnerable to debris flows than mixed or light constructions (Table 3). Coefficient estimates on the intensity predictors are again similar to those obtained with Models I and II, confirming that they are the most important loss predictors.

Prediction accuracy
We use leave-one-out cross validation (LOOCV) to assess the out-of-sample prediction accuracy of our models (Wilks, 2006). This resampling procedure repeatedly fits the DGLM using N − 1 observations as the training set to make a prediction for the left-out observation. The predicted loss for this observation is then compared to the observed loss, and prediction accuracy is measured in terms of the mean absolute prediction error across all observations.
The mean absolute prediction errors for Models I-III are 0.108, 0.109, and 0.110, respectively (Table 2). This suggests that loss predictions based on the three models are on average within a range of ±11 % around the observed loss size and that none of the models outperforms the others in terms of prediction quality. This is, however, not to be confused with the statistical fit, which measures the explanatory power of the included predictors.
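The resampling loop itself is simple. A generic sketch (our helper signatures; a plain least-squares fitter stands in for the DGLM here) makes the procedure explicit:

```python
import numpy as np

def loocv_mae(X, y, fit, predict):
    """Leave-one-out CV: refit on N - 1 observations, predict the held-out one,
    and average the absolute prediction errors."""
    n = len(y)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])
        errors.append(abs(predict(model, X[i]) - y[i]))
    return float(np.mean(errors))

# stand-in fitter: ordinary least squares instead of the DGLM
ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_predict = lambda beta, x_i: float(x_i @ beta)
```

Swapping `ols_fit`/`ols_predict` for the actual DGLM fitting and prediction routines yields the mean absolute prediction errors reported above.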
To see whether there are ranges within the [0, 1] interval where the predictions are particularly inaccurate, we used Models I-III to predict the proportional losses on the full dataset and compared predicted losses with observed losses. Figure 4 illustrates that all three models do a decent job of predicting small to medium losses. However, the models do not predict large losses or even total losses. We conclude that while our approach is useful to predict damages of common sizes, it is less suited to predict extreme damages. By definition, these are rare events, and we lack sufficient observations in the extreme range of damages that would allow us to better calibrate the model.

Conclusions
The assessment of potential damages is a key ingredient of any risk assessment study. As risk-based decisions play an ever more important role in natural hazard management  (Bründl et al., 2009), there has been growing interest in the development of predictive loss functions that could be used to quantify damage potentials for various natural hazards.
In this paper, we have outlined a framework to estimate proportional loss functions for debris flow events. The framework lends itself to the analysis of other hydrological hazards. Unlike standard regression models, the presented DGLM can deal with problems of overdispersion, and is therefore a valuable tool for empirical analyses of natural hazard damages.
We have used the DGLM to analyze what is maybe the richest dataset on debris flow damages in Switzerland to date. We find that flow depth, flow velocity, and their interaction are the most meaningful predictors of relative damage. Industrial and agricultural buildings tend to be more vulnerable to debris flows than residential buildings. This is most likely due to different value concentrations on the ground floor. Many agricultural and industrial buildings are single-story buildings, which by nature have an increased damage potential within the reach of debris flow surges. Residential buildings, in contrast, often have their values distributed over several stories, so that the damage potential on the ground floor is relatively small compared to the total damage potential. This underlines the importance of controlling for overdispersion at the building level.
We find some insightful results with regard to building characteristics. These are of particular interest because property owners can alter them through self-protection efforts. Using the risk ratios reported in Table 3, the damage to a building without local protection measures such as sheet pilings or concrete reinforcements is predicted to be 1.6-1.8 times as high as the damage to an otherwise identical protected building. Similarly, the damage predicted for a concrete building is only about 60 % of the damage expected for an otherwise identical non-concrete building, suggesting that structural adaptation to the local environment is an effective means to reduce debris flow damages (Margreth and Romang, 2010).
How accurate are the results? As a LOOCV analysis has confirmed, proportional loss predictions generated with the DGLM approach are quite precise, with mean absolute prediction errors in the range of 11 %. From a practical point of view, this prediction quality is more than sufficient as the uncertainty introduced by the loss predictions is far smaller than that of other parameters entering the risk assessment for debris flow (Schaub and Bründl, 2010).
How do the results compare to the proportional loss functions found in recent studies by Totschnig et al. (2011) and Totschnig and Fuchs (2013)? We did some rough calculations comparing the proportional loss function obtained from our Model I with the best-fitting loss functions reported in Eq. (7) of Totschnig et al. (2011) and Eq. (3) of Totschnig and Fuchs (2013), respectively. If we keep flow velocity and impact angle fixed at sample means, we obtain loss predictions that are about 50 % lower than those of Totschnig et al. (2011); if we assume a flow velocity of 7 m s⁻¹ (the upper range of the values Egli (2005) proposed for debris flows in flat terrain), loss predictions obtained from both models are within a range of 10 % (see Appendix A2 for a visualization). We conclude that, under reasonable assumptions, our loss predictions are in close agreement with those of Totschnig et al. (2011) and Totschnig and Fuchs (2013).
What is the practical value of the results? We may think of several ways to make use of the proportional loss predictions generated by our approach. Here are but two examples. First, modern GIS techniques allow for combining proportional loss functions with hazard maps and databases of insurance contracts to create proper risk maps. Such risk maps could help hazard managers to identify at-risk areas and to prioritize among mitigation needs based on the actual damage potential. Second, private property insurers could use the loss predictions to offer individualized contracts that better reflect each homeowner's risk. This would lead to a reduction in the overall volume of premiums paid by insurance holders because fair pricing reduces "moral hazard", the tendency of the most-exposed insurance holders not to take self-protection measures, as there are no incentives to do so (Raschky and Weck-Hannemann, 2007). State building insurers (in 19 cantons of Switzerland) could use our results to get an overview of where house owners should consider building local protection measures.
In conclusion, we believe that the presented approach bears great potential for the prediction of natural hazard damages. Needless to say, with more data we will be able to better calibrate the proportional loss functions and to improve their prediction accuracy. It is therefore of great importance to build up and maintain central damage databases that allow for the storing of event-specific information; a very good example is the HOWAS 21 database for flood losses. Our analysis emphasizes that flow depths and heights, observed damages, and total values of damaged buildings are of particular interest.

Appendix A

A1 Derivation of Eq. (2) by the delta method

The variance of the log-transformed risk ratio is a non-linear function and hence has to be approximated. A common method to do so is the delta method; see, e.g., Fox (2008). Before we derive the variance of log RR[x′], we introduce a more compact notation. Define ȳ_j ≡ y_i | x′ = j, X′ for all i ∈ N and j = {0, 1}, and f ≡ log(ȳ_1/ȳ_0) = log ȳ_1 − log ȳ_0.
The variance of the log-transformed risk ratio is a non-linear function, and hence has to be approximated. A common method to do so is the delta method; see, e.g., Fox (2008). Before we derive the variance of log RR[x 0 ] , we introduce a more compact notation. Definē y ij ⌘ y i | x 0 = j,X 0 8 i 2 N and j = {0, 1}, and f ⌘ logȳ 1 /ȳ 0 = logȳ 1 logȳ 0 .