Models for predicting monetary losses from floods mainly blend data deemed to represent a single flood type and region. Moreover, these approaches largely ignore indicators of preparedness and how predictors may vary between regions and events, challenging the transferability of flood loss models. We use a flood loss database of 1812 German flood-affected households to explore how Bayesian multilevel models can estimate normalised flood damage stratified by event, region, or flood process type. Multilevel models acknowledge natural groups in the data and allow each group to learn from the others. We obtain posterior estimates that differ between flood types, with credibly varying influences of water depth, contamination, duration, implementation of property-level precautionary measures, insurance, and previous flood experience; these influences overlap across most events or regions, however. We infer that the underlying damaging processes of distinct flood types deserve further attention. Each reported flood loss and affected region involved mixed flood types, likely explaining the uncertainty in the coefficients. Our results emphasise the need to consider flood types as an important step towards applying flood loss models elsewhere. We argue that failing to do so may unduly generalise the model and systematically bias loss estimations from empirical data.

The estimation of flood losses is a key requirement for assessing flood risk and for the evaluation of mitigation strategies like the design of relief funds, structural protection, or insurance design. Yet loss estimation remains challenging, even for direct losses that can be more easily determined than indirect losses

Without standard loss documentation procedures in place, the highly variable losses caused by different flood types (e.g. pluvial, fluvial, coastal) can make loss modelling particularly challenging, especially where data are limited or heterogeneous. This lack of detailed or structured data motivates most modelling studies concerned with flood loss to assign just a single type of flooding to each event

In this context, multilevel or hierarchic models are one alternative and offer a compromise between a single pooled model fitted to all data and many different models fitted to subsets of the data sharing a particular attribute or group. Bayesian multilevel models use conditional probability as a basis for learning the model parameters from a weighted compromise between the likelihood of the data being generated by the model and some prior knowledge of the model parameters. These models explicitly account for uncertainty in data, low or imbalanced sample size, and variability in model parameters across different groups
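The "weighted compromise between the likelihood of the data and prior knowledge" can be illustrated with a deliberately simple conjugate-normal update (a hypothetical toy example, not the paper's model): the posterior mean is a precision-weighted average of the prior mean and the sample mean.

```python
import numpy as np

# Toy conjugate-normal update: posterior mean is a precision-weighted
# compromise between prior mean and data mean. All values are made up.
prior_mean, prior_sd = 0.0, 1.0             # prior knowledge of a parameter
data = np.array([0.8, 1.1, 0.9, 1.2])       # hypothetical observations
sigma = 0.5                                 # assumed known observation noise

prior_prec = 1.0 / prior_sd**2              # precision of the prior
data_prec = data.size / sigma**2            # precision contributed by the data
post_mean = (prior_prec * prior_mean + data_prec * data.mean()) / (prior_prec + data_prec)
post_sd = (prior_prec + data_prec) ** -0.5  # posterior is narrower than the prior

print(f"posterior mean = {post_mean:.3f}, sd = {post_sd:.3f}")
```

With more data (or less noisy data), the posterior mean moves closer to the sample mean; with little data it stays near the prior, which is the behaviour the multilevel models exploit across groups.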

In contrast to empirical models, synthetic models are developed based on expert opinion and offer a good approach to harmonising loss estimations. However, their reliance on assumptions is problematic where preparedness and other behavioural variables are concerned. In general, synthetic models tend to reduce the variability in the data and are rarely validated

In this study, we use survey data from households affected by large floods throughout Germany between 2002 and 2013

Here we expand on the model of

In this study we use the data from a joint effort that conducted surveys among households affected by large floods throughout Germany to investigate various aspects of the flood damaging process more systematically. Beginning with the large Central European floods of 2002, this database has more than 4000 entries from 6 different flood events

These data go beyond addressing physical inundation characteristics and also include aspects of warning, preparedness, and precaution at the level of individual households. This gathering of socioeconomic information and building characteristics thus offers a broad view of the damaging process rarely found elsewhere

From this dataset,

In this study, we used three characteristics to group our data: (i) flood type, with the categories levee breaches, riverine, surface, and groundwater floods; (ii) regions of Germany, with the categories south (Bavaria and Baden-Württemberg), east (Brandenburg, Mecklenburg-Western Pomerania, Saxony, Saxony-Anhalt, and Thuringia), and west and north (Hesse, Lower Saxony, North Rhine-Westphalia, Rhineland-Palatinate, and Schleswig-Holstein – grouped together due to the low number of cases); and (iii) flood year, i.e. 2002, 2005, 2006, 2010, 2011, and 2013. We tested three model variants, each using only one group variable at a time (Table

Description of potential predictors of flood loss.

Number of instances in the training set used across grouping variables flood type, region, and event year (

Single-level multiple linear regression is adequate for capturing general trends in data but ignores structure in the data, such as flood type or region affected. We explore the suitability of a Bayesian multilevel model to estimate relative building loss (or loss ratio) from models with different predictor combinations. We use a numerical sampling scheme for Bayesian analysis implemented in the

Bayesian multilevel models weigh the likelihood of observing the given data under the specified model parameters by prior knowledge. Bayesian models thus express the uncertainty in both the prior parameter knowledge and the posterior parameter estimates. The multilevel approach allows us to analyse all data in one model while honouring structure or nominal groups in the data. The group-specific parameters are thus trained at the same time, so that model parameters can inform each other by means of specified (hyper-)prior distributions. This approach makes more data available for training than running stand-alone models on subsets of our data, which are more prone to over- and underfitting and to overestimated regression coefficients; partial pooling also reduces the effects of collinearity and offers a natural form of penalised regression
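The way groups "inform each other" can be sketched with an analytical partial-pooling (shrinkage) estimate. This is illustrative only: the paper's models are fitted by MCMC, and the group labels, sizes, and variances below are made up.

```python
import numpy as np

# Partial-pooling sketch: each group-level mean is shrunk towards the overall
# mean; small groups are shrunk more strongly, i.e. they "learn" more from
# the other groups. All numbers are hypothetical.
rng = np.random.default_rng(42)
groups = {"riverine": rng.normal(0.20, 0.15, 400),
          "surface":  rng.normal(0.10, 0.15, 50),
          "levee":    rng.normal(0.35, 0.15, 8)}   # small group

grand_mean = np.mean(np.concatenate(list(groups.values())))
tau2 = 0.005       # assumed between-group variance (hyperparameter)
sigma2 = 0.15**2   # assumed within-group variance

weights, pooled = {}, {}
for name, y in groups.items():
    # Precision-weighted compromise between the group's own mean and the grand mean
    w = (y.size / sigma2) / (y.size / sigma2 + 1.0 / tau2)
    weights[name] = w
    pooled[name] = w * y.mean() + (1 - w) * grand_mean
    print(f"{name}: raw={y.mean():.3f} pooled={pooled[name]:.3f} (w={w:.2f})")
```

The weight on a group's own mean grows with its sample size, which is why stand-alone models on small subsets overfit where a multilevel model shrinks instead.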

In a multilevel model, the data are structured into

The joint posterior distribution can then be written as

The

The choice of the likelihood and the priors should follow assumptions about the data-generation process

In

Each model run consisted of 4 chains, each with 3000 iterations and 1500 warm-up runs; we used a thinning of every 3 samples and obtained a total number of 2000 post-warm-up samples. To assess whether the simulations converged, we checked the Gelman–Rubin potential scale reduction factor
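The sample accounting and the convergence check can be sketched as follows. The Gelman–Rubin function below is a textbook implementation for illustration; Stan-based software uses a refined split-chain, rank-normalised variant.

```python
import numpy as np

# Post-warm-up sample accounting from the settings above:
chains, iters, warmup, thin = 4, 3000, 1500, 3
draws_per_chain = (iters - warmup) // thin
assert chains * draws_per_chain == 2000   # 4 x 500 post-warm-up draws

def gelman_rubin(samples):
    """Classic potential scale reduction factor.
    samples: array of shape (n_chains, n_draws) for one parameter."""
    _, n = samples.shape
    chain_means = samples.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = samples.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 500))   # well-mixed synthetic chains
print(round(gelman_rubin(mixed), 3))          # close to 1 indicates convergence
```

Chains that have not mixed (e.g. stuck at different means) give values well above 1, which is why a threshold near 1 is used to accept a run.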

We trained the models using several different combinations of predictors to find the best balance between complexity and predictive accuracy. Our main motivation was to achieve a good balance of sufficiently detailed but available data, which is often challenging

We compared models with a gradually increasing number of predictors based on the prior knowledge of predictor importance reported in a study using single-level linear regression by

The model selected in step 1 – “fit_s1” – has a subset of the predictor matrix

We compared the model candidates combining the selected candidates from step 2. If, for example, two different candidates

We compared all candidate models using leave-one-out cross-validation (LOO-CV) with Pareto smoothed importance sampling (PSIS-LOO), which is an out-of-sample estimator of predictive model accuracy
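How two candidates are ranked by ELPD-LOO can be sketched from pointwise contributions: the difference is their sum, and its standard error follows from the spread of the pointwise differences, as is conventional in LOO software. The pointwise values below are made up.

```python
import numpy as np

# Hypothetical pointwise ELPD contributions for two models over 1269 points.
rng = np.random.default_rng(1)
elpd_i_a = rng.normal(-1.0, 0.3, 1269)                # model A
elpd_i_b = elpd_i_a + rng.normal(-0.02, 0.1, 1269)    # model B, slightly worse

diff_i = elpd_i_a - elpd_i_b                          # pointwise differences
elpd_diff = diff_i.sum()                              # total ELPD difference
se_diff = np.sqrt(diff_i.size) * diff_i.std(ddof=1)   # standard error of the difference
print(f"ELPD difference = {elpd_diff:.1f} +/- {se_diff:.1f}")
```

A difference of only one or two standard errors, as for several candidates in the tables below, is usually not decisive on its own.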

Having identified the models with the most informative predictors, we checked for credible differences across levels using the 95 % highest density interval (HDI) of the marginal posterior distributions of the model parameters. We refer to regression intercepts and slopes as
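Computing a 95 % HDI from posterior draws amounts to finding the shortest interval containing 95 % of the samples (for unimodal posteriors this matches the usual definition). A minimal sketch with a hypothetical posterior:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the sorted draws."""
    x = np.sort(np.asarray(samples))
    k = int(np.ceil(mass * x.size))           # number of draws inside the interval
    widths = x[k - 1:] - x[:x.size - k + 1]   # widths of all candidate intervals
    i = int(np.argmin(widths))                # shortest one has highest density
    return x[i], x[i + k - 1]

rng = np.random.default_rng(7)
draws = rng.normal(0.4, 0.1, 20000)           # hypothetical posterior of a slope
lo, hi = hdi(draws)
print(f"95% HDI: [{lo:.2f}, {hi:.2f}]")
credibly_nonzero = lo > 0 or hi < 0           # zero lies outside the interval
```

An estimate is "credibly different from zero" in the sense used below when zero falls outside this interval; two groups differ credibly when their interval of differences excludes zero.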

We begin by reporting results from the model selection, where we aimed at a compromise between model complexity, predictive accuracy, and data availability. For example, the generic model (Eq.

Judging from the predictive capacity using LOO-CV, we arrived at a number of models worth further inspection. Table

Comparison of flood-type model candidates of differing complexity and using their expected log pointwise predictive density (ELPD-LOO), ranked by increasing predictive accuracy, along with differences and their standard errors with reference to model “fit1” (see Table S1 for all model variants).

We find that models hardly improve beyond the complexity of model “fit6” (Table

Comparison of the flood-type model candidates by their difference in ELPD-LOO using the first six predictors plus one predictor at a time, ranked by increasing predictive accuracy, along with their differences and the standard error of the differences with reference to the model “fit6” (see Table S2 for all model variants).

We find that “fit6+11” is the candidate model with the highest accuracy, though “fit6+7” is comparable (Table

Comparison of flood-type model candidates by their difference in ELPD-LOO using combinations of the first five predictors (fit5) plus predictors 6, 7, and 11, along with their differences and the standard error of the differences with reference to candidate model “fit5+6” (see Table S3 for all model variants).

Comparison of model candidates by their difference in ELPD-LOO using combinations of the first five predictors (fit5) plus predictors 6, 7, and 11, along with their differences and the standard error of the differences with reference to candidate model “fit6” for each model variant.

Table

We fit three multilevel models with the selected candidate predictors (“fit6+11”, i.e. water depth, building area, contamination, duration, PLPMs, insurance, flood experience) in each of the flood-type, regional, and event models. All three multilevel models converged (

Performance indicators over mean values of the posterior predictive distribution (median of performance indicators over the full posterior predictive distribution) and convergence indicators of the three model variants. RMSE: root mean square error; MAE: median absolute error;

We also ran posterior predictive checks by comparing the observed distribution of the loss ratio with the posterior predictive distribution drawn from the training and the test data (Fig.

Density plot of observed loss ratio (

In this section we show the group-level coefficient estimate intervals of each model and whether they are credibly different for different groups. We report the highest density interval (HDI) of the posterior model weights and compare these estimates between the groups of each model. The models use an inverse-logit transformation over the linear regression (Eq.
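The inverse-logit transformation maps the linear predictor onto the (0, 1) range of the loss ratio. A minimal sketch (all coefficient values are hypothetical, not fitted estimates from the models):

```python
import numpy as np

def inv_logit(eta):
    """Map a linear predictor onto (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical values on the logit scale, for illustration only:
intercept = -2.0          # group-level plus population-level intercept
beta_depth = 0.8          # standardised water-depth coefficient
water_depth_std = 1.0     # one standard deviation above the mean depth

eta = intercept + beta_depth * water_depth_std
loss_ratio = float(inv_logit(eta))
print(round(loss_ratio, 3))   # predicted loss ratio, bounded in (0, 1)
```

Because of this nonlinear link, equal shifts in a coefficient change the predicted loss ratio most strongly near the middle of the (0, 1) range, which should be kept in mind when comparing the interval plots below.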

Figure

For example, the standardised group-level intercepts (

The effects of flood duration (Fig.

Credibly different pairs of estimates with 95 % probability.

The 95 % HDI of regression estimates of the flood-type model (across four flood types; coloured segments) and the single-level model (black segments). The intercept is the sum of the population-level effect (common across levels) and group-level effects (for each flood type).

Figure

Similar to the flood-type model, all estimates are credibly different from zero for water depth (Fig.

The 95 % HDI of regression estimates of the regional model (across three regions; coloured segments) and the single-level model (black segments). The intercept is the sum of the population-level effect (common across levels) and group-level effects (for each region).

Figure

Estimates of the intercept (Fig.

The 95 % HDI of estimates of water depth (Fig.

The 95 % HDI of regression estimates of the flood-event model (each event coded by colour) and the single-level model (black bars). The intercept is the sum of the population-level effect (common across levels) and group-level effects (for each event).

We trained three variants of a Bayesian multilevel model to test whether flood type, region within Germany, or flood event warrants differing predictor influences on flood loss. The models help us to identify the factors most relevant for flood loss estimation and to assess whether there are credible differences between their contributions to the estimated loss ratio. In other words, the models show how considering these groups is a useful step towards improved model transferability.

After comparing the predictive-accuracy estimates of models with different sets of predictors, we selected the model “fit6+11”, which uses water depth, building area, contamination, duration, PLPMs, insurance, and previous flood experience as predictors. Considering that we aim to explore the role of predictors in estimating flood losses rather than to find the best-fitting model, chain convergence and posterior predictive checks are a necessary step before interpreting the fitted model

Our results show that, for most cases across regions or across flood events, the posterior regression weights are hardly different. Therefore, distinguishing groups, at least in the form implemented here, adds little information over a pooled model taking into account all of the data. Out of the training dataset of 1269 data points, the groups contained much smaller (

We note that the greater the water depth, the contamination of the floodwater, or the duration for which a building is inundated, the higher the loss ratio, assuming all other predictors are fixed. This is a simple expectation

Although previous work has indicated more intense flood events in eastern than in southern Germany, except for the 2005 flood

Despite the large overlap across estimates of the flood-event model, we find that the estimates for 2002, 2010, and 2013 for water depth and contamination are larger and more credible, reflecting also larger average losses reported by the households (Table S4). Although the 2006 subsample had a large average flood duration (Table S4), it still returns a highly uncertain coefficient estimate. The severe Central European flood of August 2002 in Germany mainly affected the rivers Danube and Elbe, and only a few households had implemented PLPMs or had previous flood experience

We emphasise that each event and each region of Germany contained mixed flood types (or pathways). For most predictors, the factors' effects are much clearer across flood types. This reinforces the notion that their importance varies across flood types. Given that mixed flood types were reported in all regions and years in our dataset, this might be the reason the predictor effects are also less certain and overlapping across regions and years.

It is plausible that the effects of some variables are influenced by others, whether included or ignored in our initial set. Only a few studies have so far directly compared the effect of predictors of flood loss ratio across groups in the data, such as flood types, events, or places. Two of them, i.e.

Data availability, especially regarding preparedness indicators, is a possible limitation to transferring flood loss models and their use for ex ante loss estimation. While these indicators have been deemed relevant for loss prediction, they are rarely collected and are often unavailable in a suitable form. An alternative is to use proxy data, for example the aggregated insurance coverage for Germany monitored by the German Insurance Association

When addressing transferability, we seek models that can generalise well and go beyond local or case-specific data.

Previous studies have indicated that the major damaging processes during floods may differ by flood type, event, and affected region. To better understand these differences and improve the transferability of flood loss models, we trained and tested Bayesian multilevel models for estimating relative flood losses of residential buildings.

Our model selection identified seven predictors addressing the flood magnitude (water depth, contamination, and duration), the building size (building area), and the preparedness of the household (previous experience, insurance, and an indicator of implemented PLPMs). For at least one group, each predictor has a 95 % HDI posterior estimate that is credibly different from zero. This result confirms that all these predictors can aid flood loss ratio estimation and reinforces the need to collect data after new flood events. This repeated updating is at the core of Bayesian models, which can also handle missing data, account for uncertainty intrinsically, and effectively find a compromise between existing models and new data. We argue that this strategy might pave one way for transferring flood loss models more widely.

Credibly different estimates were found for six out of seven predictors across flood type, region, and event year, namely water depth, contamination, duration, implementation of property-level precautionary measures, insurance, and previous flood experience. The Bayesian multilevel model grouped by flood type is the most informative of these three model variants, featuring the most pronounced differences in the contributions of each predictor. Despite credible differences between flood events, the large uncertainties in the posterior estimates of the regional and event models likely indicate that several flood types mixed within a single flood event or region, making it difficult to disentangle individual controls. In any case, the dataset is hardly conducive to fully revealing the underlying physical controls on flood losses.

Our results encourage using pooled data on flood events and regions and thus indicate some transferability in this regard, judging from the minute differences in the posterior regression weights. The data indicate, however, that flood loss modelling should consider different flood types explicitly. We acknowledge that other groupings of the data or a different set of predictors could improve predictions further but recommend strategies that make use of previous knowledge as much as possible. We conclude that grouping models by flood type adds information and transferability to flood loss estimation and encourage more research in this direction.

The survey data are owned by the second author. Data from the 2002 event can be provided only upon request. Data from the 2005, 2006, 2010, 2011, and 2013 events are available via the German flood loss database HOWAS21 (

The supplement related to this article is available online at:

All authors contributed to the conceptualisation of the study; GSM and OK contributed to the development of the model; GSM developed the code; all authors analysed the results and wrote the manuscript.

The authors declare that they have no conflict of interest.

The surveys were conducted by a joint venture between the GeoForschungsZentrum Potsdam; the Deutsche Rückversicherung AG, Düsseldorf; and the University of Potsdam. Besides original resources from the partners, additional funds for data collection were provided by the German Ministry for Education and Research (BMBF). We thank Meike Müller, Ina Pech, Sarah Kienzler, and Heidi Kreibich for their contributions to the survey design and data processing.

This research has been supported by the German Academic Exchange Service (DAAD, Graduate School Scholarship Programme; grant no. 57320205). The surveys were partly financed by the German Ministry for Education and Research (BMBF) in the framework of the following research grants: DFNK (grant no. 01SFR9969/5), MEDIS (grant no. 0330688), and Flood 2013 (grant no. 13N13017).

This paper was edited by Daniela Molinari and reviewed by Nivedita Sairam and one anonymous referee.