Articles | Volume 17, issue 9
Research article
29 Sep 2017

Multi-variable flood damage modelling with limited data using supervised learning approaches

Dennis Wagenaar, Jurjen de Jong, and Laurens M. Bouwer

Abstract. Flood damage assessment is usually done with damage curves that depend only on water depth. Several recent studies have shown that supervised learning techniques applied to a multi-variable data set can produce significantly better flood damage estimates. However, creating and applying a multi-variable flood damage model requires an extensive data set, which is rarely available, and this is currently holding back the widespread application of these techniques. In this paper we enrich a data set of residential building and contents damage from the Meuse flood of 1993 in the Netherlands to make it suitable for multi-variable flood damage assessment. Results from 2-D flood simulations are used to add information on flow velocity, flood duration and the return period to the data set, and cadastre data are used to add information on building characteristics. Next, several statistical approaches are used to create multi-variable flood damage models, including regression trees, bagging regression trees, random forest, and a Bayesian network. Validation on data points from a test set shows that the enriched data set in combination with the supervised learning techniques delivers a 20 % reduction in the mean absolute error compared with a simple model based only on water depth, despite several limitations of the enriched data set. We find that with our data set, the tree-based methods perform better than the Bayesian network.
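
The validation metric named in the abstract, a relative reduction in mean absolute error against a depth-only baseline, can be illustrated with a minimal Python sketch. The function names and all numbers below are assumptions made up for illustration; they are not data or code from the study, and the toy values were simply chosen so that the arithmetic yields a 20 % reduction.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted damage."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_reduction(baseline_error, model_error):
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical test-set damages (arbitrary units), invented for illustration.
observed = [10.0, 20.0, 30.0, 40.0]
# Hypothetical predictions of a depth-only model (each off by 5 units).
depth_only_pred = [15.0, 15.0, 35.0, 35.0]
# Hypothetical predictions of a multi-variable model (each off by 4 units).
multi_variable_pred = [14.0, 16.0, 34.0, 36.0]

mae_depth = mean_absolute_error(observed, depth_only_pred)
mae_multi = mean_absolute_error(observed, multi_variable_pred)
reduction = relative_reduction(mae_depth, mae_multi)

print(f"MAE depth-only:     {mae_depth}")
print(f"MAE multi-variable: {mae_multi}")
print(f"Relative reduction: {reduction:.0%}")
```

Reporting the reduction relative to the baseline, rather than the raw error, makes the comparison independent of the damage units used.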

Short summary
Flood damage models are an important component of cost–benefit analyses for flood protection measures. Current models often predict flood damage from water depth alone. Recent studies have made progress in including other variables by applying data-intensive (machine learning) approaches, but the data these approaches require are rarely available in practice. We apply these approaches to a new type of data set that combines several different sources.