Are OpenStreetMap building data useful for flood vulnerability modelling?

Flood risk modelling aims to quantify the probability of flooding and the resulting consequences for exposed elements. The assessment of flood damage is a core task that requires the description of complex flood damage processes, including the influences of flooding intensity and vulnerability characteristics. Multi-variable modelling approaches are better suited for this purpose than simple stage-damage functions. However, multi-variable flood vulnerability models require detailed input data and often have difficulty predicting damage for regions other than those for which they were developed. A transfer of vulnerability models usually results in a drop in predictive performance. Here we investigate whether data from the open data source OpenStreetMap are suitable for modelling the flood vulnerability of residential buildings and whether the underlying standardized data model helps to transfer models across regions. We develop a new data set by calculating numerical spatial measures for residential building footprints and combine these variables with an empirical data set of observed flood damage. From this data set, random forest regression models are learned using regional sub-sets and are tested for predicting flood damage in other regions. This regional split-sample validation approach reveals that the predictive performance of models based on OpenStreetMap building geometry data is comparable to alternative multi-variable models, which use comprehensive and detailed information about preparedness, socio-economic status and other aspects of residential building vulnerability. The transfer of these models for application in other regions should include a test of model performance using independent local flood data.
Including numerical spatial measures based on OpenStreetMap building footprints reduces model prediction errors (MAE by 20% and MSE by 25%) and increases the reliability of model predictions by a factor of 1.4 in terms of the hit rate when compared to a model that uses only water depth as a predictor. This also applies when the models are transferred to other regions which have not been used for model learning. Further, our results show that using numerical spatial measures derived from OpenStreetMap building footprints does not resolve all problems of model transfer. Still, we conclude that these variables are useful proxies for flood vulnerability modelling because these data are consistent (i.e. input variables and the underlying data model have the same definition, format, units, etc.), openly accessible, and thus make it easier and more cost-effective to transfer vulnerability models to other regions.

Floods have huge socio-economic impacts globally. Driven by increasing exposure, as well as increasing frequency and intensity of extreme weather events, the consequences of flooding have risen sharply during recent decades (Hoeppe, 2016; Lugeri et al., 2010). Therefore, effective adaptation to growing flood risk is an urgent societal challenge (UNISDR, 2015; Jongman, 2018). With the transition to risk-oriented approaches in flood management, flood risk models are important tools for quantitative risk assessments that support decision making from continental to local scales (Alfieri et al., 2016; Moel et al., 2015; Winsemius et al., 2013). While macro- or meso-scale risk assessment approaches target regional, national or continental studies, risk assessment on the micro-scale is needed to guide urban planning and to optimize investments in protection and other mitigation measures considered in flood risk management plans (Meyer et al., 2013; Moel et al., 2015; Rehan, 2018). Flood risk models include components to represent the key elements of flood risk: hazard, exposure and vulnerability (Kron, 2005). Flood hazard is usually modelled at high spatial resolution in order to realistically capture variability in flood hazard intensity in consideration of local topographic characteristics (Apel et al., 2009; Teng, 2017). For consistent risk assessments, exposure and vulnerability need to be analysed on similar scales and with appropriate spatial resolution. With the increasing availability of new exposure data sets including, for instance, information about the number, occupancy and characteristics of exposed objects (Figueiredo and Martina, 2016; Paprotny et al., 2020; Pittore et al., 2017), micro-scale exposure and vulnerability modelling gains much traction (Lüdtke et al., 2019; Schröter et al., 2018; Sieg et al., 2019).
Both synthetic (e.g. Blanco-Vogt and Schanze (2014); Dottori et al. (2016); Penning-Rowsell and Chatterton (1977)) and empirically based models (e.g. Thieken et al. (2005); Zhai et al. (2005)) have been proposed for micro-scale vulnerability modelling. As flood damaging processes are complex, a large diversity of influencing factors needs to be taken into account to capture and appropriately represent flooding intensity and the resistance characteristics of exposed elements in flood vulnerability models (Thieken et al., 2005). In this context, multi-variable modelling approaches are an important advance over simple stage-damage curves, which relate only water depth to flood loss. While multi-variable vulnerability models usually outperform traditional stage-damage functions (Merz et al., 2004; Schröter et al., 2014), the downside of these approaches is an increased need for detailed data at the level of individual objects (Merz et al., 2010, 2013) which are often not available in the target area of the analysis (Apel et al., 2009; Cammerer et al., 2013; Dottori et al., 2016). Missing standards for collecting comparable and consistent data are one reason for this problem (Changnon, 2003; Meyer et al., 2013). Hence, providing the input variables for multi-variable flood vulnerability models on the micro-scale is a key challenge for their practical applicability. Another challenge is the generalization of locally derived vulnerability models. A number of studies confirm a model performance mismatch between the regions where models have been developed and the target areas for application (Cammerer et al., 2013; Jongman et al., 2012; Schröter et al., 2016; Wagenaar et al., 2018). It is argued that the generalized application of vulnerability models to different geographic and socio-economic conditions needs to consider an adequate representation of local characteristics and damage processes (Felder et al., 2018; Figueiredo et al., 2018; Sairam et al., 2019). Hence, consistency in 
input data is an important requirement for the spatial transfer of vulnerability models (Lüdtke et al., 2019; Molinari et al., 2020). The availability, accessibility and consistency of data sources are important requirements for generalized vulnerability model applications but also pose requirements on modelling approaches. With an increased number of input variables and an enlarged diversity of data sources used for vulnerability modelling, we usually deal with heterogeneous data in terms of different scaling, degrees of detail, resolution and complex inter-dependencies (Schröter et al., 2016, 2018). Tree-based algorithms are a suitable approach to handle heterogeneous data, represent non-linear and non-monotonic dependencies, and, as a non-parametric approach, do not require assumptions about the independence of data (Carisi et al., 2018; Merz et al., 2013; Schröter et al., 2014; Wagenaar et al., 2017). The Random Forest (RF) algorithm (Breiman, 2001) is broadly used in many disciplines due to its high predictive accuracy, simplicity in use and flexibility concerning input data. In the domain of flood risk modelling, Wang et al. (2015) have successfully applied RF for flood risk assessment and Bui et al. (2020) used RF for flood susceptibility mapping. Merz et al. (2013) demonstrated the suitability of tree-based algorithms for flood vulnerability modelling. Following this, Carisi et al. (2018); Chinh et al. (2015); Hasanzadeh Nafari et al. (2016); Sieg et al. (2017); Wagenaar et al. (2017) have used RF and other tree-based algorithms for flood loss estimation in flood-prone regions in Vietnam, Australia, the Netherlands and Italy. In these studies, vulnerability modelling using RF was based on site-specific empirical data sets which had been collected after major flood events. In contrast, the framework proposed by Amirebrahimi et al. (2016) successfully used 3D building information for flood damage assessment of individual buildings. Gerl et al. 
(2016) and Schröter et al. (2018) investigated the suitability of alternative, more general data sources for flood vulnerability modelling using urban structure type information derived from remote sensing images, virtual 3D city models and numerical spatial measures which describe the extent and shape complexity of residential buildings. It was shown that geometric information such as building area and height are useful variables to describe building characteristics relevant for estimating flood losses (Schröter et al., 2018). From these studies it has been concluded that data about building geometry work as a proxy to describe the resistance characteristics of buildings. However, further analyses are needed to understand whether building geometry data enable consistent flood vulnerability modelling at high resolution and are suitable to characterise differences in flood vulnerability across regions. With new data sources emerging from crowdsourcing projects and open data initiatives, detailed building data are increasingly available and accessible (Irwin, 2018). Open and/or standardized building data are a promising data source to coherently describe exposure and characterise the vulnerability of residential buildings, and to improve the spatial transfer of vulnerability models given a consistent underlying data model and a clear specification of input variables across regions. Data science methods are well suited to make use of these data in flood vulnerability modelling. Against this backdrop, we investigate the suitability of the open data source OpenStreetMap (OSM) (OpenStreetMap contributors, 2020) for flood vulnerability modelling of residential buildings. OSM is a geographic database with worldwide coverage which is nowadays considered reliable (Barrington-Leigh and Millard-Ball, 2017). The information about building footprints is freely available and straightforward to obtain from public online servers.
The OSM contributors community is constantly growing and assures regular updates in terms of accuracy and completeness of the data (Hecht et al., 2013).
We test the hypothesis that numerical spatial measures derived from OSM building footprints provide useful information for the estimation of flood losses to residential buildings. From the underlying consistent OSM data model and the standardized calculation of spatial measures, we expect an improvement in the spatial transfer of flood vulnerability models across regions.
Accordingly, the research objectives are i) to understand which building geometry related variables are useful to describe building vulnerability, ii) to learn predictive flood vulnerability models, and iii) to test and evaluate model transfer across regions. In Section 2 the data sources, the derived variables and the preparation of the data sets are described. Section 3 introduces the methods used to identify predictor variables and to derive predictive models. Further, it describes the set-up for testing and evaluating model performance in spatial transfers. The results from these analyses are reported and discussed in Section 4.
Conclusions are drawn in Section 5.

Data
We use an empirical data set of relative loss to residential buildings and influencing factors which has been collected via computer-aided telephone interviews (CATI) during survey campaigns after major floods in Germany since 2002. Another data source is OSM (OpenStreetMap contributors, 2020), providing information about building locations, geometries, occupancy and other characteristics. The OSM data are complemented with numerical spatial measures calculated from the geometries of OSM building footprints.

Computer-aided telephone interview data
CATI surveys were conducted with affected private households after major floods in Germany. The regional focal points of flood impacts were the Elbe catchment in eastern Germany and the Danube catchment in southern Germany. Particularly noteworthy are the floods of 2002 and 2013, which caused economic losses in Germany of EUR 11.6 bn (reference year 2005) and EUR 8 bn, respectively (Thieken et al., 2006, 2016). With EUR 1 bn of economic damage, the city of Dresden at the Elbe River in Saxony was a hotspot of flood impacts during the August 2002 flood (Kreibich and Thieken, 2009). In August 2002, flash floods triggered by record-breaking precipitation and numerous levee failures caused widespread flooding along the Elbe River and its tributaries in Saxony and Saxony-Anhalt as well as along the Regen River and other southern tributaries of the Danube River in Bavaria (Schröter et al., 2015). The magnitude of flood peak discharges along these rivers well exceeded a statistical return period of 100 years (Ulbrich et al., 2003). In May 2013, a pronounced precipitation anomaly with subsequent extreme precipitation at the end of May and beginning of June caused severe flooding in June 2013, especially along the Elbe and Danube Rivers, with new water level records and major dike breaches at both rivers (Conradt et al., 2013; Merz et al., 2014; Schröter et al., 2015). The magnitude of flood peak discharges exceeded statistical return periods of 100 years along the Elbe, Mulde and Saale tributaries, and along the Danube and Inn Rivers in Bavaria (Blöschl et al., 2013; Schröter et al., 2015). With 180 questions, the CATI surveys cover a broad range of flood impact related factors including building characteristics, effects of warnings, precaution and the socio-economic background of households. The survey campaigns for different floods are consistent in terms of acquisition methodology, type and scope of questions. The interviewees were randomly selected from lists 
of potentially affected households along inundated streets which have been identified from satellite data, flood reports and press releases. With an average response rate of 15%, in total 3056 interviews have been completed. For further details about the surveys and data processing refer to Kienzler et al. (2015) and Thieken et al. (2005, 2017). Building on the findings of previous work (Merz et al., 2013; Schröter et al., 2014), 23 variables have been preselected for this study with a focus on building characteristics, flood intensity at the building, socio-economic status as well as warning, precaution and previous flood experience (Table 1). In addition, the relative loss to the building has been determined as the ratio of the reported actual losses and the building value (replacement cost) at the time of the flood event (Elmer et al., 2010).
Hence, it describes the degree of building damage on a scale from 0 (no damage) to 1 (total damage). Building values are based on the standard actuarial valuation method of the insurance industry in Germany (Dietz, 1999), which estimates replacement costs using information about the floor space, basement area, number of storeys, roof type, etc. that is available from the CATI data. Relative loss to the building (rloss) and water depth (wst) at the building are the key variables from the CATI data set used in this study. rloss is used to learn predictive models and to evaluate their performance. Consequently, records in the CATI data set without values for rloss are removed, which reduces the number of available records from 3056 to 2203. wst is the most commonly used predictor in flood vulnerability modelling (Gerl et al., 2016), because it is a highly relevant characteristic of flood intensity and is usually available from hydrodynamic-numerical simulations. wst from CATI is a continuous variable measured in centimeters. Negative values represent a water level below the ground surface, which affects only the basement of a building.
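The two target-side conventions described above can be captured in a few lines. This is a minimal sketch, not the authors' code; the function names and the clipping to [0, 1] are our illustrative assumptions based on the definitions in the text (rloss as reported loss over replacement value; wst in centimeters, negative when only the basement is flooded).

```python
def relative_loss(reported_loss_eur, replacement_value_eur):
    """Relative building loss: reported repair cost over replacement value.

    Result is clipped to [0, 1]: 0 = no damage, 1 = total damage.
    """
    if replacement_value_eur <= 0:
        raise ValueError("replacement value must be positive")
    return min(max(reported_loss_eur / replacement_value_eur, 0.0), 1.0)


def basement_only(wst_cm):
    """True if the water level stayed below ground surface (wst < 0 cm)."""
    return wst_cm < 0
```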

OpenStreetMap data
OSM is a free web-based map service built on the activity of registered users who contribute to the database by adding, editing or deleting features based on their local knowledge. The contributors use GPS devices as well as satellite and aerial imagery to verify the accuracy of the map. OSM is an open data project and the cartographic information can be downloaded, altered and redistributed under the Open Data Commons Open Database License (ODbL) (OpenStreetMap contributors, 2020). Among the so-called volunteered geographic information (VGI) projects (Goodchild, 2007), OSM is the most widely known. OSM data provide information about building locations, footprint geometries, occupancy and other characteristics. The positional accuracy of OSM data, and the completeness of the database with respect to the number of mapped objects present in the real world, are nowadays considered satisfactory for most developed countries and urban areas (Barrington-Leigh and Millard-Ball, 2017; Hecht et al., 2013). In contrast, information on object attributes such as road names or building types is often scarce and inconsistent. The tag building is used to identify the outline of a building object in OSM. The majority of buildings (82%) have no further description and only 12% are specified as primarily residential or a single-family house (https://taginfo.openstreetmap.org/keys/building#values (28.02.2020)). Therefore, the filtering for residential buildings from the OSM database uses the underlying residential landuse information of OSM. By joining the landuse information to the building polygons, those of residential occupation can be identified and selected.
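The filtering logic described above (keep explicitly residential buildings, and fall back to the residential landuse join for the many buildings tagged only building=yes) can be sketched as follows. This is a simplified, attribute-level stand-in for the actual spatial join; the function name and the exact set of accepted tag values are our assumptions, not from the paper.

```python
# Common OSM tag values denoting residential occupancy (assumed selection).
RESIDENTIAL_TAGS = {"residential", "house", "detached", "apartments"}


def is_residential(building_tags, landuse):
    """Heuristic residential filter for an OSM building.

    building_tags: dict of OSM tags for the building polygon.
    landuse: landuse value of the polygon the building falls in
             (result of the spatial join described in the text).
    """
    tag = building_tags.get("building", "yes")
    if tag in RESIDENTIAL_TAGS:
        return True
    # Most OSM buildings carry only building=yes; fall back to landuse.
    return tag == "yes" and landuse == "residential"
```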

Data preparation
The OSM and CATI data sets have been conflated in order to link the empirically observed variables rloss and wst with OSM data for individual residential buildings. This operation uses the geolocation information of both data sources. The CATI data are provided with address details including community, zip code, street name, and house number ranges in blocks of 5 numbers. Geocoding algorithms, including open web API (Application Programming Interface) services such as Google, were applied to obtain geocoordinates for the address information from the interview data. OSM is a spatial data set including georeferenced building outlines. The geolocated interviews are spatially matched with OSM building polygons using an overlay operation which merges interview points with OSM building polygons. In view of the limited address details regarding building house number ranges and the inherent inaccuracies of geocoding databases and algorithms (Teske, 2014), a buffer radius of 5 meters has been used to correct for offsets between geocoding points and building polygons. CATI records which still could not be matched with OSM geometries, and those with obviously erroneous geolocations, e.g. positions far away from flood-affected areas or urban settlements, have been removed from the data set. After these steps, 1649 records remain from the original set of CATI surveys. The spatial distribution of these data points is highly concentrated in the Elbe catchment (1234 records) including Dresden (310 records) and in the Danube catchment (105 records) (Fig. 1).
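The point-to-polygon matching with a 5 m tolerance can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (which uses a GIS overlay operation); it simply shows the rule described above: accept a footprint if the geocoded point lies inside it or within the buffer distance of its boundary. Function names and coordinates in a metric projection are our assumptions.

```python
import math


def _seg_dist(px, py, ax, ay, bx, by):
    # Distance from point P to segment AB.
    abx, aby = bx - ax, by - ay
    if abx == 0 and aby == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * abx + (py - ay) * aby)
                               / (abx * abx + aby * aby)))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))


def _inside(px, py, poly):
    # Even-odd (ray casting) point-in-polygon test.
    inside = False
    n = len(poly)
    for i in range(n):
        ax, ay = poly[i]
        bx, by = poly[(i + 1) % n]
        if (ay > py) != (by > py) and px < ax + (py - ay) * (bx - ax) / (by - ay):
            inside = not inside
    return inside


def match_building(point, footprints, buffer_m=5.0):
    """Index of the footprint containing the point, or of the closest
    footprint whose boundary lies within buffer_m metres; else None.
    Coordinates are assumed to be in a metric projection (e.g. UTM)."""
    px, py = point
    best, best_d = None, buffer_m
    for i, poly in enumerate(footprints):
        if _inside(px, py, poly):
            return i
        d = min(_seg_dist(px, py, *poly[j], *poly[(j + 1) % len(poly)])
                for j in range(len(poly)))
        if d <= best_d:
            best, best_d = i, d
    return best
```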

Numerical measures
Information about building geometry is useful to support the estimation of flood losses to residential buildings (Schröter et al., 2018). Building on this knowledge, numerical spatial measures are calculated for OSM building footprints with the aim of adding potential explanatory variables for the estimation of relative loss to residential buildings. For this purpose, image analysis algorithms typically used in landscape ecology are adopted. These algorithms calculate numerical spatial measures like area, perimeter, elongation and complexity based on the analysis of geometries identified in aerial or remote sensing images (Jung, 2016; Lang and Tiede, 2003; Rusnack, 2017). The numerical spatial measures are calculated for each OSM building polygon and are compiled in Table 2 along with the other CATI variables that are used to derive flood vulnerability models. The meaning of these spatial measures, the equations as well as the range of values and examples are listed in Appendix A1.
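A subset of such footprint measures can be computed directly from polygon vertices, as in this sketch. The exact formulas used in the paper are given in its Appendix A1; here we show common landscape-ecology definitions (shoelace area, perimeter, perimeter-area ratio, and a shape index that equals 1 for a square), so the variable names mirror Table 2 but the formulas are our assumptions.

```python
import math


def footprint_measures(poly):
    """Numerical spatial measures for a closed building footprint.

    poly: list of (x, y) vertices in metric coordinates, without a
    repeated closing vertex.
    """
    n = len(poly)
    # Shoelace formula for polygon area.
    area = abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2.0
    perim = sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))
    return {
        "Area": area,
        "Perimeter": perim,
        "PARatio": perim / area,                        # perimeter-area ratio
        "ShapeIndex": perim / (4.0 * math.sqrt(area)),  # 1.0 for a square
    }
```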

Methods
We analyse the created data set with two main objectives. First, we strive to identify those variables from Table 2 which are most useful to explain relative loss to residential buildings. Second, we aim to derive flood vulnerability models for residential buildings and to test these models in spatial transfers across regions. The data analysis workflow, including data pre-processing, model learning, model selection and model transfer, is illustrated in Fig. 2. The data pre-processing steps with data preparation and numerical spatial measures have been described in the previous section. For model learning and model transfer we use the Random Forest (RF) machine learning algorithm introduced by Breiman (2001).
RF are an extension of the classification and regression tree (CART) algorithm (Breiman et al., 1984), which aims to identify a regression structure among the variables in the data set. Regression trees recursively sub-divide the space of predictor variables to approximate a non-linear regression structure. This sub-division is driven by optimizing the accuracy of local regression in these regions, which, by repeated partitioning, leads to a tree structure. Predictions are made by following the division criteria along the nodes and branches from the root node to the leaves, which finally contain the predicted value for a given set of input variables. RF make predictions based on a large number of decision trees, i.e. a forest, which is learned by randomly selecting the variables considered for splitting the feature space of the data. RF incorporate bootstrap aggregation (bagging) as a simple and powerful ensemble method to reduce the variance of the CART algorithm. In comparison to single trees, RF are more suitable to identify complex patterns and structures in the data (Basu et al., 2018). As an ensemble approach, RF learn a regression tree for a number of bootstrap replicas of the learning data. This results in a number of trees (ntree) forming a forest of regression trees. To reduce the correlation between trees, the RF algorithm randomly selects a subset of variables (mtry) which are evaluated for dividing the space of predictor variables. This efficiently reduces overfitting and makes RF less sensitive to changes in the underlying data. Each bootstrap replica is created by randomly sampling with replacement about two thirds of the observations from the original data set. The remaining data are indicated as out-of-bag (OOB) observations and are used for evaluating the predictive accuracy of the tree in terms of the OOB error. For regression trees, the OOB error is the mean of the squared residuals. For loss estimation, the predictions of all trees are combined by aggregating the 
individual predictions as the mean prediction from the forest. The predictions of the individual trees, i.e. from the ensemble of models, provide an estimate of predictive uncertainty.
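The RF mechanics described above (ntree, mtry, bagging with OOB evaluation, and per-tree predictions as an uncertainty ensemble) map directly onto standard library parameters. This is a minimal sketch assuming scikit-learn and synthetic stand-in data, not the authors' model: `n_estimators` plays the role of ntree and `max_features` that of mtry.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 4))                   # stand-in predictors (e.g. wst + measures)
y = 0.5 * X[:, 0] + 0.1 * rng.normal(size=300)   # synthetic relative-loss-like target

rf = RandomForestRegressor(
    n_estimators=500,   # ntree: number of trees in the forest
    max_features=2,     # mtry: variables evaluated at each split
    oob_score=True,     # evaluate on out-of-bag samples
    random_state=0,
).fit(X, y)

# Per-tree predictions form an ensemble; its median and quantile
# ranges provide the predictive-uncertainty estimate described above.
tree_preds = np.stack([t.predict(X[:5]) for t in rf.estimators_])
p50 = np.median(tree_preds, axis=0)
```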
For variable selection and predictive model learning, RF provide a concept to quantify the importance of candidate explanatory variables, which allows selecting the subset of most relevant variables. RF are also an efficient algorithm to learn predictive models from heterogeneous data sets with complex interactions and with different scales like continuous or categorical information (Huang and Boutros, 2016).
RF predictive model performance is sensitive to the specification of the algorithm parameters mtry and ntree (Huang and Boutros, 2016). Therefore, the optimum values for both parameters are identified as those which yield minimum OOB errors. For parameter tuning, we pursue the variation approach implemented by Schröter et al. (2018) by selecting parameters from a broad and comprehensive range of values ntree ∈ [100, 500, 1000, 2000, 3000, 15000] and mtry ∈ [p/6, p/3, 2p/3], with p as the number of candidate predictors, and derive RF models for each combination. For each pair of chosen values, the algorithm is repeated 100 times to account for inherent data variability. The optimum parameters minimize the prediction error on the OOB sample data. Using the optimum RF parameter settings, we derive predictive models for rloss.
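The tuning loop can be sketched as a grid search that minimizes the OOB mean squared error. This is a scaled-down illustration assuming scikit-learn and synthetic data: the grid here is far smaller than the paper's (which goes up to ntree = 15000 with 100 repetitions per setting), and `oob_mse` is our helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 6))
y = 0.6 * X[:, 0] + 0.05 * rng.normal(size=200)
p = X.shape[1]

# Reduced grid for illustration; the paper scans much larger ntree values
# and repeats each setting 100 times.
grid = [(ntree, mtry)
        for ntree in (100, 300)
        for mtry in (max(1, p // 6), p // 3, 2 * p // 3)]


def oob_mse(ntree, mtry, seed=0):
    rf = RandomForestRegressor(n_estimators=ntree, max_features=mtry,
                               oob_score=True, random_state=seed).fit(X, y)
    # oob_prediction_ holds the OOB estimate for every training sample.
    return float(np.mean((y - rf.oob_prediction_) ** 2))


best_ntree, best_mtry = min(grid, key=lambda g: oob_mse(*g))
```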

Variable selection
The first step in model learning is the selection of variables to be used as predictors in the model. The analysis of Spearman's rank correlation between the variables gives a first insight into the monotonic dependency structure of the data set. Furthermore, RF support the evaluation and ranking of potential predictors by quantifying variable importance, which also accounts for variable interaction effects. The importance of a selected variable is evaluated by calculating the change in the squared error of the predictions when the values of that variable are randomly permuted in the OOB sample. The increase in the average error will be larger for more important variables and smaller for less important variables. On this basis it is possible to decide which variables to include in a predictive model. The outcomes of variable importance evaluations are sensitive to the RF algorithm parameters mtry and ntree (Genuer et al., 2010). Therefore, to achieve stable results for these analyses, we implement a robust approach which averages the outcomes of multiple runs with variations in RF parameters (Schröter et al., 2018): ntree ∈ [500, 1000, 1500, 2000, 5000], whereby each tree is repeatedly built for mtry ∈ [p/6, p/3, 2p/3], with p as the number of candidate predictors, which correspond to the lower limit, the default value and the upper limit suggested by Breiman (2001). Following this procedure, the potential explanatory variables of our data set (Table 2) are evaluated and ranked according to their relative importance for predicting rloss.
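The permutation-based importance averaged over several parameter settings can be sketched as follows. This illustrative version assumes scikit-learn (whose `permutation_importance` permutes on the supplied data rather than strictly on the OOB sample, a simplification relative to the text) and uses synthetic data where feature 0 is deliberately dominant.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
X = rng.uniform(size=(300, 5))
y = 0.8 * X[:, 0] + 0.2 * X[:, 1] + 0.05 * rng.normal(size=300)

# Average permutation importance over several mtry settings, mirroring
# the robust averaging approach described in the text.
mtry_values = (1, 2, 3)
mean_importance = np.zeros(X.shape[1])
for mtry in mtry_values:
    rf = RandomForestRegressor(n_estimators=200, max_features=mtry,
                               random_state=0).fit(X, y)
    imp = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
    mean_importance += imp.importances_mean
mean_importance /= len(mtry_values)

# Ranking of candidate predictors, most important first.
ranking = np.argsort(mean_importance)[::-1]
```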

Predictive model learning
Variable selection needs to be considered as an essential part of the model evaluation process. Therefore, candidate RF models using different numbers of variables are assessed in terms of predictive performance on independent data.
The OSM-based numerical spatial measures differentiate building form and shape complexity. To gain further insights into the suitability of these variables for flood vulnerability modelling, we incrementally add explanatory variables to the learning data set. Based on the outcomes of the variable importance ranking, the learning set is expanded variable by variable and models of increasing complexity are learned (cf. Table 2). From the comparison of predictive performance between these candidate models, the best balance between model performance and number of input variables is assessed. This is implemented by bootstrapping the splitting of the data into sub-sets for learning (60%) and testing (40%) with 100 iterations.
Further, for an independent assessment of OSM based vulnerability model performance we consider two benchmark models.
We argue that the set of CATI variables (Table 1) represents the most detailed data set available for flood loss estimation of residential buildings (Merz et al., 2013; Schröter et al., 2014; Thieken et al., 2016). Therefore, a RF model is learned using all 23 CATI predictors as an upper benchmark (BMu). In contrast, a RF model using only wst as a predictor is learned as a lower benchmark. The reasoning is that using extra variables in addition to wst will improve the predictive performance of the models (Schröter et al., 2016, 2018). As described in Section 2.3, the detail of the geolocation information from the CATI data is limited to ranges of house numbers. Therefore, we face uncertainty as to whether CATI data and OSM building footprints have been matched correctly. To assess the potential implications of this source of uncertainty, we derive a model (BMrm) which is based on a data set with rloss and wst observations randomly assigned to OSM building footprints. We keep the RF modelling approach for the benchmark models consistent to ensure that any observed difference in model performance stems from differences in the underlying input variables.

Predictive model evaluation
Model predictive performance is evaluated by comparing predicted (P) and observed (O) rloss values from the validation sample using the following metrics. In these metrics, RF predictions are evaluated for the median prediction (P50) derived from the ensemble of individual tree predictions.
Mean Absolute Error (MAE) quantifies the precision of model predictions, with smaller values indicating higher precision:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_{50,i} - O_i\right| \quad (1)$$

Mean Bias Error (MBE) is a measure of accuracy, i.e. systematic deviation from the observed value. Unbiased predictions yield a value of 0, underestimation results in negative and overestimation in positive values:

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(P_{50,i} - O_i\right) \quad (2)$$

Mean Squared Error (MSE) combines the variance of the model predictions and their bias. Again, smaller values indicate better model performance:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(P_{50,i} - O_i\right)^2 \quad (3)$$

The ensemble of model predictions from the RF models offers insight into prediction uncertainty. This property is analyzed by evaluating the 90-percent quantile range, i.e. the difference between the 5-quantile and the 95-quantile in relation to the median, as a measure of ensemble spread:

$$\mathrm{QR}_{90} = \frac{1}{n}\sum_{i=1}^{n}\frac{P_{95,i} - P_{5,i}}{P_{50,i}} \quad (4)$$

with $P_{95,i}$, $P_{5,i}$ and $P_{50,i}$ as the 95-quantile, the 5-quantile and the 50-quantile, i.e. the median, of the predictions. QR$_{90}$ is a measure of sharpness, with smaller values indicating a smaller prediction uncertainty.
The reliability of model predictions is quantified in terms of the hit rate (Gneiting and Raftery, 2007):

$$\mathrm{HR} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\left[P_{5,i} \leq O_i \leq P_{95,i}\right] \quad (5)$$

HR is the fraction of observations that fall within the 95-5-quantile range of the model predictions. For a reliable prediction, HR should correspond to the expected nominal coverage of 0.9.
HR and QR$_{90}$ are combined in the interval score (IS), which accounts for the trade-off between HR values and QR$_{90}$ ranges (Gneiting and Raftery, 2007):

$$\mathrm{IS} = \frac{1}{n}\sum_{i=1}^{n}\left[\left(P_{95,i} - P_{5,i}\right) + \frac{2}{\alpha}\left(P_{5,i} - O_i\right)\mathbf{1}\left[O_i < P_{5,i}\right] + \frac{2}{\alpha}\left(O_i - P_{95,i}\right)\mathbf{1}\left[O_i > P_{95,i}\right]\right] \quad (6)$$

with $\alpha = 0.1$ for the 90-percent prediction interval; smaller IS values indicate better performance.
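The evaluation metrics can be implemented compactly on the ensemble of per-tree predictions. This sketch assumes numpy; the function name is ours, and it uses the standard interval score of Gneiting and Raftery (2007) with α = 0.1.

```python
import numpy as np


def evaluate(obs, ens, alpha=0.1):
    """Evaluation metrics for an ensemble of predictions.

    obs: (n,) observed rloss values.
    ens: (ntree, n) per-tree predictions of the RF ensemble.
    """
    p50 = np.median(ens, axis=0)
    p5, p95 = np.quantile(ens, [0.05, 0.95], axis=0)
    return {
        "MAE": float(np.mean(np.abs(p50 - obs))),
        "MBE": float(np.mean(p50 - obs)),
        "MSE": float(np.mean((p50 - obs) ** 2)),
        # 90-percent quantile range relative to the median (sharpness)
        "QR90": float(np.mean((p95 - p5) / p50)),
        # fraction of observations inside the 5-95 quantile range
        "HR": float(np.mean((obs >= p5) & (obs <= p95))),
        # interval score: width plus penalties for missed observations
        "IS": float(np.mean((p95 - p5)
                            + (2 / alpha) * (p5 - obs) * (obs < p5)
                            + (2 / alpha) * (obs - p95) * (obs > p95))),
    }
```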

Spatial transfer evaluation
We investigate whether the consistent data basis of OSM-derived numerical spatial measures supports the transfer of flood vulnerability models across regions by splitting the available data set into subsets for different regions affected by major floods. The CATI data are mainly located in the Elbe and Danube catchments in Germany, which are the regions most affected by inundations and flood impacts. This suggests a regional subdivision of the empirical data set according to these river basins for the investigation of spatial model transfer. In detail, we partition the data set between the metropolitan area of Dresden (Saxony), the Elbe catchment (Saxony, Saxony-Anhalt, Thuringia), and the Danube catchment (Bavaria, Baden-Wuerttemberg), see Fig. 1. This split is applied irrespective of the CATI survey campaign year, and thus the regional sub-sets contain records from different flood events. The idea is to investigate examples with a small set of learning data for a small specific region (Dresden), a large learning data set from an extended region (Elbe catchment), and a small set of learning data from an extended region (Danube catchment). The details of the learning and transfer applications are listed in Table 3. For these three regions we learn RF models using the selected variables and assess their predictive performance when transferred to the other regions. As we use a completely independent data set for model transfer testing, no additional bootstrapping on top of the RF-internal bootstrapping is required.
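The regional split-sample transfer test amounts to fitting a model on one regional sub-set and scoring it on each of the others. This sketch assumes scikit-learn and uses synthetic stand-ins for the three sub-sets; the sample sizes only loosely mirror the Dresden/Elbe/Danube proportions and the helper names are ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)


def make_region(n):
    # Synthetic stand-in for one regional sub-set (wst plus spatial measures).
    X = rng.uniform(size=(n, 4))
    y = 0.5 * X[:, 0] + 0.05 * rng.normal(size=n)
    return X, y


data = {"Dresden": make_region(80),
        "Elbe": make_region(300),
        "Danube": make_region(60)}

transfer_mae = {}
for src, (Xl, yl) in data.items():
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xl, yl)
    for dst, (Xt, yt) in data.items():
        if dst == src:
            continue  # evaluate transfers only, on fully independent regions
        transfer_mae[(src, dst)] = float(np.mean(np.abs(yt - rf.predict(Xt))))
```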

Results and Discussion
Random Forest OOB errors are sensitive to the choice of the RF parameters mtry and ntree. From the variation of RF parameters we observe that OOB errors decrease with smaller values for mtry and larger numbers of trees in a forest (ntree), Fig. 3.
The colored bands represent the 90 % quantile range of OOB values from the 100 bootstrap repetitions for each RF algorithm configuration and illustrate the inherent variability of input variables in the learning data set. The color code distinguishes the number of variables used to determine splits at each node (mtry). For mtry = 2 the smallest OOB errors are achieved throughout the variations in the number of trees (ntree). This value represents the lower bound of recommended values for mtry in RF regression models (Breiman, 2001). For smaller values of mtry, fewer variables are considered for splitting the space of predictor variables, which reduces the correlation between individual trees of the forest. Further, OOB values decrease asymptotically with increasing values of ntree. For the given data set, OOB values are virtually stable above ntree = 7000. As the computational effort increases with larger forests, it has to be balanced against the improvements in predictive performance. Building on these results we use the RF parameters mtry = 2 and ntree = 7000, which are comparable to those used by Schröter et al. (2018).
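A parameter sweep of this kind can be reproduced roughly as below. This is a simplified Python sketch rather than the paper's R setup: `oob_error_grid` is a hypothetical helper, and the OOB error is computed as the MSE of scikit-learn's per-sample OOB predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_error_grid(X, y, mtry_values=(2, 4, 6), ntree_values=(100, 500, 1000)):
    """Out-of-bag MSE for combinations of mtry (max_features) and ntree (n_estimators)."""
    errors = {}
    for mtry in mtry_values:
        for ntree in ntree_values:
            rf = RandomForestRegressor(n_estimators=ntree,
                                       max_features=mtry,
                                       oob_score=True,
                                       bootstrap=True,
                                       random_state=42)
            rf.fit(X, y)
            # OOB predictions average only trees that did not see each sample
            errors[(mtry, ntree)] = np.mean((y - rf.oob_prediction_) ** 2)
    return errors
```

Plotting `errors` against ntree, one curve per mtry, reproduces the qualitative picture of Fig. 3: lower mtry and larger ntree yield smaller, eventually stable OOB errors.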

Variable selection and predictive model learning
The numerical spatial measures (Table 2 and Appendix A1) evaluate properties of the building footprints, including the area, perimeter, and elongation of the main building axes. Accordingly, some of these variables are strongly correlated (Fig. 4). The Spearman rank correlation matrix of the variables confirms a high degree of correlation in the data set, for instance between Area, Perimeter and RadGyras. In contrast, the spatial measures are only slightly correlated with wst and rloss. The presence of multicollinearity may influence the analysis of variable importance (Gregorutti et al., 2017). The robust importance analysis uses different RF parameter settings and reports an average importance rank, which alleviates this problem.
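To give a feel for how such footprint measures are derived, here is a minimal pure-Python sketch. The paper's exact definitions are in its Table 2 and Appendix A1 (and were computed in PostGIS), so the formulas below are simplified stand-ins: a shoelace area, a vertex-to-vertex perimeter, and a radius of gyration approximated as the RMS vertex distance to the vertex centroid.

```python
import math

def spatial_measures(vertices):
    """Illustrative footprint measures from a polygon given as ordered (x, y) vertices."""
    n = len(vertices)
    # Shoelace formula for the polygon area
    area = abs(sum(vertices[i][0] * vertices[(i + 1) % n][1]
                   - vertices[(i + 1) % n][0] * vertices[i][1]
                   for i in range(n))) / 2.0
    perimeter = sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    # Radius of gyration approximated via the exterior vertices (simplification)
    rad_gyras = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in vertices) / n)
    return {"Area": area, "Perimeter": perimeter,
            "PARatio": perimeter / area, "RadGyras": rad_gyras}

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
m = spatial_measures(square)  # Area 100, Perimeter 40, PARatio 0.4
```

The strong correlations noted above follow directly from such definitions: Area, Perimeter and RadGyras all grow with footprint size, whereas ratio-type measures such as PARatio capture shape complexity largely independent of size.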
The variable wst ranks first in the importance analysis (results not shown), which confirms common knowledge in flood loss modelling (Gerl et al., 2016; Smith, 1994). In comparison to wst, the numerical spatial measures of OSM building footprints have clearly smaller importance values, with relatively small differences between them. In terms of building characteristics, both spatial measures which express the size and extension of the building (e.g. Area, Perimeter) and spatial measures which describe building compactness and shape complexity (e.g. PARatio, RadGyras, LinSegInd, BoundRatio) seem to add information to better estimate relative building loss. The following order of importance was determined for the variables: wst, PARatio, RadGyras, Area, LinSegInd, BoundRatio, Perimeter, DegrComp, FracDimInd, ShapeIndex. Predictive performance tests for models with two to ten variables (Fig. 5, Table 4) build on this order of importance. However, the outcome of the variable importance analysis does not suggest a clear selection of features to be included in a predictive flood vulnerability model. The model-performance-based assessment of variables therefore uses an increasing number of variables following their ranking order of variable importance in the RF modelling. The predictive performance is quantified in terms of MAE, MBE, and MSE (Equations 1, 2, 3) for 100 bootstrap repetitions. While the MAE decreases when additional variables are used, with an overall minimum for a model using 6 variables, including more than 6 variables tends to increase the MAE again (Fig. 5). Regarding MBE, however, these changes go in the opposite direction. We observe the smallest MBE when only 2 variables are included. MBE then grows continuously as up to 7 variables are used and slightly decreases when more variables are added. The increase in precision expressed by the smaller MAE is accompanied by a reduction of accuracy reflected in an increasing MBE. This yields an almost balanced performance in terms of MSE for all models tested.
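The forward-selection evaluation described above can be sketched as follows. This is a simplified Python illustration, not the paper's R workflow: `forward_selection_scores` is a hypothetical helper, and a single train/test split stands in for the paper's 100 bootstrap repetitions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def forward_selection_scores(X, y, ranked_vars, seed=1):
    """MAE, MBE and MSE for models using the top-k ranked variables, k = 2..len(ranked_vars)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    scores = {}
    for k in range(2, len(ranked_vars) + 1):
        cols = ranked_vars[:k]
        rf = RandomForestRegressor(n_estimators=300, max_features=2, random_state=seed)
        rf.fit(X_tr[cols], y_tr)
        err = rf.predict(X_te[cols]) - y_te
        scores[k] = {"MAE": np.mean(np.abs(err)),
                     "MBE": np.mean(err),          # sign shows over-/underestimation
                     "MSE": np.mean(err ** 2)}
    return scores
```

Inspecting `scores` over k makes the MAE/MBE trade-off discussed above visible: MAE and MBE can move in opposite directions while MSE, which combines variance and squared bias, stays roughly balanced.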
Looking into the sharpness of model predictions, the quantile range (QR90) becomes larger with an increasing number of model variables, which reflects larger uncertainty (Table 4). In terms of model reliability (HR), an increasing number of model variables achieves better performance statistics up to 8 variables. The combination of both, QR and HR, in the interval score (IS) shows a similar pattern. On the basis of these assessments, two model alternatives are selected for further analysis: Model A, using 8 variables, as it provides the most reliable model predictions, and Model B, using 6 variables, which provides the highest precision and the best balance between accuracy and precision. In detail, Model B uses the variables wst, PARatio, RadGyras, Area, LinSegInd, and BoundRatio. Model A, in addition, uses Perimeter and DegrComp as predictors.

Model predictive performance: model benchmarking
The OSM models A and B are benchmarked against a model that uses all information available from the CATI surveys as an upper benchmark (BMu) and a model that uses only water depth as a predictor as a lower benchmark (BMl). The performance statistics achieved by models A and B for the complete data set (all events and regions) are slightly inferior to BMu but clearly better than the outcomes of BMl (Fig. 6). Both models, A and B, give very similar performance statistics, with slightly higher precision (smaller MAE) but larger bias (MBE) for model B. In contrast, model A provides more reliable predictions, indicated by a larger HR and a smaller IS (Table 6). The randomized benchmark model (BMrm) achieves a better performance than BMl but is inferior to models A and B (Fig. 6, Table 5). Hence, we are confident that the remaining uncertainty associated with the mapping of geolocations to building geometries does not affect the outcomes of our analyses. Overall, we note that including numerical spatial measures based on OSM building footprints adds useful information for predicting loss to residential buildings.
The numerical spatial measures included in the models are all directly calculated from building footprints. Therefore, a larger number of variables used for loss estimation does not imply increased efforts for data collection. From this perspective, the cost of using model A or B is equal. The RF algorithm strives to reduce overfitting when large numbers of predictors are included, and thus the principle of parsimonious modelling can be relaxed. A possible negative effect of overfitting when using more predictors should manifest itself in spatial transfer applications.

Spatial transfer testing
The predictive performance of RF models is tested in regional transfer applications. For this purpose, the RF models A and B as well as the benchmark models BMu and BMl, as specified in the previous section, are learned using regional sub-sets of the data and applied to predict flood losses in a different region; see Section 3.4 and Table 3 for details about the regional sub-division of the data and the spatial transfer experiments. Learning models with a regional sub-set of the data and applying them to another region does not lead to a systematic loss of predictive performance. Instead, models A and B in some cases achieve better and in other cases worse performance statistics. Generally speaking, the predictive performance differs more strongly between the regional transfer settings than between the models (Fig. 7). This is more pronounced for the precision and accuracy metrics (MAE, MBE and MSE) than for the sharpness and reliability indicators (QR, HR and IS).
Learning from Dresden or the Elbe and transferring to the Elbe or Danube (d2E, d2D, E2D) produces sharper predictions, but the models still differ in reliability, i.e. in covering the observed values within their predictive uncertainty ranges (HR). In this respect, the upper benchmark model (BMu) performs best. The differences between models A and B are small; both are better than the lower benchmark model (BMl) and almost similar to BMu for the transfer cases between the Elbe and Danube regions (E2D and D2E).
With 105 records, the Danube data set is the smallest sub-sample. It has a smaller variability and range of values for most numerical spatial measures in comparison to the Dresden and Elbe regional sets (Figure 8). The geometric properties of the flood-affected residential buildings in the Danube region seem to differ from those in the Elbe region. In the Danube sub-set, the area and perimeter of buildings tend to be smaller than in the Elbe region. Also, the values for spatial measures representing building shape complexity, for instance RadGyras, DegrComp, and BoundRatio, indicate more compact building footprints in the Danube region than in the Elbe region. These differences can be attributed to different socio-economic characteristics as well as building practices in former East and West Germany and regional differences in building types (Thieken et al., 2007). With only 310 records, the Dresden sub-sample covers comparable ranges of observed variables as the Elbe sub-set (1234 records). Both sub-sets show largely similar relations between individual variables and rloss. Still, the Danube sub-set includes relatively many records with high rloss values, which are distributed along the whole spectrum of above-ground-level water depths (Figure 8). In comparison, the Dresden sub-set comprises very few cases with high relative loss, which is partly related to differing inundation processes: in the Elbe and Danube catchments, large areas have been flooded as a consequence of levee failures. Hence, the relationship of the model variables to high rloss values cannot be learned from the Dresden sub-set, and thus is not represented well by the model. This difference in the learning data may explain the positive bias introduced by learning the model in the Danube region and transferring it to the Elbe, and, vice versa, the pronounced negative bias introduced by learning the model in Dresden and transferring it to the Danube region. Viewed from a model performance perspective, the transfer applications show that a good agreement between learning and transfer data sets (e.g. d2E) produces more precise and reliable predictions than a transfer to regions with pronounced differences (e.g. d2D, D2E). Still, from the Danube region, with its limited ranges of variable values, it is possible to obtain relatively precise and accurate predictions of relative building loss. This suggests that a broad variability of observed rloss values in the learning data set is an important control for the predictive capability of the model in other regions. In contrast, small samples with limited variability and only few records with high rloss values struggle with predicting rloss in other regions. This confirms the insight that a model based on more heterogeneous data performs better when transferred in space (Wagenaar et al., 2018). Our findings also reveal that using numerical spatial measures derived from OSM building geometries does not resolve all problems of model transfer.
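Comparing value ranges across regional sub-sets, as done above for the Danube, can be expressed compactly. A minimal sketch with pandas, assuming a hypothetical data frame with a `region` column and the numerical spatial measures:

```python
import pandas as pd

def regional_ranges(df, variables):
    """Per-region minimum, maximum and mean of selected variables; narrow ranges
    (as observed for the Danube sub-set) flag limited variability in the learning data."""
    return df.groupby("region")[variables].agg(["min", "max", "mean"])
```

Inspecting such a table before a transfer experiment helps anticipate cases like d2D, where the target region lies partly outside the value ranges seen during learning.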
As not many variables describing building characteristics are available from OSM data, the spatial measures calculated from building footprints serve as proxy variables for these unavailable details. These proxies achieve a predictive performance comparable to specific property-level data sets, for instance those collected via computer-aided telephone interview surveys, represented by the BMu model. This model uses a broad range of variables to characterise the vulnerability of residential buildings, including details of building characteristics, the socio-economic status of the household, and flood warning, precaution and previous flood experience (c.f. Table 1). Still, this more comprehensive information does not result in a clearly better model predictive performance in transfer applications. Additional improvements can be expected from including local expert knowledge about inundation duration, flood experience and the return period of the event in the modelling process (Sairam et al., 2019). Flood-event-related variables, including the flood type, appear to be important information for estimating the degree of building loss because they describe differences in the damaging processes (Vogel et al., 2018). Other data sources have been used to enrich empirical data sets for learning flood loss models. This includes, for instance, information about building age and floor area for living from cadastre data (Wagenaar et al., 2017), and the number of storeys, building type, building structure, finishing level and conservation status from census data (Amadio et al., 2019). However, using these data did not result in a clear improvement in spatial model transfer.
Using variables derived from OSM data increases the flexibility of the models to be applied in other regions, because the accessibility and availability of OSM data reduce the effort of data collection, simplify the preparation of input variables, and ensure consistency of the input data. The latter point is an important advantage, because achieving consistency of input data has been stressed to cause large efforts in model transfers (Jongman et al., 2012; Molinari et al., 2020). The suggested RF models are based on an ensemble approach, and thus provide a view of the predictive uncertainty of the model outputs.
We have shown this to be a valuable detail in assessing the reliability of model predictions in spatial transfers. In cases where model performance cannot be tested with local empirical evidence, using model ensembles has been shown to provide more skillful loss estimates (Figueiredo et al., 2018).

Conclusions
The transfer of flood vulnerability models to regions other than those for which they have been developed often comes with reduced predictive performance. In this study we investigated the suitability of numerical spatial measures, calculated for residential building footprints accessible from OpenStreetMap, to predict flood damage. Further, we tested potential benefits of using this widely available and consistent input data source for the transfer of vulnerability models across regions. We developed a new data set based on OpenStreetMap data, which comprises variables representing building footprint dimensions and shape complexity, and we devised novel flood vulnerability models for residential buildings.
The geometric characteristics of building footprints serve as proxy variables for building resistance to flood impacts and prove useful for flood loss estimation. These model input variables are easily extracted by an automated process applicable to every type of building polygon. Hence, the models can be applied to areas where information about the footprint geometry of residential buildings is available. Other data sources, e.g. cadastral data or data derived from remote sensing, can also be used besides OpenStreetMap. While the variables derived from building footprints ensure consistency and support the transferability of models, the models remain context specific and should only be transferred to regions with building geometric features comparable to those of the learning data set.
The vulnerability models have been validated using empirical data of relative loss to residential buildings. Further, a benchmark comparison of the models has been conducted in spatial transfer applications. The models give comparable performance to alternative multi-variable models, which use comprehensive and detailed information about preparedness, socio-economic status and other aspects of building vulnerability. In comparison to a model which uses only water depth as a predictor, they reduce model prediction errors (MAE by 20% and MSE by 25%) and increase the reliability of model predictions by a factor of 1.4.
OpenStreetMap is a highly popular and evolving data source with constantly increasing completeness and up-to-date data. In the future, the attributes of residential buildings are expected to provide additional details which are of interest for the characterisation of building resistance to flooding. OSM is an open data project, and the cartographic information can be downloaded, altered and redistributed under the Open Data Commons Open Database License (ODbL) (OpenStreetMap contributors, 2020).
In the presented study, the geographic data were processed in PostgreSQL 12.2 with the PostGIS 3.0.1 extension and R version 3.6.3 (2020-02-29) (R Core Team, 2020). The spatial measures were calculated in PostgreSQL and imported into R for further processing. The Ran-

Figure 1. Regional sub-division of the data set for spatial split-sample testing (Dresden municipality, the Elbe catchment and the Danube catchment)

Figure 2. Data pre-processing, model learning and model transfer workflow, with BMu (upper benchmark model), BMl (lower benchmark model), BMrm (benchmark model with random match of interview locations with OSM building data), A (Random Forest model using 8 predictors), B (Random Forest model using 6 predictors), and the model transfers d2E (learning with Dresden and predictions for the Elbe), d2D (learning with Dresden and predictions for the Danube), E2D (learning with the Elbe and predictions for the Danube), and D2E (learning with the Danube and predictions for the Elbe)

Figure 3. Out-of-bag error for variations of the mtry and ntree RF parameters. Color bands represent the variation range of OOB errors obtained from 100 bootstrap repetitions

Figure 5. Predictive performance of models using an increasing number of variables in order of their importance. Smaller MAE and MSE values and MBE values close to 0 indicate better performance, c.f. Equations 1-3.

Figure 6. Performance metrics of OSM-based models and benchmark models

Learning from the Dresden sub-set and transferring the model to the Elbe region (d2E) works best, as shown by the smallest MAE and MSE as well as an MBE closest to zero. Learning the models with the Danube sub-set and transferring them to the Elbe region (D2E) yields comparably small MAE and MSE values, but this is also the only case with a tendency to overestimate rloss, resulting in a positive MBE. The models struggle most to predict loss when they are learned with the Dresden sub-set and transferred to the Danube region (d2D), showing the lowest precision and accuracy. In turn, extending the learning sub-set to the Elbe region improves the transfer to the Danube (E2D). Concerning predictive uncertainty and reliability, learning with the Danube sub-set yields large QRs, which, however, only partly cover the observed loss values, reflected in comparably low HRs and a high IS (D2E).

Figure 7. Model performance metrics in regional transfer. Models A and B are based on numerical spatial measures calculated for OSM building footprints; benchmark models BMl and BMu are based on CATI survey data. Transfer experiments d2E, d2D, E2D, and D2E as described in Table 3. 'all' refers to using all records from all regions, c.f. Table 5.

Figure 8. Scatterplots of numerical spatial measures and relative loss in regional sub-samples (Danube, Dresden, Elbe)

This includes, for instance, information about building type, roof type, number of floors, and building material, and opens further possibilities to refine the variables used for vulnerability modelling. These data could be further amended with other open data sources, including socio-economic statistical data. In view of the large variability of flood loss at the individual building level, vulnerability modelling for individual buildings remains challenging and is subject to large uncertainty. Advances in the understanding of damage processes and the improvement of flood vulnerability modelling hence require an improved and extended monitoring of flood losses.

Code and data availability. Flood damage data of the 2005, 2006, 2010, 2011, and 2013 events, along with instructions on how to access the data, are available via the German flood damage database HOWAS21 (http://howas21.gfzpotsdam.de/howas21/). Flood damage data of the 2002 event was partly funded by the reinsurance company Deutsche Rückversicherung (www.deutscherueck.de) and may be obtained upon request. The surveys were supported by the German Research Network Natural Disasters (German Ministry of Education and Research (BMBF), 01SFR9969/5), the MEDIS project (BMBF; 0330688), the project Hochwasser 2013 (BMBF; 13N13017), and by a joint venture between the German Research Centre for Geosciences GFZ, the University of Potsdam, and the Deutsche Rückversicherung AG, Düsseldorf.

Table 2 .
Variables of the amended OSM data set for each building object

Table 3 .
Computational experiments for transfer applications

Table 4 .
Model performance metrics for models using an increasing number of variables, arranged in order of importance: wst, PARatio, RadGyras, Area, LinSegInd, BoundRatio, Perimeter, DegrComp, FracDimInd, ShapeIndex. Best performance values and selected models in bold.

Table 5 .
Model precision, accuracy and reliability performance metrics for OSM based and benchmark models