Urban development models typically provide simulated building areas in an aggregated form. When using such outputs to parametrize pluvial flood risk simulations in an urban setting, we need to identify ways to characterize imperviousness and flood exposure. We develop data-driven approaches for establishing this link, and we focus on the data resolutions and spatial scales that should be considered. We use regression models linking aggregated building areas to total imperviousness and models that link aggregated building areas and simulated flood areas to flood damage. The data resolutions used for training regression models are demonstrated to have a strong impact on identifiability, with too fine data resolutions preventing the identification of the link between building areas and hydrology and too coarse resolutions leading to uncertain parameter estimates. The optimal data resolution for modeling imperviousness was identified to be 400 m in our case study, while an aggregation of the data to at least 1000 m resolution is required when modeling flood damage. In addition, regression models for flood damage are more robust when considering building data with coarser resolutions of 200 m than with finer resolutions. The results suggest that aggregated building data can be used to derive realistic estimations of flood risk in screening simulations.

The development of pluvial flood risk adaptation measures in urban areas typically requires that a variety of combinations of different measures are tested

To consider these uncertainties in the design of water infrastructures, scenario assessments are performed. In these assessments, model simulations of the urban layout are linked to water systems models

Raster-based implementations for modeling urban development, such as the ones used by

When applying a linked (possibly conceptual) urban development–hydraulic simulation setup for pluvial flood risk assessment, we need to consider the effects of increasingly impervious areas, leading to increased runoff and thus larger flood hazard

For the case where urban development models provide aggregated, raster-based outputs, it is not clear how to link this output to hydrological modeling approaches and subsequent economic pluvial risk assessments. Related work has applied ad hoc definitions

Data-driven, empirical approaches would be highly attractive to parametrize this link. Our aim is to evaluate such procedures and to characterize the data resolutions and spatial scales for which robust performance can be obtained. Similarly, for damage assessment we would be highly interested in procedures that allow for upscaling of locally derived depth–damage functions, which are likely to provide better damage estimates

None of the previous work has explicitly assessed to what extent data resolutions applied in the development of scaling procedures affect the outcome of these procedures and at which spatial scale reasonable predictions can be obtained. A thorough assessment of these issues throughout the pluvial urban flood risk modeling chain is the main contribution of this paper.

We consider the city of Odense, Denmark, as a case study. Odense has approximately 200 000 inhabitants, and it is located in a typical moraine landscape close to the sea.

As base data characterizing the urban form, we were provided with building footprints in vector format by Odense Municipality (Fig.

Data on impervious area were provided in vector format. The data were obtained from remote sensing campaigns and grouped into six classes (Fig.

Figure

Terrain elevations in Odense, footprints of buildings existing in 2017 and major road network. See ©

Figure

Building footprint polygons and total building footprint areas aggregated to data resolutions of 25 and 200 m, shown for two selected areas in the case study together with flood maps simulated based on the building dataset shown in each subfigure for a return period of

We structured our study around steps illustrated in Fig.

Urban development models in general, and fast, raster-based modeling approaches in particular, do not provide detailed information on all impervious areas in a catchment. Thus, we need to estimate empirical relationships between an assumed urban development modeling output (here raster-based building footprint areas for different building types) and measured imperviousness. Fitting the regression relationship to datasets with varying resolutions provides insight into the spatial scale at which the link between urban layout and imperviousness can be identified. Generating predictions at varying resolutions provides insight into the spatial scale at which reasonable predictions can be generated.
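The effect of data resolution on this regression can be illustrated with a minimal Python sketch. It uses synthetic data, a single building type and a toy assumption (each building adds an equal paved area in the neighbouring 25 m cell). The function names `aggregate_sum` and `fit_no_intercept` are illustrative and are not taken from the published code.

```python
import numpy as np

def aggregate_sum(fine, factor):
    """Sum a fine raster over non-overlapping factor x factor blocks."""
    rows = fine.shape[0] - fine.shape[0] % factor  # drop incomplete edge blocks
    cols = fine.shape[1] - fine.shape[1] % factor
    blocks = fine[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))

def fit_no_intercept(building, impervious):
    """Least-squares slope of impervious = beta * building, without intercept."""
    x = building.ravel()[:, None]
    beta, *_ = np.linalg.lstsq(x, impervious.ravel(), rcond=None)
    return float(beta[0])

rng = np.random.default_rng(0)
# Synthetic 25 m raster: about 10 % of cells hold a building (arbitrary area units).
building_25 = rng.uniform(0.0, 1.0, (320, 320)) * (rng.uniform(size=(320, 320)) < 0.1)
# Toy assumption: each building adds an equal paved area in the neighbouring cell.
impervious_25 = building_25 + np.roll(building_25, 1, axis=1)

betas = {}
for factor in (1, 4, 16):  # corresponds to 25 m, 100 m and 400 m pixels
    b = aggregate_sum(building_25, factor)
    i = aggregate_sum(impervious_25, factor)
    betas[25 * factor] = fit_no_intercept(b, i)

for resolution, beta in betas.items():
    print(f"{resolution:4d} m: beta = {beta:.2f}")
```

In this toy setup the fitted coefficient stays close to 1 at 25 m, because the paved area sits in a different pixel than the building, and approaches 2 once building and paved area fall into the same aggregated pixel.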

In a hydrological model, coarse representations of imperviousness affect the runoff volume and location where runoff occurs and will thus lead to different simulations of flood hazards. We performed hydrodynamic 2D flood simulations where the hydrodynamic model was parametrized using impervious areas based on building areas with varying levels of aggregation. Comparing the resulting flood maps to a reference simulation, we can quantify how increasingly coarse representations of the urban layout affect simulated flood hazard.

Economic flood damage is an important parameter in decision-making related to flood adaptation. The standard approach for damage estimation in urban hydrology is to overlay high-resolution flood areas and building polygons. If only coarse, raster-based building data are available, flood damage can be derived by establishing a regression relationship between flood damage derived from a reference simulation and the extent of flooded building area as a measure of exposure. Inspecting the validity of this relationship provides insight into the combined impact of coarse representations of the urban layout on hazard and exposure.

Outline of the analysis steps performed in this paper. Letters A to D refer to the parts of the “Methods” section where the corresponding step is detailed. The dashed line illustrates the case where flood maps from the baseline simulation were used to derive flooded building areas as input to damage regression. Note that the second baseline 2D flood simulation where buildings were not inserted in the DEM is not shown in the flow chart. Steps B and D were performed for a set of eight selected building raster resolutions

Our aim was to predict impervious area in simulated urban developments when the assumed output of an urban development is building footprint areas for different building types. Linear regression approaches for modeling such relationships were previously documented by

Scatterplots of impervious area versus building area are included in the Supplement (Fig. S1). We have not included an intercept in Eq. (

To test the impact of spatial data resolution, we fitted regression models to datasets with 80 different resolutions

During each iteration, we computed root-mean-square error RMSE

We performed 2D flood simulations of pluvial hazards for 10 different models, considering

a model where imperviousness was determined from the original imperviousness dataset and where buildings were included in the DEM for flow calculation (baseline model);

a model where imperviousness was determined from the original imperviousness dataset and where buildings were not included in the DEM for flow calculation (baseline without buildings); and

models where imperviousness was derived considering the regression relationship shown in the Supplement (Sect. S2) and considering building data aggregated to resolutions

Our 2D modeling approach was the exact same approach as used by

As in

Impervious areas linked to major roads (Fig.

The 2D flood model was not calibrated to reflect observed flooding in the catchment. While the simulated flood maps may not coincide with reality, they provide a realistic baseline for the further analysis.

We compared the simulated flood maps to the baseline simulation where true imperviousness percentages were applied for runoff modeling and buildings were included in the DEM. In the comparison, we focused on built-up areas and excluded natural areas and water bodies.

We created contingency tables where we counted in how many pixels both the predicted flood map under scrutiny and the baseline flood map exceeded a water level of 0.1 m (hits) and how often this was the case only for the baseline model (misses) or the tested model (false alarms). Subsequently, we computed the scores hit rate HR, false alarm ratio FAR and critical success index CSI as defined in
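The contingency-table scores can be computed as in the following sketch. It assumes the standard forecast-verification definitions HR = hits / (hits + misses), FAR = false alarms / (hits + false alarms) and CSI = hits / (hits + misses + false alarms). The function name `flood_scores` and the optional `mask` argument (for restricting the comparison to built-up areas) are illustrative.

```python
import numpy as np

def flood_scores(predicted_depth, baseline_depth, threshold=0.1, mask=None):
    """Return (HR, FAR, CSI) for two water-depth rasters of equal shape."""
    pred = np.asarray(predicted_depth) > threshold
    base = np.asarray(baseline_depth) > threshold
    if mask is not None:  # e.g. True for built-up pixels only
        pred, base = pred[mask], base[mask]
    hits = int(np.sum(pred & base))
    misses = int(np.sum(~pred & base))
    false_alarms = int(np.sum(pred & ~base))

    def ratio(num, den):
        # Undefined scores (empty denominator) are reported as NaN.
        return num / den if den > 0 else float("nan")

    hr = ratio(hits, hits + misses)
    far = ratio(false_alarms, hits + false_alarms)
    csi = ratio(hits, hits + misses + false_alarms)
    return hr, far, csi

# Tiny example: one hit, one miss and one false alarm.
baseline = np.array([[0.20, 0.00], [0.15, 0.05]])
predicted = np.array([[0.30, 0.12], [0.00, 0.00]])
hr, far, csi = flood_scores(predicted, baseline)
```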

Based on the 2D flood simulations performed for the baseline situation, we assessed flood damage. The derived damage data were subsequently used as a reference for training and validating the regression models derived in Sect.

Direct flood damage in urban areas is commonly assessed by overlaying polygons of exposed objects with high-resolution flood maps. Damage is then assigned to each object (e.g., a building) depending on the greatest adjacent water depth

We distinguished between two approaches for damage assessment, because we expected that data resolution might affect them differently. The first type comprises threshold-based approaches, where a unit damage is assigned to an object if the water level exceeds a defined threshold. In Denmark, such approaches are frequently applied in the context of pluvial risk assessments

We considered the framework of

Damage assessment frameworks considered in our work. WL is water level.

Flood damage was derived by overlaying the simulated flood areas with the building polygons. Damage per square meter was derived for each building, considering the damage functions shown in Table

We have also derived flood damage for the baseline simulation where buildings were not included in the DEM. The damage values were not used for regression but are shown in the results section, as they provide insight into the impact of blocked flow paths on damage assessment.
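The two types of damage assessment can be contrasted in a short sketch. All numerical values (threshold, unit damage, curve points) are hypothetical placeholders and are not the values of the frameworks considered in this work. The `Building` class and the function names are likewise illustrative.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Building:
    footprint_m2: float
    max_adjacent_depth_m: float  # greatest water depth adjacent to the building

def threshold_damage(building, threshold_m, unit_damage_per_m2):
    """Unit damage per square meter is assigned once the threshold is exceeded."""
    if building.max_adjacent_depth_m > threshold_m:
        return building.footprint_m2 * unit_damage_per_m2
    return 0.0

def depth_damage(building, curve_depths_m, curve_damage_per_m2):
    """Damage per square meter is interpolated linearly from a depth-damage curve."""
    per_m2 = np.interp(building.max_adjacent_depth_m, curve_depths_m, curve_damage_per_m2)
    return building.footprint_m2 * float(per_m2)

house = Building(footprint_m2=100.0, max_adjacent_depth_m=0.25)
# Hypothetical parameters, chosen only to make the example concrete.
d_threshold = threshold_damage(house, threshold_m=0.1, unit_damage_per_m2=1000.0)
d_curve = depth_damage(house, [0.0, 0.5, 1.0], [0.0, 500.0, 800.0])
```

The threshold-based result does not change once the water level is above the threshold, whereas the curve-based result keeps increasing with water depth, which makes it more sensitive to locally overestimated water levels.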

In the regression of flood damage, we considered the building footprint area

We reasoned that the regression models should reflect the characteristics of the damage function applied in the original damage assessment. We have therefore considered a model structure based on the three building classes considered in damage assessment. A square-root transformation was applied to both input and output variables to linearize the relationship (see Sect. S6):

The flooded building footprint areas for residential (

For the damage data derived based both on

Similar to the approach for impervious areas in Sect.

To distinguish to what extent coarse building data affect damage assessment by creating uncertainty in flood exposure or flood hazard, we derived flooded building areas both from the baseline flood map (considering true imperviousness and buildings included in the DEM for flood simulation) and from the flood map created in a 2D simulation with the aggregated building data which were also considered for damage regression.
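A square-root-transformed damage regression of the kind described above can be sketched as follows. The functional form is an assumption made for illustration: the square root of damage is modeled as a linear combination, without intercept, of the square roots of the flooded building footprint areas of three building classes. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def fit_sqrt_damage_model(flooded_area, damage):
    """Fit sqrt(damage) = sum_k beta_k * sqrt(flooded_area_k) by least squares.

    flooded_area: array of shape (n_pixels, 3), one column per building class.
    damage:       array of shape (n_pixels,).
    """
    beta, *_ = np.linalg.lstsq(np.sqrt(flooded_area), np.sqrt(damage), rcond=None)
    return beta

def predict_damage(beta, flooded_area):
    """Back-transform the linear predictor by squaring it."""
    return (np.sqrt(flooded_area) @ beta) ** 2

rng = np.random.default_rng(1)
areas = rng.uniform(0.0, 5000.0, (200, 3))      # synthetic flooded areas in m2
true_beta = np.array([2.0, 3.0, 1.0])           # synthetic coefficients
damage = (np.sqrt(areas) @ true_beta) ** 2      # noise-free synthetic damage

beta_hat = fit_sqrt_damage_model(areas, damage)
damage_hat = predict_damage(beta_hat, areas)
```

Squaring the predictor returns damage in its original units; with noisy data this back-transformation can introduce bias, which the sketch does not address.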

To assess model performance, we performed cross validation. The city was divided into squares with an edge length

When the regression models were fitted to datasets with resolutions

To evaluate regression fit, we computed for cross-validation iteration

Median values of COD
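The block-wise cross validation can be sketched as follows. The sketch assumes a leave-one-block-out scheme, takes COD to be the coefficient of determination (1 minus the ratio of residual to total sum of squares in the held-out block) and uses synthetic data. The block edge length of 2000 m and all function names are illustrative choices.

```python
import numpy as np

def block_ids(x, y, edge):
    """Assign each pixel centre to a square block with the given edge length."""
    bx = np.floor((x - x.min()) / edge).astype(int)
    by = np.floor((y - y.min()) / edge).astype(int)
    return bx * (by.max() + 1) + by

def block_cross_validate(X, y, blocks):
    """Hold out one block at a time; return median RMSE and median COD."""
    rmse, cod = [], []
    for b in np.unique(blocks):
        test = blocks == b
        beta, *_ = np.linalg.lstsq(X[~test], y[~test], rcond=None)  # no intercept
        resid = y[test] - X[test] @ beta
        rmse.append(np.sqrt(np.mean(resid ** 2)))
        sst = np.sum((y[test] - y[test].mean()) ** 2)
        cod.append(1.0 - np.sum(resid ** 2) / sst if sst > 0 else np.nan)
    return float(np.median(rmse)), float(np.nanmedian(cod))

rng = np.random.default_rng(2)
centres = np.arange(200.0, 4000.0, 400.0)        # 400 m pixels on a 4 km square
gx, gy = np.meshgrid(centres, centres)
gx, gy = gx.ravel(), gy.ravel()

building = rng.uniform(0.0, 4.0e4, gx.size)      # synthetic building area in m2
impervious = 1.8 * building + rng.normal(0.0, 2000.0, gx.size)

blocks = block_ids(gx, gy, edge=2000.0)
median_rmse, median_cod = block_cross_validate(building[:, None], impervious, blocks)
```

Holding out spatially contiguous blocks rather than random pixels keeps neighbouring, correlated pixels from appearing in both the training and the validation set.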

The results section is structured into the same parts as highlighted in Fig.

Summary scores for return period

Summary scores for return period

Figure

When the regression models were fitted to data with resolutions below approximately 250 m, the relationship between building footprint areas and imperviousness could not be identified, because building footprint areas would then not necessarily be located in the same pixels as the associated features of the urban layout (e.g., sidewalks). Regression coefficients approached 1 for the finest data resolutions

While the median predictive performance of the regression models (COD

For our case study, we identified a data resolution

Results for regression models for impervious areas. Panels

Figure

For the 100-year event, similar total flooded areas were obtained for both models, which can be associated with the greater degree of water movement on the surface and, as a result, the filling of sinks in both models. However, the performance scores shown in Table

Total area flooded above water level threshold in baseline 2D simulation and in simulation based on building footprint areas aggregated to 200 m raster. Results are shown for return periods of 20 (left) and 100 (right) years. Maps below the plots illustrate simulated water depths in the different cases with background showing building footprint polygons (baseline) and total building footprint area per

For both return periods, the score values in Tables

A minor effect was noticeable in particular in the total simulated flood areas. Coarse building area resolutions implied that impervious areas would be distributed increasingly evenly over the catchment, leading to the distribution of effective precipitation over larger areas; surface flows with small water levels; and, as a result, fewer areas where water levels would exceed the threshold of 0.1 m. On the other hand, total impervious areas would be underestimated by the regression model for fine building datasets as a result of the regression specification without intercept. In fact, total impervious areas would be underestimated by 10 % with the 25 m building raster set, while the bias would exponentially decrease to under 1 % at a resolution

It should be emphasized that the effects discussed above were minor compared to the impact of including or excluding buildings in the DEM applied for 2D simulation. The limited impact of increasingly coarse representations of imperviousness is likely linked to the fact that sewer systems were considered by reducing effective rainfall in proportion to the imperviousness in a pixel (Eq.

Figure

The figure also illustrates differences in the results obtained for the two damage frameworks. Considering an aggregation level of 400 m, we noticed individual pixels where damage derived using depth–damage curves

Scatterplots of flood damage estimated based on 2D flood simulations with (baseline) and without buildings included in the DEM. Results are shown for return periods of 20 (top row) and 100 (bottom row) years, for both damage assessment frameworks and for spatial aggregation levels of 400 and 2000 m. Damage was assessed by overlaying building polygons and the corresponding flood areas.

Performance scores for damage regression models fitted based on building data with varying aggregation levels were summarized in Tables

The damage regression generally scored high values for COD

Neither of the above statements held for the cases where damage was derived based on the framework of

Figure

Similar to the results obtained for impervious areas, a minimal data resolution

COD

For the framework of

Figure

Flood damage predicted by DMOD1 on an aggregation level of 500 m, considering the baseline dataset and regression predictions generated with building data aggregated to resolutions of 25, 200 and 750 m. Damage was computed using the framework documented by

Finally, comparing values of COD

The results suggest that the use of aggregated building data affected both the simulation of flood hazards and the assessment of flood damage. In terms of the simulated flood hazards, the main effect arose from not considering the blockage of surface flow paths in the 2D flood simulations when using aggregated building data. Coarse representations of imperviousness and the resulting change in rainfall–runoff behavior had little effect in comparison.

Despite the aggregation of building data, we were able to achieve realistic representations of flood exposure, which were illustrated by the high COD

The damage regression yielded total damage estimates that, for a building data resolution

The damage assessment approach based on depth–damage curves

It is questionable whether this damage assessment approach is reasonable for pluvial flood risk. It relies on modeled water depths that would be unlikely to occur in this form in reality, because the water would likely enter the building and spread out without causing major structural damage. Damage assessment approaches which are less sensitive to water depths may thus be preferable for pluvial flood risk assessment.

The issue could be mitigated by explicitly considering water flow through buildings in the surface flow model, which, however, poses technical challenges. Alternatively, robust regression approaches are likely to yield better results when performing damage regression in the presence of such issues.

Very clear dependencies on spatial scale could be identified when developing regression models that predicted impervious areas as a function of building footprint areas. The optimal data resolution

In a similar manner, the performance of regression models for flood damage only reached acceptable levels when data resolutions

We performed 2D surface flow simulations based on publicly available DEM data where buildings and plants were removed in an automated manner. Our results suggest that the simulated flood maps were very strongly affected by whether the blockage of flow paths through buildings was considered in the DEM or not. Remnants originating from the DEM cleaning process may affect this result and could be an explanation for the rather low performance scores of the simulations where buildings were not included in the DEM. For example, slight misalignments between building polygons and building locations in the DEM may result in artificial sinks in the baseline simulation which would not be possible to reproduce in simulations without buildings.

Our 2D flood modeling approach was a simplified representation of the urban water cycle. This approach was justified as our intention was to evaluate which spatial scales should be considered in the development of flood screening approaches. For detailed assessment of the risk we would recommend 1D–2D calculation methods to more accurately represent where flooding occurs in the catchment.

The regression parameters for imperviousness are likely to depend on topography and urban layout (e.g., degree of urban creep and density of the urban developments). In addition, the optimal data resolution for identifying regression relationships is likely to depend on the urban layout, with coarser data resolutions being optimal in less densely developed cities. This implies that regression models can be transferred between cities with similar urban layout and topography, but in many cases it will be necessary to identify optimal spatial scales and model parameters for the specific case study.

For flood damage regression, optimal spatial scales and the identified regression models additionally depend on the approach which is used for calculating flood damage. Further, the level of damage incurred by a given extent of flooded area must be expected to depend on the location of sinks and flow paths in the specific case study and the degree to which urban planning was performed in a flood-aware manner

Based on the considerations above, we suggest the following workflow for developing a fast flood risk screening setup in a new case study:

Obtain vector-based building data and highly resolved imperviousness data from aerial imagery as base data characterizing the urban layout.

Perform hydrodynamic flood simulations (e.g., 1D–2D) for the case study to derive a baseline flood map and compute flood damage.

Train regression models for impervious area and identify a suitable data resolution

Use the predicted impervious area as input to fast flood simulation tools (e.g.,

Use the flood map and rasterized building data to train damage regression models. Identify suitable resolutions for training data (

Apply setup – simulate urban development in raster format; predict impervious area based on the simulated building areas; use predicted imperviousness for rainfall–runoff calculation in fast flood simulation tool; and compute flood damage based on the generated flood map, simulated building areas and damage regression model.

We studied how different data resolutions affect the identification of empirical relationships between building data and urban hydrology and at which spatial scales reasonable predictions could be obtained. Based on our results, we draw the following conclusions:

The identification of empirical relations between urban layout and urban hydrology is subject to a bias–variance tradeoff. Too fine spatial data resolutions prevent the identification of empirical relationships and lead to biased results, while too coarse resolutions reduce the number of data points and blur out spatial variations, leading to uncertainty in the estimated relationships. The optimal data resolutions are expected to vary for different topographies and urban layouts and must thus be evaluated in the specific case study.

Simulated pluvial flood hazards are strongly affected by whether surface flow simulations consider the blockage of flow paths through buildings and less by spatially averaged representations of imperviousness during rainfall–runoff calculations.

Water levels are underestimated if local ponding near buildings is not considered in the surface flow simulations, as would be the case when considering aggregated building data as input. Without correction, this effect also leads to an underestimation of flood damage.

A simple regression model predicting flood damage in an area as a function of the extent of flooded building area can, to some extent, compensate for deficiencies in the simulated flood area. Building data aggregated to resolutions on the order of 200 m were the preferred input in our case study and performed more robustly than building data with finer resolutions, because they reduced local extrema in flooded building areas.

Regression models for flood damage must be expected to depend on whether flood-aware spatial planning was applied in the case study used for model training or not. Different models must thus be trained to consider different land-use management strategies.

Local ponding next to large buildings can create rather large water levels in simulations of pluvial flood risk that may be unrealistic. Damage assessment frameworks where damage increases as a function of water levels are vulnerable to this type of error which is specific to pluvial flood risk.

Computer code for fitting regression models for imperviousness and flood damage was made available by

The supplement related to this article is available online at:

RL performed the analysis and led the preparation of the manuscript. KAN supported the scoping of the study and provided feedback on various iterations of the results and the manuscript.

The authors declare that they have no conflict of interest.

We wish to thank Odense Municipality and VandCenter Syd (VCS Denmark) for the provision of data used in this study. In particular, we wish to thank Agnethe N. Pedersen and Nena Kroghsbo for their support, feedback and discussions. We thank Behzad Jamali for commenting and error corrections and two anonymous reviewers and the editor for a constructive and thorough review process.

This project was funded by Innovation Fund Denmark through the Water Smart Cities project (grant no. 5157-00009B).

This paper was edited by Thomas Glade and reviewed by two anonymous referees.