Comparison of estimates of global flood models for flood hazard and exposed gross domestic product: a China case study

Abstract. Over the past decade global flood hazard models have been developed and continuously improved. There is now a significant demand for testing global hazard maps generated by these models in order to understand their applicability for international risk reduction strategies and for reinsurance portfolio risk assessments using catastrophe models. We expand on existing methods for comparing global hazard maps and analyse eight global flood models (GFMs) that represent the current state of the global flood modelling community. We apply our comparison to China as a case study and, for the first time, include industry models, pluvial flooding, and flood protection standards in the analysis. In doing so, we provide new insights into how these components change the results of this comparison. We find substantial variability, up to a factor of 4, between the flood hazard maps in the modelled inundated area and exposed gross domestic product (GDP) across multiple return periods (ranging from 5 to 1500 years) and in expected annual exposed GDP. The inclusion of industry models, which currently model flooding at a higher spatial resolution and which additionally include pluvial flooding, strongly improves the comparison and provides important new benchmarks. We find that the addition of pluvial flooding can increase the expected annual exposed GDP by as much as 1.3 percentage points. Our findings strongly highlight the importance of flood defences for a realistic risk assessment in countries like China that are characterized by high concentrations of exposure. Even an incomplete (1.74 % of the area of China) but locally detailed layer of structural defences in high-exposure areas reduces the expected annual exposed GDP to fluvial and pluvial flooding from 4.1 % to 2.8 %.

Introduction
Floods are one of the most frequent and most devastating kinds of natural disasters. Between 1980 and 2016, floods caused 23 % of overall economic losses and 14 % of fatalities due to natural hazards worldwide (Löw, 2018). In 2016, economic losses from flooding amounted to USD 56 billion globally. Understanding the risk of natural hazards, including flood risk, has therefore been identified as a priority in recent international risk reduction frameworks, such as the Sendai Framework for Disaster Risk Reduction (UNISDR, 2015).
In recent years, significant scientific efforts have been made to develop global flood risk models (GFMs) (Teng et al., 2017). In terms of river flooding, these have examined current flood risk at the global scale (e.g. Winsemius et al., 2013) as well as future flood risk due to changes in hazard, as a result of climate change (Alfieri et al., 2017; Dottori et al., 2018; Arnell and Gosling, 2016; Hirabayashi et al., 2013; Kundzewicz et al., 2014; Ward et al., 2017; Winsemius et al., 2015); exposure, due to increasing population, wealth, and urbanization (Hallegatte et al., 2013; de Moel et al., 2015); and vulnerability. To date, attention has especially been paid to developing global flood hazard maps. These maps indicate the severity of the hazard for different exceedance probabilities across the globe.
Published by Copernicus Publications on behalf of the European Geosciences Union.
The hazard severity is generally expressed in terms of flood extent and flood depth, on a raster grid with resolutions ranging from 1 to 32 arcsec. The GFMs that are used to create these flood hazard maps are simplified global-scale models of surface water flows that are driven by regional or global climate models or rely on gauged-discharge or (gauged-) precipitation datasets. The development of these models has been facilitated by advances in satellite data, numerical algorithms, computing power, and coupled modelling frameworks. The key advantage of GFMs compared to regional or national flood models is their global scale, which means that flood hazard maps are now available in data-poor areas that previously lacked hazard maps (Hagen and Lu, 2011).
Despite these recent advances, several major challenges still exist. For example, Ward et al. (2015) discuss the quality of elevation data, accuracy of boundary conditions used to force inundation models, and knowledge of river morphology, among other things. Bernhofen et al. (2018) also discuss the importance of forcing boundary conditions, especially input flow, as well as the influence of morphological features, such as floodplain size and the steepness of the terrain. Another major challenge for GFMs is to account for the impact that structural flood defences have on flood hazard, especially in regions with high protection standards.
Due to the aforementioned challenges and the growing number of GFMs, there is now a significant demand for comparing the outputs of different models and assessing their accuracy. This helps in understanding the applicability of GFMs for developing international risk reduction strategies and for their use in reinsurance and insurance portfolio risk assessments. Several such studies have been carried out by comparing or investigating a certain model component (e.g. global hydrological model, river routing model, and model resolution) in the GFM framework. For example, Schellekens et al. (2017) conducted an inter-model agreement assessment of 10 global hydrological models (GHMs) based on the signal-to-noise ratio in monthly mean anomalies of evapotranspiration, runoff, root zone soil moisture, and precipitation. The agreement of the GHMs was found to be low in snow-dominated regions and tropical rainforest or monsoon areas and high in temperate areas. A study by Zhao et al. (2017) assessed the ability of GHMs with native routing schemes to capture the timing and amplitude of river discharge. The results were compared to the use of a dedicated global river routing model, CaMa-Flood. Generally the use of CaMa-Flood improved the accuracy of simulating peak river discharge. Mateo et al. (2017) investigated the applicability of a GFM at higher spatial resolutions by validating it against a large past flood event in Thailand. They found that validation results improved with higher spatial resolution if multiple downstream connectivity is represented in the river routing model.
Rather than testing and investigating a certain model component of GFMs, Trigg et al. (2016) compared flood hazard maps from six different GFMs for the African continent. The study compared the inundated area across hazard maps for multiple return periods and assessed how this translates into differences in exposed gross domestic product (GDP) and exposed population. They found large differences; for example, over the continent of Africa the GFMs disagree on around 60 % to 70 % of the inundated area. These differences are mainly present in deltas, arid climate zones, and wetlands. The study concludes that more inter-comparison studies are needed to increase the quality of GFMs and stresses the importance of including industry models. In reply, Bernhofen et al. (2018) validated the same six GFMs in Africa. The best individual models performed at an acceptable level compared to observations. Further findings were that models forced by river gauged-flow data outperform models forced by climate reanalysis data. Contrary to previous studies, no relationship was found between performance and model spatial resolution. In a follow-up study, Hoch and Trigg (2019) proposed a validation framework for global flood models. The aim of this framework is to understand the drivers of deviations between GFMs by providing standard forcing data, validating and benchmarking model results, and sorting and indexing reference output. This framework is in line with the eWaterCycle II platform currently under development, which provides the above-mentioned principles for the global hydrological modelling community (https://www.ewatercycle.org/, last access: 2 January 2020; Hut et al., 2018).
In this study, we expand upon the existing work of intercomparison studies for global flood hazard maps. The main aim is to carry out a comprehensive comparison of flood hazard maps from eight GFMs for the country of China and assess how differences in the simulated flood extent between the models lead to differences in simulated exposed GDP and expected annual exposed GDP. Specifically, we aim (a) to assess the relative differences in the hazard output of a wide variety of global flood models, (b) to understand and explain these differences from the differences in the models themselves (data, methods, modelling, and output resolution), and (c) to provide a simple analysis of the impact of these differences on flood risk. This is carried out by addressing the variation in different model structures and the variability between flood hazard maps. In contrast to previous studies, we examine the effect of flood protection standards on flood hazard and include pluvial flooding. We investigate the current differences between flood hazard maps of GFMs, rather than carrying out a validation study, because the addition of the flood protection and pluvial components provides valuable new insights into their effects on the variability in results. Our comparison uses both publicly available academic GFMs (GLOFRIS, ECMWF, CaMa-UT, JRC, and CIMA-UNEP) and industry models (Fathom, KatRisk, and JBA) that are applied within the wider reinsurance industry. To our knowledge, it is the first comparison study to include industry models, the pluvial-flood component, and the role of flood protection on the flood hazard and exposure.
China is selected as our case study area because it poses many challenges to flood modelling: data scarcity; a variety of flood mechanisms spanning many climatic zones; complex topography; strong anthropogenic influence on the flood regimes, for example through river training; and a very high concentration of exposure. Moreover, China is prone to severe flood events. For example, in June 2016 alone more than 60 million people were affected by floods, resulting in an estimated damage of USD 22 billion (CRED, 2016). The combination of data scarcity, modelling challenges, and flood impacts that occur in China fits the key advantage of GFMs well, i.e. providing hazard maps in data-poor regions. In addition, the sheer spatial scale and challenges of modelling China (including complex topography and climate variability) provide a unique test bed for assessing the differences between the flood hazard maps.
This paper is set up as follows. In Sect. 2, we describe the data and models used in this study. In Sect. 3, we describe the (statistical) methods applied to compare the data from the various models. In Sect. 4, we present and discuss the results, examining differences in flood hazard, exposed GDP, and expected annual exposed GDP between GFMs; the influence of incorporating flood protection; and model agreement. Conclusions and an outlook are provided in Sect. 5. In Sect. S1 of the Supplement, we provide a detailed overview of the models and data used.

Description of flood hazard maps and models
We compare flood hazard maps for different return periods from eight different GFMs, namely CaMa-UT (Yamazaki et al., 2011, 2014a), GLOFRIS (Winsemius et al., 2013), JRC, ECMWF (Balsamo et al., 2015), Fathom (Sampson et al., 2015), CIMA-UNEP, KatRisk (contact KatRisk for a technical report), and JBA (contact JBA for a technical report). An overview of the technical specifications of the flood hazard maps is provided in Table 1. The outputs of the native flood hazard map of each GFM were acquired between November 2017 and May 2018. Data were downloaded or requested in their original published format (at the time of the study), and no bespoke or postprocessed maps were requested. The acquired flood hazard maps do not include structural flood defences and are referred to as undefended flood hazard maps. The exception is the CIMA-UNEP model, which has built-in flood protection (Sect. 2.2); these hazard maps are nevertheless treated as undefended in this study. Notably, the Fathom and JBA models do provide separate defended hazard maps (Sect. 2.2). The hazard maps represent either fluvial floods only or fluvial and pluvial floods combined (Fathom, KatRisk, and JBA), the so-called combined flood hazard maps. The hazard maps cover return periods (RPs) ranging from 5 to 1500 years, and the output resolutions of the native flood hazard maps range from 1 to 32 arcsec.

Model structures
From the eight GFMs, we identified two groups based on the model structure described in Trigg et al. (2016): the cascade model structure (CaMa-UT, GLOFRIS, JRC, ECMWF, and KatRisk) and the gauged-flow model structure (Fathom, CIMA-UNEP, and JBA). An overview of the modelling chain of both model structures is shown in Fig. 1 and further explained in Sect. 2.2.1 and 2.2.2. A concise description of the cascade model structure is provided by  and by  for the gauged-flow model structure.
The general model input data used by the GFMs (i.e. river network datasets and digital representations of the earth's surface like digital elevation models (DEMs), digital terrain models (DTMs), or digital surface models (DSMs)) vary in type, resolution, and corrections applied. CaMa-UT, GLOFRIS, JRC, ECMWF, CIMA-UNEP, Fathom, and KatRisk use the HydroSHEDS river network (Lehner and Grill, 2013) and SRTM3 DEM (Farr et al., 2007) at either 3 or 30 arcsec. Urban and vegetation bias corrections are applied before use. Additionally, KatRisk applies an algorithmic filtering to clean the DEM and uses manual correction to re-
Table 2 shows the model structures, climate forcing datasets, GHM (when applicable), name and type of river routing models, considered catchment size, type of digital elevation model, downscaled model resolution, and native output resolution of the flood hazard maps.

Cascade model structure
The defining characteristics of the cascade model structure are the use of climate forcing input datasets for the GHMs. River routing models then calculate the continuous river flow along river networks, calculating river and floodplain inundation dynamics. This is followed by flood frequency analysis (FFA), which determines flood depth and extent for a given RP or the flood volume in the case that downscaling is required.
Following the numeration of Fig. 1, the cascade modelling chain starts with the following.
1. Climate forcing datasets provide precipitation, temperature, and in some cases potential evapotranspiration time series as input for GHMs. The datasets (JRA-25, EU-Watch, ERA-Interim, and EC-Earth) vary in their modelled time period, time step, resolution, and atmospheric processes. The modelled time periods range from 1979 up to present day, with all periods spanning more than 30 years to avoid bias by inter-decadal variability. The time step of the climate forcing datasets is 6-hourly, and the horizontal resolutions range from 80 km to 1.125°. The KatRisk model uses gridded daily precipitation observations from the US National Weather Service's Climate Prediction Center (CPC) to establish rainfall-runoff relationships in combination with the ERA-Interim dataset that provides other atmospheric variables used to estimate evapotranspiration (like wind speed, radiation, and temperature).
3. A wide range of methods is used to model inundation dynamics. The complexities range from 2D flood volume redistribution (GLOFRIS) and complex 2D subgrid topography models (CaMa-UT and ECMWF) to 2D hydrodynamic models (JRC and KatRisk). The main differences between the river routing models are the resolution and the formulation of the shallow-water equations. The resolutions range from 3 arcsec (KatRisk), 0.1° (JRC), and 0.25° (CaMa-UT and ECMWF) to 0.5° (GLOFRIS). The shallow-water equations used for calculating the river routing are either local inertia (CaMa-UT and ECMWF), kinematic wave (GLOFRIS and JRC), or a unit hydrograph approach (KatRisk) where upstream and lateral inflow are treated as instantaneous inputs to a linear time-invariant model using the advection-diffusion equation as a response function.
4. The output of the global river routing model is used to estimate a time series of flood volume (GLOFRIS) or flood depth (CaMa-UT, JRC, ECMWF, and KatRisk). Applying flood frequency analysis (FFA), annual maxima of local runoff and/or river discharge are extrapolated to RPs beyond the observational space using

Gauged-flow model structure
Following the numeration of Fig. 1, models belonging to the gauged-flow model structure use gauged-discharge or gauged-precipitation datasets as input. The modelling approaches differ between those using regionalization techniques that depend on upstream catchment characteristics (Fathom), models that need to be complemented by hydrologic simulations (CIMA-UNEP), and those that use empirical rainfall-runoff methods (JBA). Based on the output of these methods, the flood flow magnitude is calculated through flood frequency analysis for given RPs that force river routing models. The river routing models produce flood extents and flood depths for given RPs. The gauged-flow models in this study do not require downscaling.
1. For the water volume input, the CIMA-UNEP and Fathom models use the Global Runoff Data Centre (GRDC; Germany) river discharge dataset as their main input of discharge observations. This dataset consists of more than 9500 stations that collect their data at daily and monthly intervals. Of these 9500 stations, only 39 are located in China. The Fathom model is complemented with the United States Geological Survey (USGS) stream gauge dataset. The JBA method uses the Climatic Research Unit (CRU) TS (Time-Series) 3.2 (> 4000 weather stations) (Harris et al., 2014) and Climate Forecast System Reanalysis (CFSR) v2 precipitation datasets (Saha et al., 2010), which cover the periods 1901 to 2011 and 1979 to 2009 with monthly and daily temporal resolution, respectively. The CFSR data are calibrated using 25 rain gauges in China. For China, 170 river gauges are used to enable the modelling of empirical rainfall-runoff relationships to calculate river discharge.
generalized extreme-value distribution and are combined with the index flood to generate return period design flood hydrographs along the river network (Smith et al., 2015).
The CIMA-UNEP model is complemented with hydrologic simulations using the EC-Earth climate forcing dataset and the continuum model to ensure that results are correct in data-scarce catchments. The JBA model does not require regression techniques as their precipitation datasets have global coverage.
3. The flood hydrographs are then used to force river routing models that propagate the flow across digital elevation models, calculating flood depth and extent without the need for downscaling. As with the cascade models, the river routing models of the gauged-flow models vary in methods and complexity. JBA uses the RFlow model for all of the large river networks in China, except for the downstream end of the Pearl River (Guangzhou area) and the downstream end of the Yangtze River (Shanghai area), which are modelled with JFlow in a fluvial configuration. Small rivers (catchments < 500 km²) as well as surface water flooding are modelled using JFlow in a direct-rainfall configuration. The resolutions of the river routing models vary between 1 arcsec (RFlow and JFlow), 3 arcsec (CIMA-UNEP), and 30 arcsec (Fathom). The shallow-water equations used for calculating the river routing are inertia (Fathom), Manning equations (CIMA-UNEP), the combination of the normal depth and Manning equations (JBA-RFlow model), and the full shallow-water equations (JBA-JFlow model).

Pluvial-flood modelling
In addition to fluvial floods, the JBA, Fathom, and KatRisk models also simulate pluvial floods. Fathom uses a "rain-on-grid" method for rivers and catchments smaller than 50 km², where flow is generated by raining directly on the DEM at 3 arcsec in order to calculate runoff. This method uses intensity-duration-frequency (IDF) relationships to estimate the duration, intensity, and frequency of extreme rainfall before applying the same regression techniques for extrapolation as with the fluvial component. The JBA method follows a similar approach by calculating IDF relationships at the centroid of each CFSR tile (0.5° × 0.5°). Kriging is used to interpolate between the tile centroids to create a continuous rainfall surface for each RP and storm duration (three storm durations are included: 1, 3, and 24 h). The JFlow routing model is run in this direct-rainfall approach to model the small rivers (< 500 km²) and surface waters. The KatRisk model uses daily precipitation from the Climate Prediction Centre dataset (Boulder, Colorado, USA) to simulate rainfall over catchments smaller than 500 km². The precipitation dataset combines all available historical data sources for daily and sub-daily global coverage from 1979 to real-time measurements, which are longer for monthly data. The data are checked for errors and to ensure spatial and temporal consistency. A 2D storage cell (diffusive-wave) model is used to calculate pluvial-flood patterns. The runoff is distributed uniformly across a catchment and routed according to topography at 3 arcsec. The flow (surface runoff fraction) is calibrated using river gauged-discharge data.

Defended hazard maps and external flood protection layers
Of all global flood models considered in this study, three include options for considering the impact of structural flood defences on the hazard maps. The CIMA-UNEP hazard maps are the only maps that contain a level of built-in flood protection, which cannot be removed. They incorporate flood protection standards by creating a defence ellipsoid around large cities, with the size being dependent on the GDP. All flooding within this ellipsoid is removed in post-processing, and the defences are assumed to fail above a standard of protection of RP200. Hence, this also means that for the CIMA-UNEP model the undefended baseline hazard maps are not available for this study.
Alongside the undefended hazard maps, Fathom also provided flood hazard maps with integrated flood protection. JBA further provided a dataset of defences (largely for dense urban areas) that can be superimposed on the flood hazard maps to create a defended set of flood maps per return period.
To allow for comparison between the individual GFMs, we decided to include defences only in a post-processing step using non-built-in layers of defences, meaning that Fathom's defended maps were not used in this study. Section 3.5 describes the post-processing step in more detail.
The two flood protection layers used in this study are (1) a county-level defence layer and (2) a city-level defence layer. The first layer was created by Du (2018) and describes standards of protection (SoPs) on an administrative county level covering the whole of China. It can be considered as a kind of policy layer, as it makes assumptions about the degree of protection based on goods to be protected. This layer was developed by dividing counties into urban or rural areas. The urban-area SoPs are based on GDP and population datasets from the Chinese government. The GDP dataset is converted into a weighted population dataset and is then combined with the population dataset to calculate the maximum urban protection for a given county. The rural-area SoP is based on the assumption that farmland is a key indicator for flood protection due to its importance for providing food security for the large population of China. The area of farmland is derived from a governmental land use map and is combined with the population dataset to calculate the maximum SoP for each county. The urban and rural areas within the counties are then combined to create a nationwide layer of flood protection standards. The SoPs of the layer range from 10 in rural counties (western China) to 200 in urban counties (eastern China). The second layer is the high-resolution JBA flood protection layer for defended areas and is hereafter referred to as the city-level defence layer. The layer is a national layer that contains SoP polygons with a focus on urban areas. The defended areas are determined using a variety of the best available third-party sources. Some of the defended areas were excluded by JBA, as it is likely that flooding might occur from surrounding undefended areas. The SoP attributed to each defended area is determined from the locally available data source. Where it was not known, the defended area was attributed the SoP of either the neighbouring defence data or the regional average. In total, the layer covers only 1.74 % of the area of China.
Nat. Hazards Earth Syst. Sci., 20, 3245-3260, 2020 https://doi.org/10.5194/nhess-20-3245-2020

Methodology
We assess the agreement between the flood hazard maps of the eight GFMs by calculating the inundated area for the whole of China and by applying a model agreement index that calculates the agreement on inundation per grid cell. We include a GDP layer to study how the inundated area relates to exposed GDP and the amount of expected annual exposed GDP and how model agreement relates to agreement on the amount of exposed GDP. By including flood protection standards we can assess the effects of these layers on the previously mentioned types of analyses, adding to the knowledge of the importance of including such layers in further studies. In addition, we ensure a fair and accurate comparison of the flood hazard through the use of a data homogenization scheme.

Data homogenization
We acquired the undefended flood hazard maps of the global flood models (GFMs) in their native output format. The difference in resolutions and output formats requires an initial homogenization of the data. Firstly, the hazard maps were masked to the case study area extent. The extent includes continental China, excluding Hong Kong SAR, Macau SAR, and Taiwan. Thirdly, we disaggregated the hazard maps to a 3 arcsec resolution. The chosen resolution balances the loss of data quality against manageable file sizes and processing time. The disaggregation was conducted with the nearest-neighbour resampling technique, meaning that a single 30 arcsec grid cell is resampled to 100 (10 × 10) grid cells of 3 arcsec with the same value. The Fathom and KatRisk model outputs did not require resampling, as their hazard maps are native at 3 arcsec. The JBA flood hazard maps were aggregated to 3 arcsec from their native 1 arcsec hazard map resolution. Fourthly, the hazard maps were converted from representing flood depth, when available, to flood extent by changing all grid cell values larger than 0 to 1. This decision was made because flood depth is not available in all flood hazard maps. Lastly, "permanent" waterbodies were removed from the flood hazard maps. The GFMs disagree on the inundation of lakes and rivers. To avoid a large positive bias in the hit rate, we removed these "neutral waterbodies" from the hazard maps using an independent dataset. The global surface water 1984-2015 dataset from the Joint Research Centre (Pekel et al., 2016) was modified to represent neutral waterbodies as areas that are inundated 80 % of the time or more during the 1984 to 2015 period. This percentage of occurrence ensures that permanent lakes and rivers are removed, whilst minimizing the removal of floodplain inundation.
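The disaggregation, depth-to-extent conversion, and waterbody removal described above can be illustrated with a minimal sketch on small nested lists. The function names, the toy grids, and the treatment of the occurrence layer as a percentage are assumptions made for illustration only; they are not the processing chain actually used in the study, which operates on large rasters.

```python
def disaggregate_nearest(grid, factor):
    """Nearest-neighbour disaggregation: each coarse cell becomes
    factor x factor fine cells that carry the same value."""
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


def to_extent(depth_grid):
    """Convert flood depth to binary extent: any depth > 0 becomes 1."""
    return [[1 if depth > 0 else 0 for depth in row] for row in depth_grid]


def remove_permanent_water(extent, occurrence, threshold=80.0):
    """Set cells to 0 where surface water occurrence (assumed to be in %)
    is at or above the threshold (80 % in the text)."""
    return [
        [0 if occ >= threshold else cell for cell, occ in zip(ext_row, occ_row)]
        for ext_row, occ_row in zip(extent, occurrence)
    ]


# Toy example: a 1 x 2 grid of 30 arcsec flood depths (m), disaggregated
# to 3 arcsec (factor 10), then converted to a binary flood extent.
coarse_depth = [[0.0, 1.5]]
fine_depth = disaggregate_nearest(coarse_depth, factor=10)
fine_extent = to_extent(fine_depth)
```

With a factor of 10, the single wet 30 arcsec cell in the toy grid becomes 100 wet 3 arcsec cells, which matches the 10 × 10 relation between the two resolutions.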

Inundation percentages
We compared the inundated area between the different flood hazard maps with and without flood protection standards. To accurately calculate the inundated area in km² we implemented the haversine method (Brummelen, 2013). Using this method we created a grid containing the accurate size in km² of each grid cell. Next, we divided the inundated area of the flood hazard maps by the total land area of China to express the results as a percentage of the total land area of China.
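A minimal sketch of how a per-cell area could be derived with the haversine formula is given below. It assumes a spherical Earth with a mean radius of 6371 km and approximates each cell as a rectangle whose width and height are the great-circle distances across its centre. Both choices, as well as the function names, are assumptions for illustration; the exact implementation used in the study is not specified in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_area_km2(lat_centre, lon_centre, cell_deg):
    """Approximate area of a square lat-lon cell as width x height,
    both measured through the cell centre with the haversine formula."""
    half = cell_deg / 2
    width = haversine_km(lat_centre, lon_centre - half, lat_centre, lon_centre + half)
    height = haversine_km(lat_centre - half, lon_centre, lat_centre + half, lon_centre)
    return width * height


# Area of a 3 arcsec cell at 30 degrees north (roughly central China).
area_30n = cell_area_km2(30.0, 110.0, 3 / 3600)
```

Because the east-west width shrinks with the cosine of latitude, a cell in northern China covers noticeably less area than a cell of the same angular size in the south, which is why a per-cell area grid is needed rather than a constant.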

Exposed GDP and expected annual exposed GDP
The exposed GDP was calculated by overlaying the flood hazard maps with a gridded GDP layer created by Kummu et al. (2018). This layer has a native resolution of 30 arcsec and represents the year 2015. We first adjusted the resolution of the GDP layer to 3 arcsec using the bilinear resampling technique. Next, we multiplied the homogenized flood extent hazard maps with the GDP layer to obtain the exposed-GDP value for each inundated grid cell. The results were then divided by the total GDP of China to express the exposed GDP as a percentage of the total GDP of China. In addition, we calculated the expected annual exposed GDP (EAE-GDP) following the method of Apel et al. (2016). The EAE-GDP is the result of the flood event probability of exceedance (P) and its exposure (E).
R is the EAE-GDP. ΔP is the change in annual probability of exceedance, where P = 1/T and T is the return period (RP) (Triet et al., 2018). E is the exposed GDP; i is the index of the return period under consideration (with i = 1 representing RP5 in this study); and n is the number of considered RPs. The RPs that
Table 3. M (MAI) calculation based on an example grid with a river indicated in bold with a value of 0.
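The exposed-GDP and EAE-GDP calculations can be sketched as follows. The discretization is an assumption: the sketch sums the exposure at each return period weighted by the drop in exceedance probability to the next, rarer return period, and closes the last interval at a probability of zero. Other discretizations, such as a trapezoidal rule, are possible and would give somewhat different values. Function names and input numbers are hypothetical.

```python
def exposed_gdp_percent(extent, gdp):
    """Exposed GDP as a percentage of the total GDP in the grid.
    extent is a binary flood extent grid; gdp is a GDP grid of equal shape."""
    total = sum(sum(row) for row in gdp)
    exposed = sum(
        cell * value
        for ext_row, gdp_row in zip(extent, gdp)
        for cell, value in zip(ext_row, gdp_row)
    )
    return 100.0 * exposed / total


def expected_annual_exposure(return_periods, exposures):
    """Assumed discretization: sum over i of (P_i - P_{i+1}) * E_i,
    with P = 1 / T and the last interval closed at P = 0."""
    pairs = sorted(zip(return_periods, exposures))
    probs = [1.0 / t for t, _ in pairs] + [0.0]
    return sum(
        (probs[i] - probs[i + 1]) * exposure
        for i, (_, exposure) in enumerate(pairs)
    )


# Hypothetical exposed GDP (% of total GDP) at three return periods.
eae = expected_annual_exposure([5, 100, 1500], [2.0, 10.0, 15.0])
```

The result has the same unit as the exposure input, so feeding in exposed GDP as a percentage of national GDP yields an expected annual exposed GDP in percent.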

Model agreement index
The model agreement index (MAI) was introduced by Trigg et al. (2016) as a measure for expressing model agreement on a grid cell level. We calculated the MAI for the RPs 20-25, 50, 100, and 500 because these are available for all eight GFMs. A distinction is made between the fluvial and combined hazard maps. Before MAI calculation, the binary hazard maps (data homogenization processes) were aggregated (stacked), resulting in grid cell values ranging from 0 to 7 for the fluvial hazard maps and grid cell values ranging from 0 to 3 for the combined hazard maps. KatRisk's maps produce combined fluvial and pluvial flood hazard and are therefore not included in the fluvial MAI calculation.
where M is the model agreement index (MAI), N is the number of models under consideration, i is the number of models in agreement, a_i is the inundated area for the number (i) of models in agreement, and A is the total inundated area of all models under consideration. The MAI formula in Eq. (2) has an output value between 0 (no agreement) and 1 (perfect agreement). The formula only takes into account inundated grid cells in order to avoid misrepresentation of model agreement. The large number of non-inundated grid cells would create bias due to a high hit rate. An example of a model agreement grid with MAI calculation is provided in Table 3.
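The stacking and agreement calculation can be sketched as below. The sketch assumes the form M = (1/A) Σ_{i=2}^{N} (i/N) a_i, which is consistent with the stated range of 0 (no agreement) to 1 (perfect agreement) but is an assumption here; it also counts cells instead of weighting by cell area, and the function name is hypothetical.

```python
def model_agreement_index(binary_maps):
    """MAI for a list of equally shaped binary grids (1 = inundated, 0 = dry).

    Assumptions: equal-area cells (cell counts stand in for area) and
    M = sum over i >= 2 of (i / N) * a_i, divided by the total inundated area A.
    """
    n_models = len(binary_maps)
    rows, cols = len(binary_maps[0]), len(binary_maps[0][0])
    area_by_agreement = {}  # number of models in agreement -> number of cells
    for r in range(rows):
        for c in range(cols):
            agreeing = sum(m[r][c] for m in binary_maps)  # stacked value
            if agreeing > 0:  # only cells inundated by at least one model
                area_by_agreement[agreeing] = area_by_agreement.get(agreeing, 0) + 1
    total_area = sum(area_by_agreement.values())
    if total_area == 0:
        return float("nan")  # no model predicts any inundation
    weighted = sum((i / n_models) * a for i, a in area_by_agreement.items() if i >= 2)
    return weighted / total_area
```

For three toy maps whose stacked grid is [3, 2, 1], this gives (3/3 + 2/3) / 3 = 5/9, roughly 0.56. Cells flooded by a single model add to A but not to the numerator, which is what pulls the index towards 0 when models disagree.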

Defended hazard maps
We assess the influence of flood protection on the inundated area, exposed GDP, EAE-GDP, and MAI using two different types of defences to reflect two typically used strategies for modelling structural defences: (a) a county-level and largely policy-based defence layer and (b) a national-level defence layer with a focus on urban areas on a city scale that delineates defences only in areas of the highest exposure (described in Sect. 2.2). The undefended hazard maps of all models considered in this study were used. For the special case of the CIMA-UNEP flood hazard maps, which include a built-in defence layer, we still superimpose the defence layers. The defended flood hazard maps are created by masking areas that are protected for a given standard of protection (SoP). For example, a grid cell that is inundated at RP100 and has a protection level of SoP100 is considered to be not inundated and is therefore masked in the flood hazard map.
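The masking rule can be sketched as follows, assuming a binary flood grid for one return period and a grid of protection standards expressed in years (0 where no defence is known). The function name and grid layout are hypothetical.

```python
def apply_defences(flood, sop, return_period):
    """Mask inundated cells whose standard of protection covers the return period.

    flood: binary grid for one return period (1 = inundated).
    sop:   grid of protection standards in years (0 = undefended).
    A cell inundated at RP100 with SoP100 is treated as not inundated.
    """
    return [
        [0 if (f and s >= return_period) else f for f, s in zip(frow, srow)]
        for frow, srow in zip(flood, sop)
    ]
```

Applied to an RP100 map, a cell with SoP100 is masked while a cell with SoP50 stays inundated. In this sketch protection is all-or-nothing per cell, mirroring the masking approach described above.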
4 Results and discussion

4.1 Spatial distribution of floods
Figure 2 shows the RP100 flood extent for both fluvial (Fig. 2a) and combined fluvial and pluvial flooding (Fig. 2b) across China. Noticeable are the large inundated areas in the Xinjiang province of northwestern China and the northeastern provinces of Heilongjiang, Jilin, and Liaoning, as well as the large deltas located in the east. The latter include the large cities of Beijing and Shanghai (among others) and are therefore regions of high exposure.

Inundated area and flood protection
The comparison of the inundated area (expressed as a percentage of the total land area of China) between different models is shown in Fig. 3a-c. The figures show both the fluvial hazard maps and the combined hazard maps (fluvial and pluvial floods), with RPs ranging from 5 to 1500 years. Results are shown for the undefended layers (Fig. 3a) and the defended layers (Fig. 3b and c).
Focusing first on the undefended fluvial hazard maps in Fig. 3a (solid lines), the predicted spread in percentage of the inundated area ranges between 4.3 % and 9.8 % for RP20 and 5.8 % and 14.2 % for RP500. The CaMa-UT, GLOFRIS, and JRC models show very similar results across RPs and generally low amounts in percentage of the inundated area compared to the other GFMs. The ECMWF, Fathom, and CIMA-UNEP models show similar results across RPs and moderate amounts in percentage of the inundated area. JBA's maps produce the highest percentage of the inundated area across all RPs.
The differences and similarities in results cannot be explained by differences in model structure alone. The GFMs with the closest resemblance in model structure and model components (Table 2) are the CaMa-UT and ECMWF models, yet their results differ by up to a factor of 2. These models use different climate forcing datasets (JRA reanalysis and ERA-Interim) and GHMs (MATSIRO-GW and HTESSEL); the rest of the model structure is similar. From the resemblance in model structures of the CaMa-UT and ECMWF models it can be inferred that the differences in global climate forcing and GHM have large effects on the percentage of the inundated area.
The difference in the inundated area between low and high RPs is small for the majority of models (Fig. 3a), with the exception of the Fathom and JBA models. The CaMa-UT and ECMWF models show a similar increment across the different RPs (though there is a large absolute difference between the two models), which is possibly caused by the similar output resolution (18 arcsec) and considered catchment size (500 km²). GFMs with higher output resolutions and smaller considered catchment sizes, such as the JBA model, tend to have larger increments between different RPs in the results. Moreover, the high output resolution and the inclusion of very small catchments in the JBA model likely explain why its hazard maps predict inundation percentages significantly higher than those of the other models.
For the six GFMs (excluding JBA and KatRisk) that were used in the study of Trigg et al. (2016), percentages of the area inundated in our study for China for the undefended fluvial hazard map are similar to those found in Africa by Trigg et al. (2016). For example, the inundation percentages range from 3 % to 8.2 % for RP20 and 3.5 % to 9.5 % for RP500, and the results are highest for the ECMWF and Fathom models in both studies. However, the results based on the CIMA-UNEP model are very different, with a relatively high percentage of inundation (double) in our study compared to the study of Trigg et al. (2016). It should be noted that the output resolution of the CIMA-UNEP hazard maps used in our study (32 arcsec or ∼ 1 km) is lower than the resolution used by Trigg et al. (2016) (3 arcsec or ∼ 90 m). Rudari et al. (2015) tested the role of output resolution on the hazard maps of CIMA-UNEP. They found that aggregating data from 3 to 32 arcsec has major implications; for 22 case study areas investigated in East Asia, they found an increase of inundation amount by a factor of 2 on average. Their findings correspond well with the difference in CIMA-UNEP results between both studies and further underline the large influence of output resolution on flood hazard maps.
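A toy example shows how aggregation to a coarser grid can inflate the inundated fraction. The "any wet fine cell makes the coarse cell wet" rule used here is an assumption chosen for illustration; the aggregation rule applied by Rudari et al. (2015) is not described in this text, and the function names are hypothetical.

```python
def aggregate_any_wet(grid, factor):
    """Coarsen a binary grid; a coarse cell is wet if any fine cell inside it is wet."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                grid[rr][cc]
                for rr in range(r, min(r + factor, rows))
                for cc in range(c, min(c + factor, cols))
            ]
            coarse_row.append(1 if any(block) else 0)
        coarse.append(coarse_row)
    return coarse


def wet_fraction(grid):
    """Share of inundated cells in a binary grid (equal-area cells assumed)."""
    cells = [v for row in grid for v in row]
    return sum(cells) / len(cells)


# A one-cell-wide river running north to south through a 4 x 4 grid.
fine = [[0, 1, 0, 0] for _ in range(4)]
coarse = aggregate_any_wet(fine, 2)
```

In this toy case the wet fraction rises from 0.25 on the fine grid to 0.5 on the coarse grid. The doubling is a property of the chosen geometry, not a reproduction of the reported factor of 2.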
The combined hazard maps shown in Fig. 3a (Fathom, KatRisk, and JBA models; dashed lines) show less variation for a given RP than the undefended fluvial hazard maps. The values vary between 8.0 % and 10.5 % for RP20 and 15.2 % and 17.7 % for RP500. The difference in the inundated area between the JBA fluvial and combined hazard maps is relatively stable across increasing RPs. However, this is not the case for the Fathom model, which shows larger differences with increasing RP. The higher inundation percentages due to the addition of pluvial floods (2 percentage points for Fathom and 0.9 percentage points for JBA for RP100) highlight the importance of including pluvial floods in flood hazard assessments at a large scale.
Next, we examine the results of the defended flood hazard map shown in Fig. 3b-c. The defended county-level flood hazard map results in Fig. 3b are based on the assumption of complete protection against RP10 (rural areas) and up to RP200 (in urban areas) and no protection against RP250 floods and higher. The results show the percentage of the inundated area for RP20 ranging between 0.2 % and 1.5 %. The effect of including flood protection is largest for low RPs and becomes smaller with an increasing RP. The results for RP100 vary between 4.4 % and 12.7 %. Compared to the undefended hazard maps the spread of results is reduced from 6.2 percentage points to 1.3 percentage points for RP20 and from 8.8 percentage points to 8.3 percentage points for RP100. The small difference between undefended and defended county-level maps at RP100 is explained by the presence of flood protection in the economically prosperous and densely populated counties in eastern China, leaving more counties prone to flooding.
The defended city-level hazard map results in Fig. 3c do not assume complete protection against a given RP flood. The results are similar to the results of the undefended flood hazard maps because this flood protection layer covers only 1.74 % of China.

Exposed GDP and flood protection
The exposed-GDP results (expressed as a percentage of the total GDP of China) for the fluvial and combined hazard maps are shown in Fig. 3d-f, for RPs ranging from 5 to 1500 years, with and without flood protection. Results for the undefended exposed GDP (Fig. 3d) vary between 13.9 % and 27.8 % for RP20 and between 17.9 % and 33.4 % for RP100. Multiple similarities are found between the inundated-area (Fig. 3a) results and the exposed-GDP (Fig. 3d) results. The CaMa-UT, GLOFRIS, and JRC models have the lowest percentages for both types of results per RP. Similarly, the combined hazard maps of the KatRisk, Fathom, and JBA models have the highest percentages. The main difference concerns the ECMWF model, which has the highest percentages of exposed GDP between RP5 and RP100, whereas its inundated area is close to the average of all GFM results. Additionally, the Fathom model estimates relatively low exposed-GDP percentages compared to its fluvial percentage of the inundated area, which was close to the average. These results show that a high amount of inundated area does not necessarily lead to a high amount of exposed GDP and vice versa. The high exposed-GDP percentages of the ECMWF model are caused by the inundation of densely populated deltas in eastern China. The inundated area alone does not give an adequate representation of the difference between models in terms of their use for assessing the impacts of floods. This is further illustrated by the relatively low exposed-GDP percentages of the Fathom model, which are due to simulated inundation in large parts of the sparsely populated regions of western China. The CIMA-UNEP results show a large increase in exposed-GDP percentage between RP500 and RP1000 of 12.1 percentage points, caused by the exceedance of the built-in level of flood protection of large cities.
The defended county-level exposed-GDP results in Fig. 3e vary between 0.1 % and 0.2 % for RP20 and between 8.8 % and 17.6 % for RP100. Compared to the undefended exposed-GDP results (Fig. 3d), the effect of including county-level flood protection standards is larger for exposed GDP than for the inundated area. Generally, the variability between models in exposed GDP is very small at RP20 and increases towards RP100. At RP250 and higher the variability of results increases further due to floods exceeding the design values of the defences for the large cities (where GDP is concentrated) in the delta areas. This has a larger effect on the exposed GDP of the fluvial hazard maps of the CaMa-UT, GLOFRIS, JRC, and Fathom models than on the combined hazard maps of the KatRisk, Fathom, and JBA models.
The results of the city-level defended exposed GDP in Fig. 3f vary between 9.4 % and 18.5 % for RP20 and between 17.3 % and 32.5 % for RP100. Contrary to the small effect of city-level defences on the inundated-area results, the impact on the exposed-GDP results is large relative to the small coverage of China (1.74 %). For example, the ECMWF model has a lower exposed GDP of 15.8 % for the city defended scenario as compared to 27.8 % for the undefended scenario at RP5. The city defended results show less variability for the lower RPs than the undefended exposed GDP. The variability among the GFMs increases between RP50 and RP100 from 9.6 to 15.2 percentage points because the highest assumed level of flood protection for this layer is RP100.
These results highlight the importance of including locally detailed flood protection data for the correct representation of exposed GDP. Adding information from a policy layer can further improve the risk assessment on a countrywide scale but needs careful validation of the uniform per-county total protection assumptions. Also, ideally, flood protection standards are already incorporated within the river routing models of the various GFMs instead of incorporation during post-processing.

Expected annual exposure
The expected annual exposed-GDP (EAE-GDP) results shown in Table 4 are expressed as a percentage of the total GDP of China. Generally, these results reflect the findings of the per-RP comparison in the previous sections. The CIMA-UNEP model simulates much lower EAE-GDP than the other models for the undefended and defended city-level EAE-GDP, which is due to the large difference in inundation percentages, caused by incorporated flood protection, between RP25 and RP50. Extrapolation of these results to RP5 leads to very low exposed-GDP percentage estimates and therefore results in a low EAE-GDP value. This is not the case for the defended county-level EAE-GDP because all models agree on low amounts of exposed GDP for RP20 and RP25. The agreement between GFMs causes the defended county-level variation to be small, at 0.29 percentage points.

Model agreement
The model agreement maps shown in Fig. 2a-b depict the model agreement at the grid cell level for undefended fluvial and combined hazard maps for RP100. The areas with highest model agreement are mainly situated next to large rivers or deltas in eastern and northwestern China. Comparing the results of both flood type hazard maps, it appears that the combined flood hazard maps (Fig. 2b) have higher model agreement for these flood hotspots. Furthermore, the combined hazard maps show an increased level of detail due to higher native output resolutions. An overview of the model agreement index (MAI) for the whole of China is provided in Table 5.
The MAI scores for RP100 are 0.29 for the fluvial hazard maps and 0.38 for the three combined hazard maps. The change in MAI between RPs is the largest between RP20(-25) and RP50 for both undefended flood type hazard maps and reduces slightly at higher RPs. Comparing the results of the undefended and county-level defended hazard maps, the defended hazard maps have lower MAI scores for both flood types below RP500, and there is no difference between MAI scores for the defended and undefended maps at RP500 and above as no flood defences are in place. The city-scale defended hazard maps are not included in the MAI results section due to the small change in the inundated area and therefore model agreement.
Model disagreement occurs mainly at the floodplain edges and on the modelling of smaller streams and rivers due to differences in considered catchment size of the GFMs. This effect is more pronounced for smaller RPs.
The average MAI scores on a province level shown in Fig. 4a-b show the spatial differences of model agreement in China. MAI scores are higher (0.30-0.60) in the northwestern and eastern provinces for the fluvial hazard map in Fig. 4a. The same map shows that model agreement is low in western China, the provinces in the south, and especially the island of Hainan, with MAI scores between 0.10 and 0.30. The combined hazard map results in Fig. 4b show a different spatial distribution of MAI scores. The scores are highest in the northern provinces (0.50-0.65), some of the southern provinces (0.50-0.55), and the eastern provinces (0.55-0.60). The delta areas in the eastern and northeastern regions and the provinces in western China have lower MAI scores (0.35-0.50) than the previously mentioned regions.
These results indicate the importance of modelled catchment size and output resolution of the GFMs for the hazard maps. For example, the fluvial hazard maps of the JRC model only include catchments larger than 5000 km², while the Fathom model includes catchment sizes of 50 km² and larger for their fluvial hazard maps. This mismatch between models results in lower MAI scores. This is further illustrated by the low MAI score for the relatively small island of Hainan in the south of China, which is not modelled by
Nat. Hazards Earth Syst. Sci., 20, 3245-3260, 2020 https://doi.org/10.5194/nhess-20-3245-2020

Limitations
The comparison of flood hazard maps is based on flood extent, where every grid cell is considered as fully inundated at more than 0 cm of flood depth. In this study we did not test the effect of this assumption on the results. A possible effect is the overestimation of flood extent by coarse-resolution models: for example, a grid cell with a small amount of inundation can be disaggregated to multiple inundated grid cells and therefore misrepresent the native flood hazard maps. A future study would benefit from testing multiple inundation thresholds for converting flood depth to flood extent or from adding methods to compare inundation depth. An additional limitation is the lack of RPs, especially the lower RPs, that shape the EAE-GDP results. Linear extrapolation of exposed-GDP results to RP5 can misrepresent how GFMs simulate low-RP floods. This affects the EAE-GDP because the results of low-RP floods have a larger weight on the results than high-RP floods. Future studies should test multiple extrapolation and/or interpolation methods.
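The threshold sensitivity suggested above could be explored with a sketch like the following. The function name, the depth values, and the thresholds are hypothetical and serve only to illustrate the conversion from flood depth to flood extent.

```python
def extent_fraction(depth_m, threshold_m=0.0):
    """Share of grid cells counted as inundated when depth exceeds a threshold.

    threshold_m = 0.0 reproduces the rule used in this study (any depth above
    0 cm counts as inundated); other values are illustrative alternatives.
    """
    cells = [d for row in depth_m for d in row]
    wet = sum(1 for d in cells if d > threshold_m)
    return wet / len(cells)


# Hypothetical depths in metres for a 2 x 2 grid.
depths = [[0.0, 0.05], [0.3, 1.2]]
fractions = {t: extent_fraction(depths, t) for t in (0.0, 0.1, 0.5)}
```

For these toy depths the inundated fraction falls from 0.75 at a 0 m threshold to 0.5 at 0.1 m and 0.25 at 0.5 m, showing how strongly shallow cells can drive extent-based comparisons.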
Our study has focused solely on the inter-comparison of the outputs of the eight GFMs and has not attempted a validation against past flood event footprints or results of regional flood maps. Therefore, results can currently only be interpreted relative to one another. In addition, this study does not portray a complete picture of a full flood risk assessment and should not be interpreted as such. The hazard component shows high amounts of uncertainty, as illustrated by the effect of the flood defence assumptions, which is larger than the variability between GFMs. The modelling of vulnerability and exposure would add even more levels of uncertainty to the outcome of a flood risk assessment.

Conclusions and outlook
The main aim of this study was to carry out a comprehensive comparison of flood hazard maps from eight GFMs for the country of China and assess how differences in the simulated flood extent between the models lead to differences in simulated exposed GDP and expected annual exposed GDP.
The main findings of this study are the following.
-Variations exist up to a factor of 4 between the flood hazard map outputs of GFMs in terms of the inundated area and exposed GDP.
-The GFMs that were assessed by Trigg et al. (2016) for the African continent showed similar results to this study, with the exception of the CIMA-UNEP model.
-The difference in the CIMA-UNEP model results between these studies underlines the importance of the native output resolution of the flood hazard maps, which is in line with previous findings of Rudari et al. (2015).
-The GFMs with the closest resemblance in model structure and model components, i.e. the CaMa-UT and ECMWF models, differ by up to a factor of 2. Their model setup deviates in terms of the used climate forcing datasets and GHMs, highlighting the large effect of these model inputs on the results.
-Higher model agreement is found for combined hazard maps than for fluvial hazard maps. This is due to greater similarity in the native output resolution and the considered catchment size of the three models (Fathom, JBA, and KatRisk) that include pluvial flooding. Furthermore, the spatial distribution of model agreement differs between both types of flood hazard maps on a province level.
-Pluvial flooding (both flooding of headwater catchments and off-floodplain flooding) is a highly important form of flooding (for China). Depending on the minimum catchment size used for modelling fluvial floods, adding pluvial flooding can increase the expected annual exposed GDP by as much as 1.3 percentage points.
-Incorporation of external flood protection standards in the flood hazard maps reduces the variability of inundation and exposed GDP between GFMs. Knowledge of structural defences in high-exposure areas is key in adequately assessing the overall risk of a country. County-level (policy-level) defence knowledge can help to further improve the results but needs to be checked carefully.
-The inclusion of industry models that currently model flooding at a higher resolution, both on the grid and on the catchment level, and that additionally include a pluvial-flooding component strongly improved the comparison and provided important new benchmarks.
GFMs are complex modelling chains, with assumptions and uncertainties in the input data, the individual model components, and their parameterization. In our study we can draw some preliminary conclusions on the impact of certain modelling decisions on the flood hazard map outputs. However, we cannot draw conclusions on GFM quality or the quality of an individual model component. For the latter, a systematic comparison framework is required, in which each of these modelling components and parameters would be tested individually and in unison. The proposed model comparison framework of Hoch and Trigg (2019) could therefore greatly benefit our current understanding of global flood hazard.
Based on our conclusions we advise practitioners to follow the flowchart in Fig. 5 when selecting flood hazard maps. The order of the flowchart does not indicate the relative importance of each component. First, a selection should be made based on the inclusion of (external) flood protection standards. Second, the practitioner should include pluvial floods when relevant in the study area. Third, the minimum catchment size and modelled resolution should fit the case study area and the required level of detail of the hazard maps. Fourth, the type of forcing product should be evaluated based on origin (reanalyses, gauged, radar, or satellite) and quality. Fifth, the model structure and specifications should be selected based on the GHM and river routing model characteristics.
In the future, multiple improvements are expected that can greatly benefit GFMs and their use for risk assessment. In terms of climate data, the ERA5 climate reanalysis dataset (the successor of ERA-Interim) has been released, leading to an increase of spatial and temporal resolution, among other aspects. GFMs can greatly benefit from next-generation DEMs, which will increase model resolution, result in better parameterization of hydrodynamic modelling, and have the potential for capturing flood defences. Improvements on current DEMs have been made by the creation of the MERIT DEM (Yamazaki et al., 2019), which better captures river networks.
This study highlights the importance of pluvial flooding as a main contributor to flood risk that, if unaccounted for, can lead to a strong underestimation of the total flood risk. For future studies we recommend extending the comparison with coastal flooding, which is increasingly available either as an integrated component of the global flood models under investigation or as separate global hazard layers (Couasnon et al., 2020). Furthermore, we illustrate the effect of flood defences on overall flood risk and the strong sensitivity to this parameter, which dominates most other input and modelling uncertainties.
Author contributions. JPMA, SUE, and PJW conceived the study. JPMA, SUE, PJW, and DE contributed to the development and design of the methodology. JPMA analysed and prepared the paper with review and analysis contributions from SUE, PJW, and DE.