Predicting social and health vulnerability to floods in Bangladesh

Donghoon Lee1,2, Hassan Ahmadul3, Jonathan Patz4, and Paul Block1 1Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Wisconsin, USA 2Climate Hazards Center, Department of Geography, University of California, Santa Barbara, California, USA 3Red Cross Red Crescent Climate Centre, The Hague, the Netherlands 4Global Health Institute, Nelson Institute Center for Sustainability and the Global Environment, and Department of Population Health Sciences, University of Wisconsin-Madison, Wisconsin, USA Correspondence: Donghoon Lee (dlee@geog.ucsb.edu)


Introduction
Public health outcomes stemming from flood events are typically acute and severe, particularly in developing or tropical regions, and can include death and injury, contaminated drinking water, endemic and infectious diseases, and community disruption and displacement. Although the impacts of floods on public health have been investigated (Ahern et al., 2005; Alderman et al., 2012; Batterman et al., 2009; Du et al., 2010; Tapsell et al., 2002), integrated management of flood and health risks is technically and institutionally limited.
Unsurprisingly, public health research on the impacts of natural disasters predominantly focuses on clinical, microbiological, and ecological aspects, including vaccines, therapy, and improved treatment (Colston et al., 2020; Schwartz et al., 2006). Information outlining comprehensive measures of coping capacity at local scales is often incomplete or not available. In Bangladesh, the BBS conducts household surveys and quantifies disaster-related statistics for twelve main natural disasters (BBS, 2016). From this report, we adopted district-level statistics to represent coping capacity and public health vulnerability indicators for flood disasters. Examples include knowledge and perceptions of disasters (the population assuming that natural processes cause critical disasters), damages and losses, households receiving financial support, population lacking safe drinking water, etc.
Additionally, health facility data (e.g., location, capacity, etc.) and physician data are obtained from the Facility Registry (http://facilityregistry.dghs.gov.bd) and the Health Dashboard (https://dghs.gov.bd/index.php/en/home), respectively. From these data, the numbers of hospital beds and physicians are estimated to reflect the capacity of the health system and health workforce of each Upazila. The national averages of hospital beds per 1,000 people and physicians per 10,000 people in 2019 are estimated as 0.6 (0.8 in 2015 according to the World Bank) and 0.58 (0.53 in 2017 according to the World Bank), respectively.
For spatial population data, the WorldPop population-per-pixel data at 100 m resolution are obtained and rescaled linearly using the World Bank population record for 2017 (World Bank, 2018; Worldpop et al., 2018).
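The linear rescaling step can be illustrated with a minimal sketch. It assumes that "rescaled linearly" means multiplying every pixel by one constant factor so that the national total matches the reference record; the function name and the toy values are hypothetical and are not taken from the study.

```python
def rescale_population(pixels, target_total):
    """Scale per-pixel counts by one constant so that they sum to target_total.

    Assumption: a single multiplicative factor is applied to all pixels.
    """
    current_total = sum(pixels)
    if current_total <= 0:
        raise ValueError("pixel populations must sum to a positive value")
    factor = target_total / current_total
    return [p * factor for p in pixels]


# Toy raster flattened to a list (hypothetical values, not WorldPop data).
toy_pixels = [10.0, 30.0, 60.0]
scaled = rescale_population(toy_pixels, target_total=200.0)
```

Because only one factor is applied, the relative spatial distribution of the population is unchanged; only the total is adjusted.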

Flood forecast, satellite inundation, and population data
In Bangladesh, the Flood Forecasting and Warning Centre (FFWC) provides flood forecasts and warning services countrywide. FFWC's flood forecasting system is based on the MIKE 11 model, a one-dimensional water modeling software for the simulation of water levels and discharges in river networks and flood plains. Two-dimensional flood inundation (flood depth) forecasts are created using Digital Elevation Models (DEM) at 300 m spatial resolution. The current early flood warning system offers a 120-hour lead time (FFWC, 2018). The FFWC acknowledges that flood forecasts may underestimate or overestimate inundation depths and extent given the lack of model updates and the coarse spatial resolution. These FFWC-issued flood forecasts are utilized for the August 2017 event (issued August 16th) evaluated here. These forecasts were verified by FFWC with observed inundation maps from Sentinel-1 satellite images, illustrating good agreement in the northwestern and northeastern regions (FFWC, 2018). We obtained the satellite inundation data for the August 2017 flood event generated using Sentinel-1 Synthetic Aperture Radar images (August 22nd, 24th, 27th, and 29th) from the International Centre for Integrated Mountain Development (Uddin et al., 2019) (Figure S1).

Flood impact records
The Global Shelter Cluster has aggregated relevant post-disaster reports and data for the August 2017 flood event in Bangladesh through government agencies and international relief organizations (https://www.sheltercluster.org/response/bangladesh-monsoon-floods-2017). Specifically, we leverage the 72-hour Rapid Assessment report published August 21st, the flood damage data reported on September 3rd by the DDM and Natural Disaster Response Coordination Group, and the monthly hazard incident report from the Network for Information, Response and Preparedness Activities on Disaster (NIRAPAD) (NIRAPAD, 2017b).
The DGHS reported health outcomes from the August 2017 flood, collected from July to September. From this, we extract the number of diarrheal incidents and other adverse health outcomes, including incidents of respiratory tract infections (RTI), eye and skin diseases, snake bites, drowning, and other injuries.

Spatially explicit vulnerability and risk maps can support decision-makers by enhancing their ability to take appropriate actions.
However, vulnerability assessment is complicated by environmental, social, economic, and political patterns of societies. To date, no standard model or methodology exists to guide spatial vulnerability assessments for natural disasters, although the number of related studies is rapidly increasing (Villagrán de Léon, 2008; Ward et al., 2020). In this study, we select socio-economic, health, and coping capacity vulnerability domains consisting of 26 indicators based on the literature and availability of data.

[Figure 1: framework diagram with "Vulnerability assessment" and "Impact assessment" components.]

Previously, assessments of spatial vulnerability conditioned on socio-economic factors have been conducted for a number of regions of Bangladesh (Ahsan and Warner, 2014; Dewan, 2013; Gain et al., 2015; Hoque et al., 2019; Rabby et al., 2019; Roy and Blaschke, 2015) and more broadly for the entire country (DDM, 2017; Islam et al., 2013). Method of assessment, indicators, study area, scale, and data are summarized in Table 1. These studies typically select vulnerability domains and indicators based on the context of the target disaster and study area or from a pre-defined approach in the literature. In previous studies, the domains include socio-economic, adaptive or coping capacity, and unique exposure or hazard domains, such as agricultural, physical (climate, flood, or coastal hazard), and infrastructure. For vulnerability models, an additive model (equal weights) or the analytic hierarchy process (AHP), with custom weights from stakeholder engagement or expert opinion (Saaty and Vargas, 2012), is most common. Principal component analysis (PCA; e.g., Cutter et al., 2003) is also frequently employed to identify dominant spatial patterns and to generate a composite vulnerability. The majority of studies adopt the equal weights approach such that each domain contributes equally to the composite vulnerability.

https://doi.org/10.5194/nhess-2020-392. Preprint. Discussion started: 1 December 2020. © Author(s) 2020. CC BY 4.0 License.
In this study, the SHV includes 26 indicators across three indicator domains: socio-economic (15 indicators), health (5 indicators), and coping capacity (6 indicators). The SHV specifically excludes physical indicators (e.g., low elevation, proximity to rivers, etc.), as flood hazard information (i.e., flood inundation) will be linked later through the impact assessment.
Instead, we include a health domain uniquely reflecting flood-induced health risks that have rarely been considered in previous studies. Indicators are selected on the basis of their relevance to each domain vulnerability and the availability of data for the country at the Upazila or district level (Table 2).
The socio-economic domain broadly represents the potential impact of the hazard conditioned on the existing societal context. Based on the literature review, we select 15 indicators relevant to demographic (3), built environment (5), social (4), and economic (3) categories, drawing on the most recent population census data. Comparatively, the coping capacity domain represents the ability to cope with or adapt to the hazard. In the literature, coping capacity indicators are surveyed for the local region, or proxy data from the census are used, such as households with communication devices and vehicles, literacy rates, education levels, etc. For this study, we apply 6 indicators specifically measured to represent the level of disaster resilience in each district across Bangladesh: 1) percentage of households affected by floods, 2) percentage of children who did not attend school due to disasters, 3) percentage of households that have not undertaken disaster preparedness activities, 4) percentage of the population with knowledge and perception of disasters, 5) percentage of households that received financial support from agencies, and 6) ratio of total damage/loss to total income (Table 2). In Bangladesh, several studies and reports investigate appropriate health indicators in the context of disaster management (DGHS, 2018; Schwartz et al., 2006; Shahid, 2010; WHO, 2013). However, most indicators are either national or local in scale, and thus not interpretable at a high resolution for the entire country. Here, we include five indicators representing the health domain: the proportion of the population having suffered from diseases caused by disasters, the proportion of the population having experienced diarrheal disease during disaster periods, lack of drinking water due to disasters, the number of hospital beds, and the number of physicians.
The Min-Max formula is applied to derive an indicator score of Upazila $i$ as follows:

$$IS_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i$ is the original value of the indicator, and $x_{\min}$ and $x_{\max}$ are the lowest and highest values of the indicator, respectively. Indicator scores range from zero to one, with larger values representing an increase in vulnerability (Table 2). All data are normalized to account for differences in magnitudes and units.
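The Min-Max scaling can be sketched in a few lines. This is a minimal illustration rather than the study's implementation: the function name is hypothetical, and the optional inversion for indicators where larger raw values imply lower vulnerability (for example, hospital beds) is an assumption about how the "larger score means more vulnerable" convention might be enforced.

```python
def min_max_scores(values, higher_is_more_vulnerable=True):
    """Scale raw indicator values to [0, 1] with the Min-Max formula.

    Assumption: indicators for which a larger raw value means lower
    vulnerability are inverted so that 1 always marks the most vulnerable unit.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread in the data: every unit receives the same (zero) score.
        return [0.0 for _ in values]
    scores = [(v - lo) / (hi - lo) for v in values]
    if not higher_is_more_vulnerable:
        scores = [1.0 - s for s in scores]
    return scores


# Hypothetical raw values for three Upazilas.
direct = min_max_scores([2.0, 4.0, 6.0])
inverted = min_max_scores([2.0, 4.0, 6.0], higher_is_more_vulnerable=False)
```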

In this study, equal-weight and PCA approaches are proposed to calculate the domain vulnerability (DV) for the three domains (Table 2). The equal-weight approach applies the additive model with equal weights for all indicators in a domain as follows:

$$DV_i = \frac{1}{n} \sum_{k=1}^{n} IS_{i,k}$$

where $DV_i$ denotes the domain vulnerability index of Upazila $i$, and $IS_{i,k}$ is the $k$th indicator score of Upazila $i$ (here $n$ indicates the number of indicators in each domain shown in Table 2).
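A minimal sketch of the equal-weight approach follows. It assumes that the additive model with equal weights is the arithmetic mean of the indicator scores (a weight of 1/n per indicator); the function name and sample scores are hypothetical.

```python
def equal_weight_dv(indicator_scores):
    """Domain vulnerability of one Upazila as the equally weighted mean
    of its indicator scores (assumed weight of 1/n per indicator)."""
    n = len(indicator_scores)
    if n == 0:
        raise ValueError("at least one indicator score is required")
    return sum(indicator_scores) / n


# Hypothetical Min-Max scores for a three-indicator domain.
dv_example = equal_weight_dv([0.2, 0.4, 0.6])
```

Using the mean rather than the raw sum keeps domains with different numbers of indicators on a comparable scale.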

PCA is a common data-driven approach for construction of the Social Vulnerability Index proposed by Cutter et al. (2003).
Specifically, PCA reduces the number of indicators to a smaller number of components that account for a significant portion of the variance of the indicators. Through grouping highly correlated and similar indicators, principal components (PC) are formed. Here, varimax rotation is used to create more independence between PCs. Only PCs with eigenvalues > 1 are retained in order to meet the Kaiser criterion (Kaiser, 1960). The domain vulnerability index for each Upazila is calculated by adding the scores of all the retained PCs as follows:

$$DV_i = \sum_{m=1}^{M} PC_{i,m}$$

where $PC_{i,m}$ is the score of the $m$th retained PC for Upazila $i$ and $M$ is the number of retained PCs. The three domain vulnerabilities are then combined so that each domain vulnerability contributes equally to the SHV index value. SHV scores are classified into five categories based on their values: very-low vulnerability (0 to 0.2), low vulnerability (0.2 to 0.4), moderate vulnerability (0.4 to 0.6), high vulnerability (0.6 to 0.8), and very-high vulnerability (0.8 to 1).
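The five-class SHV scheme can be expressed as a small lookup. This is a minimal sketch, assuming that each class includes its lower bound (consistent with thresholds such as SHV ≥ 0.6 used in the results); the names `SHV_BREAKS`, `SHV_LABELS`, and `classify_shv` are hypothetical.

```python
from bisect import bisect_right

# Class boundaries and labels taken from the five categories in the text.
SHV_BREAKS = [0.2, 0.4, 0.6, 0.8]
SHV_LABELS = ["very-low", "low", "moderate", "high", "very-high"]


def classify_shv(score):
    """Map an SHV score in [0, 1] to its vulnerability class.

    Assumption: a score equal to a boundary belongs to the higher class.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("SHV score must lie in [0, 1]")
    return SHV_LABELS[bisect_right(SHV_BREAKS, score)]
```

For example, `classify_shv(0.6)` falls in the "high" class, while `classify_shv(0.59)` falls in the "moderate" class.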

Impact assessment linking with flood forecast and satellite inundation information
Although social vulnerability is a complex function of social, economic, and cultural context, numerical vulnerability estimates are often presented in terms of fatalities, economic losses, migration, etc. (Rufat et al., 2019; Villagrán de Léon, 2006). One can imagine that a region classified as highly vulnerable may experience severe impacts from a disaster, poor resilience, slow recovery, or high rates of a particular action such as displacement or emergency shelter use (Fekete, 2009). However, validation of social vulnerability is typically challenging due to limited availability and quality of data during/after the disaster period.
Moreover, given the compound characteristic of a composite vulnerability, a comparison of vulnerability with a particular disaster outcome may be difficult to validate in a traditional sense (Rufat et al., 2019). Notwithstanding these challenges, the objective here is to develop vulnerability measures for impact assessment, and specifically to evaluate their utility for the August 2017 flood by merging them with physical flood hazard information (i.e., flood forecast and satellite inundation) in order to aid pre- and post-disaster management practices.
In 2017, after devastating floods in the pre-monsoon period (mid-April) and the monsoon period (early July), the second monsoon rains began on August 11th, causing intense floods in 42% of the country, including 5 divisions and 32 districts in the northern, northeastern, and central parts of the country, and affecting a total of more than 11 million people. According to the Ministry of Disaster Management and Relief, this flood has been recorded as the worst in the last four decades (FFWC, 2018; NIRAPAD, 2017b).
First, we estimate the affected population based on flood forecast and satellite inundation maps and spatial population data.
All spatial data are linearly downscaled to a 30 m resolution. Flood forecasts are represented as flood depths. The affected population is assumed to increase linearly from no impact at a flood depth of zero to maximum impact at a flood depth of 3 m. Satellite-based inundation conveys whether or not a grid cell (30 m resolution) is flooded. For flooded grid cells, we assume the full population of that cell is affected. We acknowledge that this may result in an overestimation of the affected population; however, explicit flood protection infrastructure data are not widely available. This approach still captures spatial patterns of the affected population.
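The two exposure rules can be sketched as follows. This is an illustrative sketch, not the study's code: the function and constant names are hypothetical, grid cells are represented as flat lists, and depths above 3 m are assumed to be capped at full impact.

```python
MAX_IMPACT_DEPTH_M = 3.0  # depth at which the full cell population is affected


def affected_fraction_from_depth(depth_m):
    """Linear ramp from 0 at zero depth to 1 at 3 m.

    Assumption: depths above 3 m are capped at a fraction of 1.
    """
    if depth_m <= 0:
        return 0.0
    return min(depth_m / MAX_IMPACT_DEPTH_M, 1.0)


def affected_population_forecast(populations, depths_m):
    """Forecast-based estimate: population weighted by the depth ramp."""
    return sum(p * affected_fraction_from_depth(d)
               for p, d in zip(populations, depths_m))


def affected_population_satellite(populations, flooded):
    """Satellite-based estimate: full population of every flooded cell."""
    return sum(p for p, is_flooded in zip(populations, flooded) if is_flooded)


# Hypothetical three-cell example.
cell_pop = [100.0, 200.0, 50.0]
forecast_total = affected_population_forecast(cell_pop, [0.0, 1.5, 4.0])
satellite_total = affected_population_satellite(cell_pop, [False, True, True])
```

In this toy case the binary satellite rule yields a larger total than the depth ramp because the half-flooded cell counts in full, which illustrates one source of the overestimation acknowledged in the text.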
Post-disaster records were aggregated and reported at the district level for the August 2017 flood event; therefore, we calculate the district-level domain vulnerability by taking the population-weighted average of Upazila-level domain vulnerabilities as follows:

$$DV_j = \frac{\sum_{i \in j} POP_i \, DV_i}{\sum_{i \in j} POP_i}$$

where $POP_i$ and $DV_i$ are the affected population and domain vulnerability of Upazila $i$ in district $j$, and $DV_j$ is the resulting district-level domain vulnerability. Thus, the district-level DV indicates the average vulnerability of the affected population in each district.
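The population-weighted aggregation from Upazila to district level can be sketched as below. The function name and sample values are hypothetical, and returning `None` for a district with no affected population is an assumption made here for illustration.

```python
def district_dv(affected_pop, upazila_dv):
    """Population-weighted average of Upazila-level DVs within one district.

    affected_pop and upazila_dv are aligned lists, one entry per Upazila.
    Assumption: a district with no affected population has no defined DV.
    """
    total_pop = sum(affected_pop)
    if total_pop == 0:
        return None
    weighted_sum = sum(p * d for p, d in zip(affected_pop, upazila_dv))
    return weighted_sum / total_pop


# Hypothetical district with two Upazilas.
dv_j = district_dv([100.0, 300.0], [0.2, 0.6])
```

Here the more heavily affected Upazila dominates the result (0.5 rather than the unweighted mean of 0.4).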
In lieu of evaluating and comparing vulnerability directly with all disaster outcomes, we group the disaster records into four index types, including distress, damage, disruption, and health (Table 3), as utilized by local management agencies and defined in post-disaster reports. Specifically, the distress index includes the percentage of the affected population and the number of deaths, the damage index includes the number of damaged houses and crop land areas, the disruption index includes the number of affected educational institutions and damaged tube-wells, and the health index includes the number of diarrheal and other disease cases (e.g., injury, drowning, RTI, skin, snakebite, etc.) (Table 3).

The variables within each group are normalized and averaged to form a group impact score. Validation is carried out by calculating correlations between developed vulnerability scores and group flood impact scores.
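The validation step can be sketched as follows. Two assumptions are made for illustration: the normalization within each group is Min-Max scaling (the text does not specify the method), and the correlation is the Pearson coefficient (the reported values are denoted r). All function names and numbers are hypothetical.

```python
from math import sqrt


def min_max(values):
    """Min-Max scaling to [0, 1]; constant series map to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def group_impact_score(variables):
    """Average the normalized variables of one impact group per district.

    variables: list of lists, one list per impact variable, aligned by district.
    """
    normalized = [min_max(v) for v in variables]
    n_districts = len(normalized[0])
    return [sum(col[j] for col in normalized) / len(normalized)
            for j in range(n_districts)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical distress group for three districts: deaths and % affected.
distress = group_impact_score([[0.0, 5.0, 10.0], [10.0, 20.0, 30.0]])
r_value = pearson_r(distress, [0.1, 0.3, 0.5])
```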

Relationships between vulnerability indicators
To evaluate cross-correlations, selected indicators (Table 2) are compared at the Upazila or district levels (Figure 2). As neces-

Vulnerability assessment
Spatial representation of the DV is determined using the equal-weight and PCA approaches for each of the three domains: socio-economic, health, and coping capacity (Figure 3). In the PCA, three PCs are retained in each of the socio-economic and coping capacity domains, and two PCs are retained in the health domain per the eigenvalue criterion (Figure S2).
Both socio-economic DVs based on the two approaches clearly represent the expected demographic, social, and economic characteristics of major cities, for example lower vulnerability (standard deviation (SD) < -1.0) near the country center (Dhaka; capital city) and the southeast coast (Chittagong; the second largest city) and high vulnerability (SD > 1.0) in the northeast [...] as in the northeastern floodplains (Figure 4). The major difference between the two approaches appears in the northwestern riverine regions; while the equal-weight approach indicates a relatively higher vulnerability (≥ 0.7), the PCA approach yields moderate vulnerability ranging from 0.4 to 0.6 (Figure 4). As discussed above, this difference is mainly due to a relatively lower socio-economic DV in the PCA approach (Figure 3d).
For both approaches, vulnerable zones (≥ 0.6) appear proximal to major rivers and tributaries from northwest to central Bangladesh, and more broadly across low floodplains in the northeast (Haor basin; Figure 4). Although we did not include any [...] both approaches present very low coping capacity DVs (≤ -1.0) in these regions (Figure 3). This is because our coping capacity indicators include existing active disaster management practices and financial support from agencies in those regions, which result in low coping capacity DV and SHV. On average across the two approaches, nearly half of the country (45%) is classified as moderately vulnerable (0.4 ≤ SHV < 0.6), with the remaining 55% split between high vulnerability zones (SHV ≥ 0.6), including 42 million people or 26% of the population, and low vulnerability zones (SHV < 0.4), with 46 million people or 29% of the population. As proposed in the framework (Figure 1), DV and SHV can also be merged with physical flood information to assess predictability of flood impacts. However, identifying highly vulnerable zones based solely on indicators is also informative for government and relief agencies to enhance the resilience of these regions through long-term management practices.

Impact assessment
For the August 2017 event, flood forecast and satellite inundation estimates indicate that 16.8 million (10.6%) and 15.3 million (9.7%) people nationally were affected by flood inundation, respectively. Post-disaster reports indicate that 9.2 million people (5.8% of the population) were affected (Figure 5). This overestimation is likely attributable to the simplified approaches and insufficient data quality. For example, the two approaches adopted here do not consider the level of flood protection (e.g., embankments, levees, early warning, etc.) but rather assume all regions have equivalent protection and management. Furthermore, the current flood forecasts and satellite inundation information do not provide specific physical flood properties, such as flood duration, which is a key factor in increasing flood impacts, as indicated in the post-flood reports. Geographical contexts may also contribute to this discrepancy. For example, both forecasts and satellite information estimate a high number of affected people in the northeastern floodplain (i.e., Haor region), whereas a relatively low percentage of affected population is reported (Figure 5). This region is known to be highly vulnerable to flooding, but home styles and small households (lowest population density in [...] Table 4). According to post-flood reports, the August 2017 event had a significant impact on the northwestern regions (Rangpur, Rajshahi, and Mymensingh divisions). Generally, the equal-weight approach produces higher correlations than the PCA approach (Figure S4). This is mainly attributable to the relatively low socio-economic DV in the PCA approach for the northwest region. The forecast- and satellite-based DVs correlate similarly with the four indices under the two approaches, although the forecast-based correlations are marginally higher, and correlations with equal weights are notably higher than for the PCA approach (Table 4).
Again, the moderate vulnerability of the PCA approach in the northwestern regions substantially reduces its correlations with the overall flood impact indices (Figures 4 and 6).
Specifically, under the equal-weight approach, the forecast-based socio-economic DV spatially correlates well with the impact indices, capturing the distress (r = 0.38) and disruption (r = 0.3) impact indices with statistical significance. For the same comparison, the coping capacity DV also produces statistically significant correlations with the disruption (r = 0.34) and health (r = 0.31) impact indices.
Surprisingly, the health DV demonstrates a low correlation with the health impact index, which consists of diarrheal and other disease incidents. Given that the causes of disease outbreaks are quite complex (e.g., current vaccines and medical status) and often do not have a simple relationship with hazard (Shahid, 2010), this reiterates that considering the capability to prepare for and manage natural disasters may provide a better indication of the likelihood of flood-induced health impacts and epidemics, as discussed by previous studies (Hashizume et al., 2008; Kunii et al., 2002; Schwartz et al., 2006).
Overall, the forecast-based SHV is statistically significantly correlated with all types of flood impact indices (Table 4). This could play a critical role in disaster management by indicating comprehensive impacts across multiple sectors. The proposed approach shows promising results. First, we find highly (SHV ≥ 0.6) and very-highly (SHV ≥ 0.8) vulnerable zones near the northwest riverine areas, northeast floodplains, and southwest region, covering 42 million people (26% of the total population); most indicators illustrate consistently high vulnerability levels (Figure 4). A spatial discrepancy in SHV between the equal-weight and PCA approaches in the northwest riverine regions is evident, however, mainly attributable to the socio-economic DV.

Conclusions and Discussions
The population affected by the August 2017 flood event is estimated using flood forecast and satellite inundation information (Figure 5). Although both sources overestimate the affected population due to a lack of information on factors such as flood protection/management and flood duration, the satellite-based information exhibits a fairly consistent spatial pattern with the reported population (r = 0.6). Given that the socio-economic DV is strongly correlated with the distress impact index, which includes the number of affected people and deaths (Table 4), the inclusion of a socio-economic DV to represent the level of overall flood protection and management is warranted. For this analysis, the equal-weight approach has a stronger relationship with flood impact indices than the PCA approach (Table 4). Specifically, the socio-economic DV reflects the distress impact, and the coping capacity DV captures disruption and health impacts. This suggests that thematic vulnerability can play an important role in contextualizing flood impacts.
Although the health vulnerability measure consists of indicators related to previous disease incidents, lack of drinking water, hospital capacity, and health workforce, it does not reflect the observed health impact well. However, the coping capacity DV [...] impacts and actively support pre- and post-disaster management rather than being used as static supplementary data. We also demonstrate, through a validation, that the thematic vulnerability can better estimate a particular aspect of flood impacts.
This can potentially facilitate tailored management actions, such as prioritizing different resources (e.g., food, cash, medical supplies, volunteers, etc.) for a given location. To improve the quality of this approach, the validation process can be updated with additional data, indicators, and flood records across the country in support of management practices.
In particular, more accurate post-disaster impact records with diverse variables at the local level (e.g., Upazila scale) may improve future vulnerability and risk assessments and impact prediction. Flood forecasts have clear value; however, producing local-scale information may pose challenges in countries with limited resources. Existing global-scale forecasts may be able to fill this role and should be evaluated (Alfieri et al., 2013; Emerton et al., 2018). Understanding the prospects for extending forecast lead times is also warranted, and may facilitate more proactive disaster management practices (Coughlan de Perez et al., 2015).
Finally, integrating more physical flood information and models to estimate the affected population may enhance flood impact predictions.
The proposed approach is transferable and easily adapted to different countries to assess vulnerabilities. Integrating this approach systematically with a flood forecast system, such as a web-based online tool, may be of further value to international and local disaster managers. However, it should be noted that the quality of this approach depends on the quality and availability of data, particularly for demographic and socio-economic elements requiring a sub-national level census. Overall, this study provides groundwork for the development of a multi-sectoral (flood and health) risk warning system. Actionable flood and health risk predictions can radically improve existing disaster management practices of NGOs and other private and public organizations and save lives and resources by providing advanced preparedness and response strategies.
Data availability. The original data of vulnerability indicators are publicly available (see Section 2.1). The processed vulnerability indicators can be requested from the corresponding author.
Author contributions. The paper and its methodology were conceptualized and developed by DL and PB; the data processing, analyses, and visualization were carried out by DL. The original draft was prepared by DL and PB; further reviewing and editing was carried out by DL,  (1).
The deprivation index is a composite index calculated from 21 socioeconomic variables using PCA (Mahhzab, 2015).