Using rapid damage observations from social media for Bayesian updating of hurricane vulnerability functions: A case study of Hurricane Dorian

Rapid impact assessments immediately after disasters are crucial to enable rapid and effective mobilization of resources for response and recovery efforts. These assessments are often performed by analysing the three components of risk: hazard, exposure and vulnerability. Vulnerability curves are often constructed using historic insurance data or expert judgments, reducing their applicability for the characteristics of the specific hazard and building stock. Therefore, this paper outlines an approach to the creation of event-specific vulnerability curves, using Bayesian statistics (i.e., the zero-one inflated beta distribution) to update a pre-existing vulnerability curve (i.e., the prior) with observed impact data derived from social media. The approach is applied in a case study of Hurricane Dorian, which hit the Bahamas in September 2019. We analysed footage shot predominantly from unmanned aerial vehicles (UAVs) and other airborne vehicles posted on YouTube in the first 10 days after the disaster. Due to its Bayesian nature, the approach can be used regardless of the amount of data available as it balances the contribution of the prior and the observations.

employed the Global RApid post-disaster Damage Estimation (GRADE) approach to rapidly estimate damages to physical assets after major disasters (Gunasekera et al., 2018).
Damage estimations are commonly modelled using the three components of risk: hazard, exposure and vulnerability (Desai et al., 2015). Hazard is defined as a potentially damaging event, exposure as the elements subject to damage and losses as a result of a hazard and vulnerability as "the conditions determined by physical, social, economic and environmental factors or processes which increase the susceptibility of an individual, a community, assets or systems to the impacts of hazards" (UNISDR, 2016). Vulnerability and fragility functions are commonly used to model damage to buildings due to natural disasters. These functions typically relate a measure of impact, such as wind speed (Pita et al., 2015), water depth (de Moel et al., 2013) or ground motion (Li et al., 2013), to damage. These relations are mostly based on previous insurance claims, experiments or expert judgment and are often only applicable to particular hazard characteristics and specific built environments (Chung Yau et al., 2011; Douglas, 2007; Pita et al., 2013).
Observations, such as those from surveys (Wijayanti et al., 2017), can improve the accuracy of damage estimates (Douglas, 2007). However, affected areas are often difficult to access following a disaster (Bono and Gutiérrez, 2011), and human resources are often limited (Koshimura et al., 2009). Data sources such as social media (Kryvasheyeu et al., 2016), unmanned aerial vehicles (UAVs) (Kim and Davidson, 2015) and other remote sensing techniques can provide detailed observations of damage quickly during and after a disaster. However, the amount of data is heavily dependent on the characteristics of the disaster area, such as the number of people that are able to use social media (de Bruijn et al., 2019; Yu et al., 2018) and cultural differences (Cho et al., 2009).

A scientific challenge is to find methods that use observations from the affected area to improve vulnerability curves. An example of such a method is Bayesian analysis, which enables the updating of prior beliefs (e.g., beliefs based on expert judgment) with observational data, irrespective of the amount of data available (Koutsourelakis, 2010). The balance between prior information and observational data depends on the number of observations and the level of uncertainty in both the observations and the prior beliefs. Bayesian updating of fragility functions (i.e., the probability of exceeding a certain damage state) has been employed in numerous studies. For example, Li et al. (2013) combined results from numerical simulations and experimental testing of bridge substructures with Bayesian updating to obtain improved earthquake fragility functions. Mishra et al. (2017) updated and quantified the uncertainty of analytical hurricane fragility functions for wood-frame buildings with experimental data. In another study, a Bayesian framework was designed to create fragility functions for earthquakes (Koutsourelakis, 2010).
However, from a risk management perspective, vulnerability functions (where damage is represented as a damage ratio) are more desirable, as they measure the actual capacity of the built environment in an event (Rossetto et al., 2015).
In this paper, we aim to improve existing vulnerability functions employed in GRADE assessments by updating these functions with post-disaster observations using Bayesian zero-one inflated beta regression (Ospina and Ferrari, 2010). The method is applied to a case study of tropical cyclone wind damage in the Bahamas caused by Hurricane Dorian in September 2019. The observations were obtained from online media reports (i.e., YouTube) in the days following the disaster. These data include some ground observations from driving cars but are mostly composed of observations from UAVs and videos shot from other airborne vehicles. In the remainder of this paper, we describe the general methodology (Sect. 2) and its application to the Bahamas (Sect. 3). We also discuss the applicability of the proposed method to other disaster types and different data sources.

Methodology
Bayesian updating or inference refers to the process of updating existing knowledge for a set of $n$ parameters $\Theta$, defined as $\Theta = (\theta_1, \ldots, \theta_n)$ and expressed in a prior distribution, defined as $p(\Theta)$, with new information $X$ to find the posterior distribution, defined as $p(\Theta \mid X)$. Mathematically, we can express Bayesian updating as

$$p(\Theta \mid X) \propto p(X \mid \Theta)\, p(\Theta) \qquad (1)$$

The posterior distribution $p(\Theta \mid X)$ is obtained by multiplying the prior distribution $p(\Theta)$ by the likelihood $p(X \mid \Theta)$, which is the probability of observing $X$ given parameter set $\Theta$, and a normalizing constant. The normalizing constant $p(X)$ is omitted here since this is automatically determined during the process of Gibbs sampling (see below; Gilks et al., 1996; Plummer, 2003b).
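As a minimal illustration of the proportionality in Eq. (1), the sketch below updates a prior over a single parameter on a discrete grid. The Bernoulli likelihood, the grid and the counts are hypothetical and chosen only for brevity; they are not the model used in this paper, and the normalization is carried out explicitly here rather than through Gibbs sampling.

```python
import math

def grid_posterior(grid, prior, log_likelihood):
    """Return posterior weights on `grid`, using posterior ∝ likelihood × prior."""
    log_post = [math.log(p) + log_likelihood(t) for t, p in zip(grid, prior)]
    peak = max(log_post)                       # subtract the peak for numerical stability
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    evidence = sum(unnormalized)               # plays the role of the normalizing constant
    return [u / evidence for u in unnormalized]

# Hypothetical data: 7 of 10 observed buildings are damaged.
damaged, observed = 7, 10
grid = [i / 100 for i in range(1, 100)]        # candidate values of the parameter
prior = [1 / len(grid)] * len(grid)            # flat prior over the grid

def log_lik(theta):
    return damaged * math.log(theta) + (observed - damaged) * math.log(1 - theta)

posterior = grid_posterior(grid, prior, log_lik)
posterior_mode = grid[posterior.index(max(posterior))]
```

With a flat prior the posterior mode coincides with the observed damage fraction; a more concentrated prior would pull the mode towards the prior, which is the balancing behaviour described above.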
Following the Bayesian framework, the prior distribution $p(\Theta)$ for vulnerability must be defined. The vulnerability of the building stock can be expressed as a curve that maps wind speed as the explanatory variable $(v_1, \ldots, v_N)$ to a damage ratio from 0 to 1 (inclusive) as the response variable $(y_1, \ldots, y_N)$. Fragility curves of individual components of buildings (e.g., roof sheathing and nails) are widely regarded as following a cumulative lognormal distribution (Ellingwood et al., 2004; Lee and Rosowsky, 2005; Li and Ellingwood, 2006). Assuming identical fragility curves for individual components and independence of failure, a vulnerability curve for a building follows that same cumulative lognormal distribution (Holmes, 1996):

$$\mu = \Phi\left(\frac{\ln(v / m)}{\xi}\right) \qquad (2)$$

where $\mu \in (0,1)$ is the damage ratio, $m$ the median capacity of the building stock, $\xi$ the logarithmic standard deviation of that capacity, $\Phi(\cdot)$ the cumulative distribution function of the standard normal distribution and $v$ the sustained wind speed. While $m$ and $\xi$ can be expressed deterministically, we prefer to regard both parameters as uncertain and we express them as the random variables $\theta_1$ and $\theta_2$, respectively.

For proportional data, the beta distribution is often used as the basis for the likelihood function $p(X \mid \Theta)$ (Gupta and Nadarajah, 2004) because it supports a wide range of shapes on the interval (0, 1). Its probability density function, reparameterized in terms of mean $\mu$ and precision $\phi$, is given by Ferrari and Cribari-Neto (2004):

$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi - 1} (1 - y)^{(1-\mu)\phi - 1}, \quad 0 < y < 1 \qquad (3)$$

where $\Gamma(\cdot)$ is the gamma function.

However, observations of the damage ratio can be true values of zero (i.e., no damage) and one (i.e., complete destruction), which are not supported by the beta distribution. In fact, such observations are more frequent because the assumption that individual building components fail independently does not always hold.
For example, houses are often completely destroyed (i.e., damage ratio of 1) due to the collapse of an important foundation element (Keote et al., 2015) or complete displacement of the entire building (Shultz et al., 2005). By contrast, houses may be completely undamaged due to their environment or protection measures, such as shielding by other standing buildings (Keote et al., 2015). Using the zero-one inflated beta distribution enables us to explicitly model these 0 and 1 observations through probabilities $p_0$ and $p_1$, respectively.
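The cumulative lognormal vulnerability curve and the mean-precision form of the beta density described in this section can be sketched as follows. The function names and the parameter values in the example (a median capacity of 200 and a logarithmic standard deviation of 0.25, in unspecified wind speed units) are hypothetical and serve only to illustrate the two formulas.

```python
import math

def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def vulnerability_curve(v, median_capacity, log_sd):
    """Damage ratio at wind speed v for a cumulative lognormal curve."""
    return standard_normal_cdf(math.log(v / median_capacity) / log_sd)

def beta_pdf_mean_precision(y, mu, phi):
    """Beta density on (0, 1), parameterized by mean mu and precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi          # classical shape parameters
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

# Hypothetical values: a wind speed equal to the median capacity gives a damage ratio of 0.5.
mu = vulnerability_curve(v=200.0, median_capacity=200.0, log_sd=0.25)
density = beta_pdf_mean_precision(0.5, mu, phi=2.0)   # mean 0.5, precision 2 is the uniform case
```

A larger precision concentrates the density around the mean, so the precision parameter controls how tightly individual observations are expected to cluster around the curve.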

Therefore, for proportional response variables $y \in [0,1]$, Ospina and Ferrari (2010) propose a zero-one inflated beta regression. Here, the response variable $(y_1, \ldots, y_N)$ is modeled as a mixture probability function of $p_0$, the probability that $y = 0$; $p_1$, the conditional probability $P(y = 1 \mid y \neq 0)$; and the beta distribution with expected value $\mu$ and precision $\phi$ for the values between 0 and 1 ($0 < y < 1$). Its probability density function is given by:

$$g(y; p_0, p_1, \mu, \phi) = \begin{cases} p_0 & \text{if } y = 0 \\ (1 - p_0)\, p_1 & \text{if } y = 1 \\ (1 - p_0)(1 - p_1)\, f(y; \mu, \phi) & \text{if } 0 < y < 1 \end{cases} \qquad (4)$$

By employing this distribution with proper parameterization (Sect. 3.3), we can model a process where it is highly probable that the damage ratio is 0 for low wind speed and 1 for high wind speed. For a wind speed in between, it is likely that $y$ is modelled on the continuous scale (0-1) through the beta distribution. We base the equation for $\mu$ on Eq.
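The mixture structure of the zero-one inflated beta distribution can be sketched as below, with $p_0$ the probability of a damage ratio of zero and $p_1$ the conditional probability of a damage ratio of one given that it is not zero. The function names and example values are hypothetical; this is an illustrative sketch of the distribution itself, not of the regression fitted in the case study.

```python
import math
import random

def beta_pdf(y, mu, phi):
    """Beta density on (0, 1) with mean mu and precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def zoib_density(y, p0, p1, mu, phi):
    """Point masses at 0 and 1, scaled beta density for values in between."""
    if y == 0.0:
        return p0
    if y == 1.0:
        return (1.0 - p0) * p1
    return (1.0 - p0) * (1.0 - p1) * beta_pdf(y, mu, phi)

def zoib_sample(rng, p0, p1, mu, phi):
    """Draw one damage ratio: first decide on 0, then on 1, else draw from the beta."""
    if rng.random() < p0:
        return 0.0
    if rng.random() < p1:                      # conditional on the value not being 0
        return 1.0
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)

rng = random.Random(0)
draws = [zoib_sample(rng, p0=0.2, p1=0.5, mu=0.6, phi=5.0) for _ in range(20_000)]
share_zero = sum(d == 0.0 for d in draws) / len(draws)
```

Because $p_1$ is conditional, the unconditional probability of complete destruction in this example is $(1 - 0.2) \times 0.5 = 0.4$, not 0.5.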
In Eq. (5-7), the parameters $\theta_1, \ldots, \theta_6$ are assumed to come from normal distributions with some mean and standard deviation. Finally, the prior distribution (Eq. 4-7) and parameters ($\theta_1, \ldots, \theta_6$) can be updated with observations, with the wind speed as the explanatory variable $(v_1, \ldots, v_N)$ and the corresponding response variables $(y_1, \ldots, y_N)$, using Gibbs sampling, which is a Markov chain Monte Carlo (MCMC) algorithm (see Gilks et al., 1996, for a general treatment). For the purpose of Gibbs sampling, the predictor variable $v$ is normalized such that $v \in (0,1]$ by dividing by the maximum observed wind speed (i.e., $\max(v)$). All other input variables are scaled accordingly. Then, samples of the posterior distribution are generated using the Just Another Gibbs Sampler program (JAGS; Plummer, 2003a). We first use 1,000 iterations to tune the samplers (i.e., adaptation), then 1,000 iterations as a burn-in to let the Markov chain reach the region where it is most representative of the sampled distribution, followed by 100,000 iterations in three chains with a thinning of 100.
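The normalization of the predictor and the bookkeeping of the sampling schedule can be illustrated with the short sketch below. The wind speeds are hypothetical, and the sample count assumes that the 100,000 iterations are run in each of the three chains, which the text does not state explicitly.

```python
def normalize_wind_speeds(wind_speeds):
    """Scale wind speeds to (0, 1] by dividing by the maximum observed value."""
    v_max = max(wind_speeds)
    return [v / v_max for v in wind_speeds]

def retained_samples(iterations_per_chain, thinning, chains):
    """Number of posterior samples kept after thinning, summed over all chains."""
    return (iterations_per_chain // thinning) * chains

# Hypothetical wind speeds (any consistent unit).
scaled = normalize_wind_speeds([120.0, 180.0, 240.0])

# Assumption: 100,000 iterations per chain, thinning of 100, three chains.
n_kept = retained_samples(iterations_per_chain=100_000, thinning=100, chains=3)
```

Under this assumption 1,000 samples per chain, or 3,000 in total, remain to describe the posterior distribution.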

To verify the convergence of the Markov chains, we can present different diagnostics which are reviewed in, amongst others, Cowles and Carlin (1996) and Brooks and Roberts (1998). In particular, we concentrate on diagnostics based on distributional and autocorrelation statistics.
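One of the autocorrelation statistics referred to here is the sample autocorrelation of a chain at a given lag. The sketch below computes it for a synthetic, strongly correlated chain (an AR(1) process, which is a hypothetical stand-in for MCMC output and not output from the model in this paper) and shows the effect of a thinning of 100.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of a one-dimensional chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    variance_sum = sum((x - mean) ** 2 for x in chain)
    covariance_sum = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return covariance_sum / variance_sum

# Synthetic chain in which each value depends strongly on the previous one.
rng = random.Random(42)
chain = [0.0]
for _ in range(49_999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

raw_lag1 = autocorrelation(chain, lag=1)              # close to 0.9 by construction
thinned_lag1 = autocorrelation(chain[::100], lag=1)   # much closer to zero after thinning
```

High autocorrelation that persists after thinning suggests that the retained samples carry less independent information than their number implies.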

Case study of Hurricane Dorian
In this section, the methodology for updating vulnerability curves described above is applied to a rapid damage estimation of Hurricane Dorian in Grand Bahama and the Abaco Islands, the northernmost main islands of the Bahamas. First, the hazard (Sect. 3.1) and exposure components (Sect. 3.2) are briefly described. Then, we discuss the collection of observations for the vulnerability component and the Bayesian updating process (Sect. 3.3). Finally, these three components of risk are combined to obtain a final damage estimate (Sect. 3.4). It should be noted that all data sources had to have been collectible in the first 10 days following the first landfall of the hurricane in the Bahamas to be eligible for the rapid damage estimate.

Hazard
On 24  Cay, Great Abaco, in the Bahamas. During its passage over the Great Abaco and Grand Bahama islands, the weakening of the nearby high-pressure area caused the hurricane to lose its steering currents, which slowed its forward speed to 2 km/h and at times brought it to a standstill. Dorian remained near-stationary for 36 hours, until the hurricane started moving north-northwestwards towards North Carolina (USA). Dorian dissipated on 8 September near Canada.
In this study, meteorological conditions during Hurricane Dorian's passage over the Bahamas are taken from the International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al., 2010). Using the position of the eye, the minimum pressure, the maximum wind speed and the size of the eye (i.e., radius to maximum winds), the 2D wind field is constructed by applying the parametric approach of Holland (1980), which was further refined by Lin and Chavas (2012). To obtain the wind field at 10 m above surface level (Fig. 1), a reduction factor of 0.85 is applied (Powell et al., 2005).
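A simplified radial wind profile in the spirit of Holland (1980) can be sketched as follows. This sketch uses the cyclostrophic form only: it omits the Coriolis term and the refinements of Lin and Chavas (2012), so it is not the wind field model used in this study. The air density, the storm parameters in the example and the function names are assumptions made for illustration; only the reduction factor of 0.85 is taken from the text.

```python
import math

AIR_DENSITY = 1.15          # kg m^-3, assumed value
SURFACE_REDUCTION = 0.85    # reduction factor to 10 m above surface level (from the text)

def holland_b(v_max, p_env, p_centre):
    """Shape parameter B chosen so that the profile peaks at v_max."""
    return AIR_DENSITY * math.e * v_max ** 2 / (p_env - p_centre)

def holland_wind(r, r_max, v_max, p_env, p_centre):
    """Cyclostrophic wind speed (m/s) at radius r (m); pressures in Pa."""
    b = holland_b(v_max, p_env, p_centre)
    x = (r_max / r) ** b
    return math.sqrt(b * (p_env - p_centre) * x * math.exp(-x) / AIR_DENSITY)

def surface_wind(r, r_max, v_max, p_env, p_centre):
    """Wind speed reduced to 10 m above surface level."""
    return SURFACE_REDUCTION * holland_wind(r, r_max, v_max, p_env, p_centre)

# Hypothetical storm: 80 m/s maximum wind at a radius of 20 km, 100 hPa pressure deficit.
storm = dict(r_max=20_000.0, v_max=80.0, p_env=101_000.0, p_centre=91_000.0)
at_eyewall = holland_wind(20_000.0, **storm)
at_100_km = surface_wind(100_000.0, **storm)
```

By construction the profile returns the maximum wind speed at the radius of maximum winds and decays with distance from the eye on either side of it.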

Exposure
To determine the exposure of the residential buildings on the Abaco and Grand Bahama Islands, we consulted the 2000 and 2010 Population and Housing Census of the Bahamas (Department of Statistics, 2002, 2012). The census contains information about the housing stock. Specifically, it contains data on residential buildings and occupied and vacant dwelling units for each settlement and enumeration district in the Abaco Islands and each supervisory district in Grand Bahama, as well as the number of bedrooms and the total annual household income for each household size. In the Abaco Islands (and to a lesser extent in Grand Bahama), the proportion of vacant housing stock is relatively high as the islands are a tourist destination, and many homes are owned or rented by vacationers. The Abaco Islands also have many migrant communities (mainly working in the service sectors for the tourism industry), who reside in low-quality housing in several informal settlements, such as the Mudd and the Pigeon Peas settlements. Other settlements are occupied by the local Bahamian population, while tourists and foreign citizens often reside in high-value homes and resorts. To account for this heterogeneity, all settlements and supervisory districts ("regions") were mapped individually and the value of residential buildings within three building classes (low-, medium- and high-quality) was estimated for each region.
We first estimated the number and area of buildings per building type in each region. To do so, we consulted building footprints from OpenStreetMap (and found 16,100 and 12,500 building footprints in Grand Bahama and the Abaco Islands scales). Note that these estimations do not include building content. Since no official maps of the regions were available, we determined the coordinates of the population centres for each region. All maps are presented as Voronoi diagrams (i.e., partitioned into regions closest to each centre point) based on these centroids.
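The Voronoi partition mentioned above amounts to assigning every location to its nearest centre point, which can be sketched as below. The region names and coordinates are hypothetical, and the sketch assumes planar (projected) coordinates so that straight-line distances are meaningful.

```python
import math

def nearest_region(point, centres):
    """Return the name of the region whose centre point is closest to `point`."""
    return min(centres, key=lambda name: math.dist(point, centres[name]))

# Hypothetical projected coordinates in km; not the population centres used in this study.
centres = {
    "region_a": (0.0, 0.0),
    "region_b": (10.0, 0.0),
    "region_c": (0.0, 10.0),
}
assigned = nearest_region((7.0, 1.0), centres)   # falls in the cell of region_b
```

Applying the same rule to every building footprint yields the region to which that building's exposure is attributed.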

Vulnerability
To derive event-specific vulnerability curves, we aimed to update vulnerability curves derived from previous hurricane observations in similar built environments using damage observations for individual buildings in the affected area.
Therefore, we analysed all 498 YouTube videos that were listed when we searched for "Bahamas Dorian" and that were posted in the 10 days after the first landfall of Hurricane Dorian in the Bahamas (1-9 September). We then analysed all videos that 1) showed an overview of an area or a row of buildings (to ensure the sample was as representative as possible), 2) we were able to geotag (i.e., locate) and 3) showed buildings that did not appear to have undergone extensive flood damage. This resulted in a set of 15 videos (Appendix B), from which we extracted 732 buildings. Figure 3 depicts two examples of buildings extracted from the videos. By comparing the footage with satellite imagery of the area before the hurricane, we ensured that all buildings in an observed area or row of buildings were analysed, including those that completely disappeared in the storm.
Then, the damage ratio [0-1] and building class (i.e., low-, medium- and high-quality) were estimated for each building.
Drawing on experience in post-disaster damage assessment in insurance, economic damage ratios were estimated from the visible damage to the subcomponents of the structure, their relative values and their interactions as a whole (see Massarra et al., 2019). In some areas, especially those with low-quality houses, it was difficult to extract an image of each individual building due to the large amount of destruction and displacement. In such cases, we estimated the number of buildings per level of damage from pre-event satellite imagery.
Next, we derived a prior (i.e., a vulnerability curve based on pre-existing knowledge; Sect. 2) for each building class by estimating the parameters based on expert judgment of the strength of the buildings in similar built environments. Such curves have fitted the results of previous post-disaster needs assessments (PDNAs) within the Caribbean for stronger wind events for economic damage, and provide similar smoothed curves to existing models in the region such as those presented in the UNDRR's Global Assessment Report on Disaster Risk Reduction (UNISDR, 2015) for developed locations. Figure 4 (left) shows the parameters $m$ and $\xi$ of these curves for the three housing qualities and Fig. 4 (right) shows the associated damage ratio for low-quality buildings in the affected region.
We then set the priors (Fig. 5) for $\theta_1$ and $\theta_2$ (Eq. 5) using the specified values of $m$ and $\xi$ for each building class; the standard deviation $\sigma_1$ was set to 15 using the uncertainty range expressed in the fragility curves for a specific damage state (i.e., the range of wind gust speed that causes a specific damage state) in HAZUS (a tool for analysing natural hazards in the United States) for a similar built environment (Vickery et al., 2006) and $\sigma_2$ was set to 0.03 to allow for uncertainty of the building typology. The values of the means and standard deviations of $\theta_3, \ldots, \theta_6$ (Eq. 6, 7) were set such that the probability $p_0$ is near one when $\mu$ is near zero and near zero when $\mu$ is near one. Conversely, $p_1$ is set such that its value is near one when $\mu$ is near one and near zero when $\mu$ is near zero. This is explained in full in Appendix C. For the precision parameter $\phi$, we use an uninformative uniform prior.

For each building's observed damage ratio $(y_1, \ldots, y_N) \in [0,1]$, the maximum sustained wind speed $(v_1, \ldots, v_N)$ was obtained using the location of the building (Sect. 3.1). Finally, using these observations we performed Gibbs sampling to obtain the posterior distribution ( Bayesian updating. Note that for the posterior vulnerability curves for low-quality buildings (bottom left; d) it appears that there is a significant probability that the building is entirely destroyed (green curve). However, this is not the case since $p_1$ is the conditional probability that the damage ratio is one given that the damage ratio is not zero ($P(y = 1 \mid y \neq 0)$; Eq. 4).
Figures D1, D2 and D3 display the values per iteration, density plot and autocorrelation for $\theta_1, \ldots, \theta_6$ for each building quality class. While most sequences of generated parameters in the MCMC have low autocorrelations, some parameters do show high autocorrelations (i.e., $\theta_3$ and $\theta_4$ for low- and medium-quality buildings and $\theta_1$ and $\theta_2$). This is likely caused by a lack of information in the data for these parameters, together with a possibly large disagreement between the prior and observed data. Figure 6 shows the posterior damage ratio per district for low-, medium- and high-quality buildings.
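The spread that such priors imply for the damage ratio at a given wind speed can be explored by sampling, as in the sketch below. It assumes that the values 15 and 0.03 quoted above are the standard deviations of normal priors on the median capacity and on the logarithmic standard deviation of the curve; the prior means used in the example (250 and 0.3, in unspecified wind speed units) are hypothetical and are not the values shown in Fig. 4.

```python
import math
import random

def vulnerability_curve(v, theta1, theta2):
    """Cumulative lognormal curve with median capacity theta1 and log-sd theta2."""
    z = math.log(v / theta1) / theta2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prior_damage_samples(v, mean_theta1, mean_theta2, n, rng,
                         sd_theta1=15.0, sd_theta2=0.03):
    """Sorted damage ratios at wind speed v under independent normal priors."""
    samples = []
    for _ in range(n):
        theta1 = rng.gauss(mean_theta1, sd_theta1)
        theta2 = rng.gauss(mean_theta2, sd_theta2)
        samples.append(vulnerability_curve(v, theta1, theta2))
    return sorted(samples)

rng = random.Random(1)
samples = prior_damage_samples(v=250.0, mean_theta1=250.0, mean_theta2=0.3, n=5000, rng=rng)
median = samples[len(samples) // 2]
lower, upper = samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))]
```

The interval between `lower` and `upper` gives a rough 95% prior band for the damage ratio at that wind speed, which is the kind of band the observations subsequently narrow or shift.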

Damage estimation
Finally, the three components of risk were combined (i.e., hazard × exposure × vulnerability) to obtain a damage estimation for residential buildings (Table 1; Fig. 7) of 1056 million USD using the prior vulnerability curves versus an estimate of 658 million USD using posterior curves (i.e., ~38% lower). It should be noted that we used the coordinates of the population centre determined for each region (Sect. 3.2) to extract the wind speed data (Sect. 3.1).
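The combination of the three components and the reported relative difference can be illustrated as follows. The regional exposure values and damage ratios in the first example are hypothetical; only the two totals of 1056 and 658 million USD are taken from the text.

```python
def total_damage(regions):
    """Sum exposed value times damage ratio over all regions and building classes."""
    return sum(value * ratio for region in regions for value, ratio in region)

def relative_change(prior_estimate, posterior_estimate):
    """Change of the posterior estimate relative to the prior estimate."""
    return (posterior_estimate - prior_estimate) / prior_estimate

# Hypothetical regions: (exposed value in million USD, damage ratio) per building class,
# where the damage ratio follows from the vulnerability curve at the regional wind speed.
regions = [
    [(100.0, 0.8), (200.0, 0.4)],
    [(50.0, 0.1)],
]
example_total = total_damage(regions)

# Totals reported in the text (million USD).
change = relative_change(1056.0, 658.0)
```

The reported totals give a change of about -0.377, consistent with the stated reduction of roughly 38%.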

Discussion and conclusions
In this paper, we present a framework that uses Bayesian updating with social media data (YouTube videos) to create event-specific vulnerability curves. This framework uses the zero-one inflated beta distribution, which allows us to use post-disaster observations to create vulnerability curves that have been adjusted for local hazard and building characteristics. We demonstrate their application in a rapid damage assessment of structural damage to buildings caused by Hurricane Dorian in the Bahamas. In our estimation, wind damage to residential buildings is ~38% lower compared to that calculated using pre-existing vulnerability curves (i.e., the priors). The largest relative differences were found for medium- and high-quality buildings, which we argue are most likely designed according to strict building codes (Ministry of Works & Utilities, 2003). However, using social media data to assess building damages has several limitations. Observations from online media are biased, and some demographic groups have easier access to internet resources than other groups (Duggan and Brenner, 2013). In addition, observations tend to focus on the most impacted areas, as these are more newsworthy (Miles and Morse, 2007). While we have aimed to reduce this bias by only including footage that showed relatively large areas, we found very little footage of the less severely impacted parts of the islands, such as the northern and southern tips of the Abaco Islands.
Moreover, the large spread in the observations (Fig. 5) shows that vulnerability is a complex concept. The vulnerability of a single building or part of a building to a specific hazard is determined by many factors. In this paper, we only considered sustained wind speed. However, rainfall patterns and other environmental factors could also be important (Hatzikyriakou et al., 2016; Knapp et al., 2010). In addition, while we used three different building classes, the real variation in the strength of these buildings cannot be captured by three relatively simple curves.
The variation in building classes was also difficult to capture. For our case study, we based our classification of damaged buildings on post-disaster imagery. While we aimed to deduce the building quality class from the building structure rather than from the damage to the building, it is likely that bias was introduced in the classification. A better approach would be to use pre-disaster imagery (e.g., Google Street View or Mapillary) or, even better, detailed construction data per building.
Unfortunately, these were unavailable for Grand Bahama and the Abaco Islands. This was further complicated by the limited availability of vulnerability curves for the specific built environment. The estimated priors are based on a combination of data sources and expert judgement. This likely caused a relatively large disagreement between priors and observations for some parameters, resulting in a high autocorrelation (Fig. D1, D2 and D3).
This calls for the establishment of a database of vulnerability curves for hurricane winds that considers a wide range of building typologies and strengths across a wide spectrum of wind intensities.
We applied the method using observations from online media. However, point observations from other sources could also be used. For example, survey data collected by experts or insurance claims would likely be more reliable. The availability of observations from such sources during the first days following a disaster is generally limited, however, which reduces their applicability for rapid post-disaster damage assessments. Our method could be applied to other hazard types, such as floods and earthquakes. However, building damage caused by standing water may be more difficult to observe from pictures than structural damage caused by earthquakes. Therefore, for floods additional data, such as data from surveys or insurance reports, may be required.

where $\mu$ is 0.05. In simpler terms, this means that the probability that the response variable is a true zero is near one when $\mu$ is also near zero, and near zero otherwise ( 50 or median in Fig. 5). The standard deviation is set at 10% of the mean. Likewise, $\theta_5$ and $\theta_6$ are chosen such that $p_1$ (i.e., the probability that the damage ratio is one) is 0.01 where $\mu$ is 0.95 and 0.99 where $\mu$ is 0.99. Again, the standard deviation is set at 10% of the mean.
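Choosing two parameters so that a probability passes through two prescribed points can be sketched as below. The sketch assumes, purely for illustration, a logistic link between the probability and the expected damage ratio; the functional form actually used in Eq. (6, 7) is not recoverable from the text here, so the resulting intercept and slope should not be read as the values used in this study. The two calibration points are those quoted above for the probability of a damage ratio of one.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def logistic(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))

def calibrate_two_points(mu_a, p_a, mu_b, p_b):
    """Intercept and slope such that logistic(intercept + slope * mu) hits both points."""
    slope = (logit(p_b) - logit(p_a)) / (mu_b - mu_a)
    intercept = logit(p_a) - slope * mu_a
    return intercept, slope

# Calibration points from the text: probability 0.01 at 0.95 and 0.99 at 0.99.
intercept, slope = calibrate_two_points(0.95, 0.01, 0.99, 0.99)
p_at_half = logistic(intercept + slope * 0.5)    # effectively zero at moderate damage
```

Under this assumed link the transition from near zero to near one is confined to expected damage ratios between 0.95 and 0.99, which matches the qualitative behaviour described above.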

Author contribution
JB developed the methodology and took lead in writing the manuscript. JD took part in the creative process and assessed the damage ratios. AP, RG and JM collected and analyzed the exposure data. MR assisted in writing the manuscript. SK and HM assisted in development of the methodology. NB created the wind field. JA assisted in writing and oversaw the creative process.

Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Disclaimer
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of its Executive Directors or the governments they represent.

Bahamas" (https://www.youtube.com/watch?v=hvCQtLWW-y4) containing (left) a medium-quality building with structural detailing, with roof damage, water intrusion to part of the building's structure and non-structural damage (ca. 25% damage) and (right) a low-quality building (with minor structural detailing), missing its front wall, with roof damage and significant debris (ca. 50% damage).