Road assessment after flood events using non-authoritative data

This research proposes a methodology that leverages non-authoritative data to augment flood extent mapping and the evaluation of transportation infrastructure. The novelty of this approach is the application of freely available, non-authoritative data and its integration with established data and methods. Crowdsourced photos and volunteered geographic data are fused together using a geostatistical interpolation to create an estimation of flood damage in New York City following Hurricane Sandy. This damage assessment is utilized to augment an authoritative storm surge map as well as to create a road damage map for the affected region.


Introduction
Accurate and timely flood assessments are critical during all phases of a flood disaster. In addition, knowledge of road conditions and accessibility is especially important for emergency managers, first responders, and residents. Over the past two decades, satellite remote sensing has become a standard technique for identifying flood extent. Satellite remote sensing data provide high spatial resolution and can supply information for areas that are poorly accessible or lack ground measurements (Smith, 1997). However, in the case of hurricanes, high-resolution satellite data might be unavailable for days because of cloud cover or limited orbital revisit times.
Satellite data are often supplemented with additional data, such as digital elevation models (DEM) and river gauge data, to provide a more comprehensive flood assessment (Wang et al., 2002; Brivio et al., 2002). RADAR data, in particular, are often a good resource for flood identification because they can distinguish water bodies from other land cover while penetrating vegetative canopy and cloud cover (Laura et al., 1990; Townsend and Walsh, 1998). Because the application of RADAR data can be difficult due to limited swaths and long revisit times, many recent efforts have sought to increase RADAR's availability and accessibility. For example, Hoelzl et al. (2003) illustrate how a RADAR instrument on an unmanned aerial vehicle (UAV) can be used for flood assessment of targeted areas. Sohn et al. (2008) propose a multi-sensor approach that combines satellite, aerial, and ground data for a more accurate flood assessment, and they test how a RADAR sensor on board a UAV can provide useful data. Aerial platforms, both manned and unmanned, are particularly suited for coastal monitoring after major catastrophic events because they can fly below the clouds and thus acquire data in a targeted and timely fashion.
In addition to capturing the location and progression of a flood event, remote sensing data are also used to catalog damage to the built environment. In particular, information regarding the accessibility, obstruction, or damage to roadways and bridges is imperative for emergency responders. While a functioning transportation network is essential in day-to-day life, it is particularly critical during and after disasters. Transportation infrastructure following Hurricane Katrina was evaluated with a variety of techniques, including visual, non-destructive, and remote sensing methods. However, the assessment of transportation infrastructure over such a large area could have been accelerated through the use of high-resolution imagery and geospatial analysis (Uddin, 2011).
Recent studies have applied remote sensing data after earthquakes or flooding specifically to assess transportation networks. Butenuth et al. (2011) used multi-sensor, multi-temporal imagery to identify flooded roads. Ehrlich et al. (2009) used pre- and post-disaster very high-resolution (VHR) optical imagery (1 m or better) to identify infrastructure and road damage after the 2008 Wenchuan earthquake. Frey and Butenuth (2011) combined optical satellite imagery with a DEM to assess road accessibility after flooding, creating a model intended for near-real-time application by emergency managers.
Published by Copernicus Publications on behalf of the European Geosciences Union.
The integration of new data sources and methods with traditional approaches offers opportunities to provide additional information regarding on-the-ground conditions. Non-authoritative data, for example, are any data that are not collected and distributed by traditional, authoritative emergency management methods and agencies. Specifically, these data are generated, and often distributed, by members of the public and offer opportunities to gain additional insight during and after hazard events. Volunteered geographic information (VGI), for instance, is an emerging and quickly growing data source (Goodchild, 2007). These data are voluntarily contributed and made available, and they contain temporal and spatial information. The sources of VGI vary greatly and include pictures, videos, sounds, and text messages. An unprecedented and massive amount of ground data has become available through VGI, often in real time.
Although non-authoritative data by definition usually carry little scientific merit, they can still yield useful information. For example, VGI has been evaluated during disaster and crisis events as a source of situational awareness and as documentation of an event's progression over time (De Longueville et al., 2009; Vieweg et al., 2010). Volunteered data have also been used specifically during flood events. For rapid flood damage estimation, Poser and Dransch (2010) interpolated flood inundation depth from VGI and found the estimates to be comparable to interpolated in situ measurements as well as to model predictions. McDougall (2011) estimated flood extent by using VGI and river gauge data to create a DEM, which was then compared with the natural topographic surface. Furthermore, fusing multiple sources of non-authoritative data makes it possible to estimate flood extent when remote sensing data are lacking or incomplete (Schnebele et al., 2014).
Another source of non-authoritative, volunteered information harnesses the power of group contribution, or the "wisdom of crowds" (Surowiecki, 2005). Crowdsourcing, a process in which a task is undertaken by a large group of people rather than by a single individual or expert, can often result in successful problem solving (Howe, 2006). Examples of successful crowdsourcing include Wikipedia and OpenStreetMap, where information is voluntarily contributed and the public manages content and errors. Goodchild and Glennon (2010) found that crowdsourcing during disasters can provide valuable information, although, as with any volunteered, non-authoritative data source, there can still be issues related to data quality.
Because of issues related to uncertainty in non-authoritative data, such as reliability and quality, these data have yet to be regularly and systematically applied during large-scale disasters (Flanagin and Metzger, 2008; Schlieder and Yanenko, 2010; Tapia et al., 2011). Despite their non-scientific nature, however, their integration with traditional data sources offers opportunities to include new and additional information that harnesses the power of 'citizens as sensors' and the 'wisdom of crowds' to fill in the gaps (Surowiecki, 2005; Goodchild, 2007; Sui and Goodchild, 2011).
This paper uses crowdsourced aerial remote sensing data along with volunteered geographic data for flood damage assessment and the identification of road damage in the New York City area following Hurricane Sandy. Hurricane Sandy was a major storm that affected a large portion of the US East Coast in October 2012, with damage and recovery costs estimated at between 50 and 60 billion US dollars.

Data

Volunteered geographic data
Geolocated videos documenting flooding and damage from Hurricane Sandy were collected from a Hurricane Sandy Google Earth site, through which geolocated YouTube videos posted by Storyful could be accessed. YouTube, a video-sharing website, is used by millions of people to share videos covering a wide range of topics and experiences. Through this site the public voluntarily shares information, often documenting damage resulting from natural hazards.
Twitter, a social networking site, is often used by the public to share information about daily life through micro-blogging. Arizona State University's TweetTracker provided the Twitter data for this project. Tweets generated in the New York City area (40.54-40.92° N latitude, 73.75-74.13° W longitude) between 26 October and 3 November 2012 and containing the word "flood" were used to provide a temporal framework.
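The spatial, temporal, and keyword filter described above can be sketched as a simple predicate. This is a minimal sketch, assuming each tweet is available as its text, latitude, longitude, and a `datetime` timestamp; the function name `keep_tweet` and this record layout are hypothetical and are not TweetTracker's interface. West longitudes are written as negative numbers, and "containing the word flood" is approximated by a case-insensitive substring match, which also accepts "flooding" and "flooded" (the exact matching rule used for the study is not stated).

```python
from datetime import date, datetime

# Bounding box and date window taken from the text; west longitudes are negative.
LAT_MIN, LAT_MAX = 40.54, 40.92
LON_MIN, LON_MAX = -74.13, -73.75
FIRST_DAY, LAST_DAY = date(2012, 10, 26), date(2012, 11, 3)


def keep_tweet(text: str, lat: float, lon: float, created: datetime) -> bool:
    """Hypothetical filter: True if a tweet lies inside the study box,
    falls within the study window (inclusive), and mentions "flood"."""
    in_box = LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
    in_window = FIRST_DAY <= created.date() <= LAST_DAY
    # Substring match (assumption): also matches "flooding", "flooded", "#flood".
    mentions_flood = "flood" in text.lower()
    return in_box and in_window and mentions_flood


if __name__ == "__main__":
    print(keep_tweet("Flooding on the FDR", 40.71, -73.97, datetime(2012, 10, 29, 21, 30)))  # prints True
```

A stricter whole-word match (for example with a regular expression and word boundaries) would exclude hashtags and compounds; which variant is appropriate depends on how the original query was posed.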

Crowdsourced data
The Civil Air Patrol, the civilian auxiliary of the US Air Force, was tasked with collecting aerial photos of the US East Coast following the impact of Hurricane Sandy. Within days of the storm making landfall, volunteers flew hundreds of missions from Cape Cod, MA, to Cape May, NJ. These missions generated thousands of aerial photos of the coastline, including photos documenting heavily flooded areas.
The photos were placed on a Hurricane Sandy Google Crisis Map website (Fig. 1), where the public could assess visible damage through a crowdsourcing portal supported by MapMill. This yielded a large damage assessment data set generated from crowdsourced, non-authoritative, non-traditional sources. The photos were also made available online through a Federal Emergency Management Agency (FEMA) website, where residents could search by street address to see what damage, if any, their homes may have sustained.

Authoritative data
The FEMA Modeling Task Force (MOTF) created storm surge maps for the US East Coast following Hurricane Sandy. Surge extent was determined from field-verified high water marks and storm surge sensor data. FEMA used these data along with a digital elevation model (DEM) to create a surge boundary for each state.
A FEMA MOTF shapefile was downloaded from FEMA's GeoPlatform website and imported into ArcGIS 10 for analysis. The GeoPlatform site supplies data and analytics for emergency management. The shapefile used for this research was the finalized version (dated 14 February 2013) for New York City, with a 1 m horizontal resolution and a New York State Plane coordinate system (Fig. 3a).

Road layer
A 2012 TIGER/Line® shapefile of road networks for the New York City area was downloaded from the US Census Bureau. The layer was projected to New York State Plane coordinates in ArcGIS 10. Figure 3b displays the road network for the New York City area as well as the surge extent created by FEMA.

Overview
This work is based on the fusion of non-authoritative data and its integration with traditional authoritative sources. Figure 2 illustrates the general methodology, in which non-authoritative data from multiple sources are combined to produce a spatial and temporal assessment of the disaster. The precise definition of data fusion varies by discipline (in computer science, for example, the process of data integration itself is considered "data fusion"); in this work, data fusion refers to the model in its entirety. The methodology consists of a three-step process: (1) non-authoritative damage assessment, (2) integration with authoritative data for damage assessment, and (3) generation of a road damage map. The model begins with the integration of non-authoritative data (i.e., crowdsourcing and VGI) to create a damage assessment. This step is method-independent and can be performed using whichever method is best suited to a particular combination of data and location. Because this step is not limited to a specific data type, it can easily be extended to integrate additional or different sources. After a damage assessment is created from non-authoritative data, it is integrated with available authoritative data to enhance the damage assessment. This step can take the form of validation, if "ground-truth" data are available, or can consist of an additional integration step in which authoritative and non-authoritative data are combined to fill in gaps in the spatial or temporal data infrastructure. The final step is the classification of roads that may be compromised as a result of flooding. This is accomplished by overlaying a road network on the damage assessment. Depending on data availability and flood event characteristics, a temporal assessment of the flood event may be generated in addition to the spatial assessment. The specifics of each step as applied in this paper are discussed in Sects. 3.2-3.4.
The novelty of this approach is the use of non-authoritative data to produce flood and road damage assessments. Although this work uses specific crowdsourced data (Civil Air Patrol photos) and volunteered data (YouTube videos, tweets), the methodology can be extended to other sources. The goal of this paper is to illustrate how non-authoritative data can augment existing data and methods as well as optimize response initiatives by identifying areas of severe damage.

Non-authoritative damage assessment
We integrate non-authoritative data by interpolating them to create a damage assessment surface. The geostatistical technique of kriging creates an interpolated surface from the spatial arrangement and variance of nearby measured values (Stein, 1999). Kriging accounts for spatial correlation between values (i.e., locations and severity of flooding) and is often used with Earth science data (Oliver and Webster, 1990; Olea and Olea, 1999; Waters, 2008). Like inverse distance weighting, kriging uses the distance between points, but it also considers the spatial arrangement of the nearby measured values. In addition, a kriging interpolator provides a measure of the error associated with the predicted values (Stein, 1999). A variogram is created to estimate the spatial autocorrelation between observed values $Z(x_i)$ at points $x_1, \ldots, x_n$. The variogram determines a weight $w_i$ at each point $x_i$, and the value at a new position $x_0$ is interpolated as

$$\hat{Z}(x_0) = \sum_{i=1}^{n} w_i \, Z(x_i). \qquad (1)$$
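In ordinary kriging, the weights $w_i$ mentioned above are obtained by solving a linear system built from the variogram, with the constraint that the weights sum to 1 (which keeps the estimate unbiased); the same solution also yields the kriging variance. The sketch below illustrates that system under stated assumptions: an exponential variogram with hypothetical nugget, sill, and range values (not the model the authors fitted in ArcGIS), Euclidean distances in projected coordinates, and a small dense solver that is only suitable for a handful of points. All function names are illustrative.

```python
import math


def exponential_variogram(h, nugget=0.0, sill=1.0, practical_range=1000.0):
    """Hypothetical exponential model; gamma(0) = 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / practical_range))


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, variogram=exponential_variogram):
    """Return (estimate, kriging variance, weights) at target."""
    n = len(points)
    # Semivariances between data points, bordered by ones for the constraint.
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each data point and the target location.
    gamma0 = [variogram(math.dist(p, target)) for p in points]
    sol = solve_linear(a, gamma0 + [1.0])
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))  # Eq. (1)
    variance = sum(w * g for w, g in zip(weights, gamma0)) + mu
    return estimate, variance, weights


if __name__ == "__main__":
    pts = [(0.0, 0.0), (400.0, 0.0), (0.0, 400.0)]
    vals = [10.0, 4.0, 1.0]  # hypothetical damage values
    est, var, w = ordinary_kriging(pts, vals, (150.0, 150.0))
    print(round(est, 2), round(var, 3), round(sum(w), 6))
```

With no nugget, the interpolator is exact: at a data location it returns the observed value with zero variance. Production work would normally fit the variogram to the data and use a tested geostatistics library rather than this hand-rolled solver.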

Integration with authoritative data
For this research, authoritative data in the form of a storm surge map created by FEMA MOTF are used (1) to illustrate how non-authoritative data can provide a range of damage estimates that enhance traditional storm surge products and (2) as a comparison with the authoritative estimate of flood extent. The damage assessment surface created from the non-authoritative data is first limited to the FEMA estimated flood boundary to illustrate how non-authoritative data provide a range of damage values, in contrast to the binary assessment (flooded/not flooded) provided by the FEMA MOTF map. Second, the area (m²) classified as flooded by FEMA is used as a baseline against which the flooded area (m²) estimated from non-authoritative sources can be measured.
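The two uses of the FEMA map described above (restricting the damage surface to the surge boundary, then measuring flooded area) can be sketched on a raster. This is a minimal sketch, assuming the damage surface is a grid of values with row 0 at the southern edge, square cells of side `cell` in projected units, and the FEMA boundary supplied as a single simple polygon. The authors worked with a shapefile in ArcGIS 10, so the ray-casting test and all function names here are illustrative stand-ins. Areas come out in squared map units, which are m² only if the projection uses metres.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_surface(grid, origin, cell, boundary):
    """Keep cells whose centres fall inside the boundary; others become None.
    grid[row][col]; row 0 is the southern edge (assumption)."""
    x0, y0 = origin
    masked = []
    for r, row in enumerate(grid):
        cy = y0 + (r + 0.5) * cell
        masked.append([
            v if point_in_polygon(x0 + (c + 0.5) * cell, cy, boundary) else None
            for c, v in enumerate(row)
        ])
    return masked


def area_at_or_above(grid, threshold, cell):
    """Total area of cells with a value >= threshold, in squared map units."""
    return sum(cell * cell for row in grid for v in row
               if v is not None and v >= threshold)


if __name__ == "__main__":
    surface = [[8, 3], [9, 7]]  # hypothetical damage values
    fema = [(0, 0), (10, 0), (10, 20), (0, 20)]  # hypothetical boundary
    inside = mask_surface(surface, (0.0, 0.0), 10.0, fema)
    print(inside, area_at_or_above(surface, 7, 10.0))
```

A threshold of 7 corresponds to the medium/severe damage range (values 7-10) used later for the area comparison.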

Generation of road damage map
The identification of affected roads is accomplished by pairing a road network with the damage assessment surface. A layer comprising a high-resolution road network is added to the damage assessment surface layer, and roads are then identified as potentially compromised or impassable based on the underlying damage assessment. Roads are classified in ArcGIS 10 using the Clip tool to select the roads located within each damage class. Depending on the range of damage values and the scale of the domain, the classes can then be aggregated to reduce complexity and present a clearer representation.
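The authors performed this step with the ArcGIS Clip tool. As a rough stand-in that needs no GIS software, the sketch below samples points along each road segment and attributes each sampled piece of the segment to the damage class of the raster cell it falls in, approximating the road length that a clip by damage class would return. The grid layout (row 0 at the southern edge), the sampling `step`, and the function names are assumptions; accuracy depends on `step` being small relative to the cell size.

```python
import math


def length_by_class(p, q, grid, origin, cell, step=1.0):
    """Approximate road length from p to q falling in each damage class.
    grid[row][col] holds integer classes 1-10; row 0 is the southern edge."""
    x0, y0 = origin
    seg_len = math.dist(p, q)
    n = max(1, math.ceil(seg_len / step))  # number of equal sub-segments
    piece = seg_len / n
    totals = {}
    for k in range(n):
        t = (k + 0.5) / n  # midpoint of the k-th sub-segment
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        col = int((x - x0) // cell)
        row = int((y - y0) // cell)
        if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
            cls = grid[row][col]
            totals[cls] = totals.get(cls, 0.0) + piece
    return totals


def polyline_length_by_class(vertices, grid, origin, cell, step=1.0):
    """Sum per-class lengths over consecutive vertex pairs of a road polyline."""
    totals = {}
    for p, q in zip(vertices, vertices[1:]):
        for cls, length in length_by_class(p, q, grid, origin, cell, step).items():
            totals[cls] = totals.get(cls, 0.0) + length
    return totals


if __name__ == "__main__":
    damage = [[1, 9], [3, 3]]  # hypothetical 2 x 2 class grid, 10-unit cells
    road = [(0.0, 5.0), (10.0, 5.0), (10.0, 15.0)]
    print(polyline_length_by_class(road, damage, (0.0, 0.0), 10.0))
```

Sampling trades exactness for simplicity; a true geometric clip would intersect each segment with cell or class polygons.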
Potentially affected roads could also be classified as a function of distance from the flood source (e.g., a river or coastline) or from the flood boundary.
The value $G_i$ for each grid cell is computed from $n_i$, the number of photos in grid cell $i$; $m_i$, their mean value; and $N$, the maximum number of photos in any grid cell. As a result, each grid cell has a value from 1 to 10, with 1 representing no damage and 10 representing severe damage/flooding. The videos were provided with geolocation information and were visually assessed by the authors to confirm that they documented flooding. Given the small number of videos (n = 15), no crowdsourcing or automated assessment was required. Furthermore, Schnebele and Cervone (2013) show that even a small number of properly located VGI data can help improve flood assessment. Each video point was assigned a value of 10 (severe damage/flooding).
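The per-cell quantities that feed $G_i$ can be computed by binning the rated photos into grid cells. This is a sketch, assuming each photo has projected coordinates and a numeric crowd rating (the rating scale and grid size are not specified in this section); the formula that combines the quantities into $G_i$ is not restated here, so the sketch stops at $n_i$, $m_i$, and $N$. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def grid_photo_stats(photos, origin, cell):
    """photos: iterable of (x, y, rating) in projected units.
    Returns ({(row, col): (n_i, m_i)}, N), where n_i is the photo count in a
    cell, m_i the mean rating there, and N the largest count in any cell."""
    x0, y0 = origin
    ratings = defaultdict(list)
    for x, y, rating in photos:
        key = (int((y - y0) // cell), int((x - x0) // cell))
        ratings[key].append(rating)
    stats = {key: (len(r), sum(r) / len(r)) for key, r in ratings.items()}
    big_n = max((n for n, _ in stats.values()), default=0)
    return stats, big_n


if __name__ == "__main__":
    rated = [(1.0, 1.0, 10), (2.0, 2.0, 4), (15.0, 1.0, 1)]  # hypothetical
    print(grid_photo_stats(rated, (0.0, 0.0), 10.0))
```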
The Civil Air Patrol and YouTube data were fused using kriging interpolation as described in Sect. 3.2, resulting in a damage assessment surface generated solely from non-authoritative data. Ordinary kriging produced a well-performing interpolation model: cross-validation yielded a standardized mean prediction error of 0.0008 and a standardized root-mean-squared prediction error of 0.9967. A standardized mean error close to 0 indicates that the predictions are essentially unbiased, and a standardized root-mean-squared error close to 1 indicates that the kriging standard errors are consistent with the observed prediction errors. Figure 3c illustrates the damage assessment within the boundaries of the FEMA surge extent. The damage assessment scale derived from the crowdsourced photos and geolocated videos is illustrated in Fig. 4, and a histogram (Fig. 5) shows the range of these damage assessment values. The peak in medium/severe damage values (7-8) illustrates how non-authoritative data can provide damage information not conveyed in the FEMA map.
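The two cross-validation statistics quoted above can be computed from leave-one-out predictions. Following the common definition, each standardized error is the prediction error divided by the kriging standard error at that point, and the reported values are the mean and root mean square of these standardized errors. This sketch accepts any predictor that returns an estimate and a variance; `mean_predictor` is a hypothetical toy used only to demonstrate the bookkeeping, not a substitute for kriging.

```python
import math


def standardized_stats(observed, predicted, std_errors):
    """Mean and root-mean-square of (predicted - observed) / standard error."""
    z = [(p - o) / s for o, p, s in zip(observed, predicted, std_errors)]
    mean_z = sum(z) / len(z)
    rms_z = math.sqrt(sum(e * e for e in z) / len(z))
    return mean_z, rms_z


def leave_one_out(points, values, predict):
    """predict(train_points, train_values, target) -> (estimate, variance)."""
    obs, pred, se = [], [], []
    for k in range(len(points)):
        train_p = points[:k] + points[k + 1:]
        train_v = values[:k] + values[k + 1:]
        estimate, variance = predict(train_p, train_v, points[k])
        obs.append(values[k])
        pred.append(estimate)
        se.append(math.sqrt(variance))
    return standardized_stats(obs, pred, se)


def mean_predictor(train_points, train_values, target):
    """Hypothetical toy predictor: mean of the other values, unit variance."""
    return sum(train_values) / len(train_values), 1.0


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(leave_one_out(pts, [1.0, 2.0, 3.0], mean_predictor))
```

Plugging a kriging predictor into `leave_one_out` gives the diagnostic described in the text: values near 0 and 1, respectively, suggest an unbiased model with realistic error estimates.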
Ground information in the form of geolocated videos (Fig. 6) enhances the non-authoritative data set by providing flood information not conveyed in the Civil Air Patrol photos. As illustrated in Fig. 7, the locations of the videos (green triangles) did not coincide with the locations of photos rated as medium/severe damage (larger orange circles, values 7-10). Possible reasons for this disparity include flooding captured on video having receded before the Civil Air Patrol flights, footage having been captured at night, or flooding having occurred in areas that were not in a flight path or could not be seen from aerial platforms (e.g., flooding in tunnels or under overpasses). By using multiple data sources, flood or damage details not captured by one source can be provided by another.
A comparison of flood surface area between the two maps was also conducted. The storm surge area on the FEMA map is approximately 121 km². Using the more highly rated areas of damage (regions with values from 7 to 10) from the non-authoritative assessment yielded an approximate surface area of flooding and damage of 157 km² (Fig. 8). Using only the areas classified as medium-to-severe damage, the surface area generated from non-authoritative sources is within 23 % of FEMA's surge extent for New York City. Overall, there is very good agreement between the FEMA flood extent and the classified Civil Air Patrol photos, with an approximate 1 % discrepancy. Figure 9 shows examples of agreement between photos identifying flooding/damage and the FEMA-generated flood extent, while Fig. 10 shows examples of disagreement; the locations of these photos are shown in Fig. 11.
Sources of error in non-authoritative data, such as incorrect information (false positives/negatives) or improper geolocation, must be considered. Incorrect information can be mitigated by using visually verified photos and videos and by drawing on multiple sources. Crowdsourcing, in particular, can increase accuracy and enhance information reliability compared with single-source observations (Giles, 2005). Geolocation errors can be reduced with automation.
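For the surface-area comparison reported above, the relative difference between the two estimates depends on which area is taken as the denominator. Writing $A_{\mathrm{FEMA}} \approx 121$ km² for the FEMA surge area and $A_{\mathrm{NA}} \approx 157$ km² for the non-authoritative medium/severe area, a short worked check:

$$
\begin{aligned}
\Delta A &= A_{\mathrm{NA}} - A_{\mathrm{FEMA}} \approx 157 - 121 = 36\ \mathrm{km}^2,\\
\frac{\Delta A}{A_{\mathrm{NA}}} &\approx \frac{36}{157} \approx 0.229 \;(\approx 23\,\%),\\
\frac{\Delta A}{A_{\mathrm{FEMA}}} &\approx \frac{36}{121} \approx 0.298 \;(\approx 30\,\%).
\end{aligned}
$$

The quoted 23 % thus appears to express the difference relative to the non-authoritative estimate; measured against the FEMA area as the baseline, the difference is closer to 30 %.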
Sparse data, or data skewed in favor of densely populated or landmark areas, make the use of non-authoritative data sources especially challenging. Increasing data volume and integrating authoritative data into the methodology can increase confidence and cover underrepresented areas. Although non-authoritative data can provide timely, local information, they are often viewed with uncertainty. Conversely, authoritative data can be slower to collect, verify, and authenticate, but they yield trusted results.

Temporal assessment
For this study, Twitter data from TweetTracker were used to provide a temporal rather than spatial assessment. Although tweets were geolocated using TweetTracker (Kumar et al., 2011, 2012), uncertainty in their locations did not allow for a study at street-level resolution. However, they provide precise temporal information that can be used to understand the progression of the surge extent over time. Understanding this temporal progression is crucial during and after flood events, yet it is very difficult to achieve with remote sensing instruments because of the inherent limitations of their platforms (e.g., revisit times). Twitter data can effectively be used to overcome this limitation. For example, Fig. 12 shows that the number of tweets containing the word "flood" peaks on 29 and 30 October 2012, coinciding with the majority of flooding when Hurricane Sandy made landfall on the night of 29 October and with the continued flooding on 30 October.
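A daily series such as the one in Fig. 12 can be built by binning tweet timestamps by calendar day. This is a sketch under two assumptions: timestamps are timezone-aware UTC datetimes, and days should be counted in New York local time. Because daylight saving time in 2012 ended on 4 November, the whole study window falls within Eastern Daylight Time, so a fixed UTC-4 offset is used instead of a time-zone database. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: the 26 Oct - 3 Nov 2012 window lies entirely in EDT (UTC-4),
# since daylight saving time ended on 4 November 2012.
EDT = timezone(timedelta(hours=-4))


def tweets_per_day(utc_timestamps):
    """Count tweets per New York calendar day; returns {date: count} in date order."""
    counts = Counter(ts.astimezone(EDT).date() for ts in utc_timestamps)
    return dict(sorted(counts.items()))


def peak_day(utc_timestamps):
    """Local calendar day with the most tweets."""
    counts = tweets_per_day(utc_timestamps)
    return max(counts, key=counts.get)


if __name__ == "__main__":
    stamps = [
        datetime(2012, 10, 29, 23, 0, tzinfo=timezone.utc),
        datetime(2012, 10, 30, 2, 0, tzinfo=timezone.utc),  # 29 Oct, 22:00 EDT
        datetime(2012, 10, 30, 15, 0, tzinfo=timezone.utc),
    ]
    print(tweets_per_day(stamps))
```

Counting in UTC instead would move late-evening tweets from the night of landfall (29 October local time) into 30 October, which matters when relating tweet peaks to the storm's timeline.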

Road damage map
The non-authoritative assessment was also used to identify areas of potential road damage. Although the damage assessment was limited to the authoritative FEMA surge extent for the sake of comparison (Fig. 3c), it was not limited to the authoritative extent for the classification of road damage. Because the fusion of the non-authoritative data predicted flooding and damage outside the FEMA flood extent boundaries, the full damage assessment was used for the road classification.
The road network from the TIGER/Line® shapefile was layered over the damage assessment map. Roads were then classified using the damage assessment layer by clipping them and separating them from the original road network layer (Fig. 3b). This yielded 10 individual road classes, with values from 1 to 10, corresponding to the original 10 damage classes from the interpolation of the gridded Civil Air Patrol crowdsourced photos and YouTube videos. To improve the readability of the map, the classes were aggregated into four groups: roads with values between 1 and 3 were considered to have no damage and were not included in Fig. 3d, and the remaining classes were aggregated into slight (values 4-6), medium (value 7), and severe (values 8-10) damage. By using the damage assessment layer along with a high-resolution road network layer, roads that may have severe damage can be identified at the street level. This is critically important during disasters, when evacuations and response initiatives are paramount. For example, following the Colorado floods of September 2013, over 1000 bridges required inspection, and approximately 200 miles of highway and 50 bridges were destroyed. Rapid and directed identification of affected areas can help authorities prioritize site visits and response initiatives as well as task additional aerial data collection.
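The aggregation of the ten road classes into the four map groups described above is a simple lookup. A minimal sketch, assuming integer class values and that road lengths per class have already been computed (for example, from a clip in GIS software); the function names are hypothetical.

```python
def aggregate_class(value):
    """Map an integer damage class (1-10) to the four map groups in the text."""
    if value not in range(1, 11):
        raise ValueError(f"expected an integer class from 1 to 10, got {value!r}")
    if value <= 3:
        return "none"  # considered undamaged; not drawn in Fig. 3d
    if value <= 6:
        return "slight"
    if value == 7:
        return "medium"
    return "severe"  # classes 8-10


def group_lengths(length_by_class):
    """Sum road lengths per aggregated group from a {class: length} mapping."""
    totals = {}
    for cls, length in length_by_class.items():
        group = aggregate_class(cls)
        totals[group] = totals.get(group, 0.0) + length
    return totals


if __name__ == "__main__":
    print(group_lengths({1: 5.0, 2: 1.0, 5: 3.0, 7: 2.0, 9: 4.0, 10: 1.0}))
```

Keeping the grouping in one function makes it easy to try alternative break points without re-running the classification itself.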

Conclusions
The application and integration of non-authoritative data offer opportunities to augment traditional data and methods for flood extent mapping and damage assessment. Although questions of reliability and validity are a concern when using non-authoritative data, especially during natural disasters, these data can be employed along with traditional authoritative data and methods to enhance our knowledge of ground conditions. The fusion of multiple non-authoritative data sources helps to fill in gaps in spatial and temporal coverage. In addition, the ability to identify potential areas of road damage or inaccessibility caused by flooding can optimize response initiatives.

Figure 2: Flowchart illustrating the methodology for determining road damage from non-authoritative data.

Figure 3: Storm surge extent generated by FEMA and the road layer for New York City area (a and b).Flood damage assessment generated from non-authoritative data and the subsequent classification of potential road damage (c and d).
Fig. 3. (a) Storm surge created by FEMA MOTF for New York City. (b) Road network for the NYC area and FEMA flood extent. (c) Damage assessment generated from non-authoritative data within the FEMA surge boundary. (d) Road damage assessment based on analysis of non-authoritative data.

Fig. 7. Locations of Civil Air Patrol photos and geolocated videos documenting flooding.

Figure 9: Agreement between Civil Air Patrol photos and FEMA evaluation for flooded (a and b) and not flooded (c and d) areas.

Fig. 10. Disagreement between Civil Air Patrol photos and FEMA evaluation: (a, b) flooding documented by the Civil Air Patrol but not estimated by FEMA; (c, d) flooding estimated by FEMA but not confirmed by the Civil Air Patrol.

Fig. 11.Locations of photos which illustrate agreement and disagreement between FEMA and the crowdsourced data.

Fig. 12. Progression of tweets mentioning the word "flood" in the New York City area.