Relationship between the spatial distribution of SMS messages reporting needs and building damage in 2010 Haiti disaster

Abstract. Just 4 days after the M = 7.1 earthquake on 12 January 2010, Haitians could send SMS messages about their location and urgent needs through the on-line mapping platform Ushahidi. This real-time crowdsourcing of crisis information provided direct support to key humanitarian resources on the ground, including Search and Rescue teams. In addition to its use as a knowledge base for rescue operations and aid provision, the spatial distribution of geolocated SMS messages may represent an early indicator on the spatial distribution and on the intensity of building damage. This work explores the relationship between the spatial patterns of SMS messages and building damage. The latter is derived from the detailed damage assessment of individual buildings interpreted in post-earthquake airborne photos. The interaction between SMS messages and building damage is studied by analyzing the spatial structure of the corresponding bivariate patterns. The analysis is performed through the implementation of cross Ripley's K-function which is suitable for characterizing the spatial structure of a bivariate pattern, and more precisely the spatial relationship between two types of point sets located in the same study area. The results show a strong attraction between the patterns exhibited by SMS messages and building damages. The interactions identified between the two patterns suggest that the geolocated SMS can be used as early indicators of the spatial distribution of building damage pattern. Accordingly, a statistical model has been developed to map the distribution of building damage from the geolocated SMS pattern. The study presented in this paper is the first attempt to derive quantitative estimates on the spatial patterns of novel crowdsourced information and correlate these to established methods in damage assessment using remote sensing data. The consequences of the study findings for rapid damage detection in post-emergency contexts are discussed.


Introduction
As natural disasters have become major threats to human life and to nations' economies, collaborative crisis technologies are being developed in order to enhance the capacities of governments and organizations in crisis management. The added value of these technologies is related to the timely provision of relevant information for effective decision-making in crisis management and disaster response (Al-Khudhairy, 2010). One of the major new developments in collaborative crisis technologies is crowdsourcing, which has been defined as "the act of taking a job traditionally performed by a designated agent ... and outsourcing it to an undefined, generally large group of people in the form of an open call" (Howe, 2008). Crowdsourcing is credited with social convergence because it facilitates people's participation in emergency management, supports local and collective intelligence in a crisis, and counterbalances mass media through citizen journalism (Dandoulaki and Halkia, 2010). There is a mounting number of experiences in the use of crowdsourcing in crisis management: the New Orleans floods in 2005, the California wildfires in 2007, the L'Aquila earthquake in 2009, etc. The most prominent example is, undoubtedly, the devastating Haiti earthquake of 12 January 2010. Just hours after the earthquake struck, a global effort to leverage existing internet and mobile technologies, including social networking platforms, was set up.
Ushahidi (http://www.ushahidi.com/), a platform that gathers distributed data via SMS, e-mail, and social web sites, was the first to deploy its capacities to ascertain the needs of victims and other relief and aid requirements on the ground (Hattotuwa and Stauffacher, 2011). Thanks to this platform, the affected communities in Haiti could text their location and urgent needs to a free number, "4636". The received messages were then translated, categorized and geolocated for visualization on a map. More than 3500 reports were mapped almost in real-time on the Ushahidi platform between 12 January and the end of March 2010. The greatest number of reports was received up to 23 January, and the number dropped off almost immediately thereafter.
In the meantime, another type of crowdsourcing activity emerged in the first week after the disaster with the aim of identifying damage to buildings using earth observation technology. More than 600 experts from 23 different countries joined the World Bank-UNOSAT-JRC team to assess the damage states of individual buildings using very high resolution aerial imagery (Corbane et al., 2011). The main outcome from this collaborative effort was a detailed building damage assessment that was finalized within two months of the earthquake and shared with the Haitian government in support of the Post-Disaster Needs Assessment (PDNA) and Recovery Framework (Kemper et al., 2010).
Both the crisis reports collected on Ushahidi and the remote sensing derived building damage assessment illustrate how the synergy between technology developments and crowdsourcing can generate crucial information for the response phase following a major disaster. However, while crisis reports were made available shortly after the disaster and mapped in near real-time, the detailed building damage assessment required a much larger effort and a longer period of time for completion. This difference in timeliness of precise damage location reports suggests that the labour intensive image interpretation work could potentially benefit from the crowdsourced crisis information. The present work is the first quantitative study aimed at analysing the complementary relationship between the two data sources. The approach involves studying the relationship between the spatial distributions of crisis reports and building damage in order to explore the potential of modelling the spatial pattern in structural building damage based on geolocated crisis reports. The paper is organized as follows: first, the dataset including crisis reports and building damage assessment is described (Sect. 2). Then the methodology and the results of the analysis of the patterns of crisis reports and building damage are presented (Sect. 3). An attempt to model the building damage distribution is then presented and discussed (Sect. 4). The final section (Sect. 5) summarizes the main conclusions and perspectives for further investigations.

Crisis reports recorded on Ushahidi platform
For this study, a test area of approximately 9 by 9 km in Port-au-Prince, Haiti (centre coordinate: 18.547° N, 72.312° W), was selected (Fig. 1). In the period between 12 January and the end of March, 3596 actionable crisis reports were mapped on Ushahidi. Those included in the study area represent a dataset of 1645 crisis reports. As shown in Fig. 2, the greatest number of reports was recorded up to 23 January, and the number dropped off almost immediately thereafter. This corresponds to the end of the emergency period announced by the United Nations (Heinzelman and Waters, 2010). Reports sent after 23 January were mostly unrelated to disaster response. The "4636" project volunteers worked on the translation, geotagging, and categorization of SMS messages and numerous email, web and social media communications. Each message was assigned to one of the following categories: emergency, public health, security threats, infrastructure damage, natural hazards, and services available. The collaborative model used by Open Street Map enabled "4636" project volunteers to associate the reports with a geographical location and generate a crowdsourced map of events (Fig. 3).

Building damage assessment based on airborne imagery
On 18 January, high resolution aerial photography (with 15 cm spatial resolution) was acquired over the affected areas in Haiti by Google Inc. and made publicly accessible via Google Earth and a dedicated FTP server. The aerial photographs were analysed by a group of volunteers who attempted to visually assess each individual building and assign it to a damage grade based on the European Macroseismic Scale 1998 (EMS-98) classification schema (Grünthal and Levret, 2001). The EMS-98 scale includes five damage grades: 1, no visible damage; 2, minor damage; 3, moderate damage; 4, very heavy damage; and 5, destroyed.
For the purpose of this study, damage grades 4 and 5 were grouped together (grade 4 and 5) because these grades include buildings that are beyond repair (i.e. "total losses" in a reconstruction sense) and are the most likely locations where human victims may be present (i.e. a focus for search and rescue). All building centroids were marked, including those that did not exhibit visible damage (EMS grade 1). The overall building damage data set within the previously defined study area consisted of a total of 161 281 points, with 120 137 points assigned to damage grade 1, 7348 points assigned to damage grade 3, and 33 796 assigned to damage grades 4 and 5 (Fig. 3).
To analyze the accuracy of the remote sensing-based damage assessment, a validation dataset consisting of 6492 buildings was created from ground surveys (Corbane et al., 2011). Based on the comparison of the remote sensing results to the ground survey data using the original damage categories in the ground survey (EMS-98 grades 1 to 2, grade 3, grade 4, and grade 5), the overall accuracy of the remote sensing results reached 61 %. However, when the damage grades were aggregated into only three categories (i.e. grades 3 or less, 4, and 5), the overall accuracy increased to 73 %. The commission and omission errors suggested that there is considerable confusion between classes, even for grade 5. Further analysis of the errors showed that 20 % of the total error, which is directly attributed to the interpreter, can be avoided through better damage assessment protocols (Shankar, 2010).

Spatial patterns of crisis reports and building damages and the relationship between them
Given these two independent point datasets, it is attempted, firstly, to study the spatial pattern of each individual dataset and, secondly, to explore the relationship between them. This empirical step is intended to analyse the potential of geolocated crisis reports in predicting an early estimate of the patterns of structural damage following a large earthquake. The geolocated crisis reports and the centres of individual buildings identified on the aerial photos were regarded as marked spatial point patterns (Illian et al., 2008), since the locations were recorded as two-dimensional points (X, Y) in a geographical space and each location included information about the point (i.e. whether the point is an SMS or a building damage state). Such data correspond to the family of spatial point processes called multivariate spatial point patterns, which are forms of marked point patterns that have a small number of qualitative marks (e.g. SMS message text, damage grade 1, grade 2, etc.) (Dixon, 2002).
Several methods exist for the analysis of marked spatial point patterns. They can be broadly classified into two types (Perry et al., 2006). The first consists of area-based statistics and relies on various characteristics of the frequency distribution of the observed numbers of points in regularly defined subregions of the study area (cells). The area-based approach suffers from certain limitations, not only when applied to the analysis of a set of points but also when analysing the relationship between point distributions. This is because the results of area-based statistics depend on the choice of the cell size and they are often insufficient for distinguishing different distributions (Fard et al., 2008). The second type of method corresponds to distance-based techniques and uses information on the spacing of points to characterize the patterns. Modern statistics for spatial point patterns (Moller and Waagepetersen, 2007) are tailored to the second type of techniques, as they tend to describe the short-range interaction among the points, which explains the mutual positions of the points. Quite often this concerns the degree of attraction or repulsion among points and the spatial scales at which it operates. This family of statistics allows both characterizing an entire pattern and modelling the geometrical properties of the structure represented by the points (Illian et al., 2008). Hence, it represents a promising avenue for studying the spatial patterns of crisis reports and building damage.
In the following, some brief and basic properties of spatial point patterns are introduced (Sect. 3.1). Then the individual patterns as well as the relative distributions of crisis reports and building damage are analysed using explorative statistical analysis adapted to spatial point processes (Sect. 3.2). This step is essential for the fitting and the validation of a suitable parameterized model that is consistent with the observed point pattern (Sect. 4).

Marked spatial point patterns
From a probabilistic point of view, a point pattern is a realization of a point process $X$. A point process can be described as a probabilistic model producing almost surely locally finite sets of locations $(x_1, x_2, \ldots, x_n)$ in a sampling window $W \subset \mathbb{R}^2$ (Diggle, 1986). In addition to each point $u$ in a spatial point process $X$, there may be an associated random variable $m_u$ called a mark. The process $Y = \{(u, m_u) : u \in X\}$ is called a marked point process (Stoyan and Stoyan, 1996). It is commonly assumed in certain applications that the point process is stationary, i.e. its probability distribution is invariant under translations. We will assume that this hypothesis holds. Ripley's K-function (Ripley, 1976) is based on this assumption:
$$K(r) = \lambda^{-1}\, E\left[\text{number of other points of the process within distance } r \text{ of a typical point}\right] \tag{1}$$
Intuitively, $K(r)$ is the mean ($E$ = expected) number of other points of the process lying within a circle of radius $r$ centred on a typical point $(x, y)$ of the process, divided by the intensity $\lambda$ of the process (Mattfeldt, 2005). In the case of univariate spatial patterns, we define the self-K function $K_{ii}(r)$.
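In the case of a marked point pattern, the generalization of $K(r)$ to more than one type of point is called the cross-K function and is computed as follows:
$$K_{ij}(r) = \lambda_j^{-1}\, E\left[\text{number of type } j \text{ points within distance } r \text{ of a typical type } i \text{ point}\right] \tag{2}$$
where $\lambda_j$ is the intensity of the points of type $j$. The idea behind the cross-K function is the average number of other points ($j$) found within the distance $r$ from the typical point ($i$).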
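The verbal definitions of $K(r)$ and $K_{ij}(r)$ translate directly into a counting estimator. The following is a minimal Python sketch under stated assumptions, not the implementation used in the study: the function name `cross_k` and its arguments are hypothetical, intensities are estimated as the number of points divided by the window area, and no edge correction is applied, so estimates are biased downwards for points close to the window boundary.

```python
import math


def cross_k(points_i, points_j, r, area, same_type=False):
    """Naive estimate of K_ij(r) (no edge correction).

    points_i, points_j: lists of (x, y) tuples for types i and j.
    area: area of the sampling window W.
    same_type: set to True when both lists are the same pattern, so that a
    point is not counted as its own neighbour (self-function K_ii).
    """
    n_i, n_j = len(points_i), len(points_j)
    count = 0
    for a, (xa, ya) in enumerate(points_i):
        for b, (xb, yb) in enumerate(points_j):
            if same_type and a == b:
                continue  # skip the typical point itself
            if math.hypot(xa - xb, ya - yb) <= r:
                count += 1
    mean_neighbours = count / n_i   # mean number of j points around an i point
    lambda_j = n_j / area           # estimated intensity of type j
    return mean_neighbours / lambda_j


# Toy example with made-up coordinates (not study data).
reports = [(0.0, 0.0), (1.0, 0.0)]
damage = [(0.0, 0.5), (5.0, 5.0)]
k_rd = cross_k(reports, damage, r=1.0, area=100.0)
```

In this toy example only one report-damage pair lies within distance 1, so the mean neighbour count is 0.5 and the estimate is 0.5 / (2 / 100) = 25.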
The K-function can be used not only to summarize the point pattern but also to test hypotheses about the pattern, estimate parameters, and fit models.
In practice, it is recommended to use the L-function, which is a variance stabilizing transformation of the K-function when the latter is estimated by non-parametric methods (Besag and Clifford, 1989) (Eq. 3). As for Ripley's K-function, we can define univariate $L_{ii}(r)$ and multivariate $L_{ij}(r)$ estimators. The $L_{ii}(r)$ and $L_{ij}(r)$ numerical descriptors are concerned with detecting and describing, respectively, the intra-type correlations (i.e. the relationship between points with the same mark) and the inter-type correlations (i.e. the relationship between points of different marks). They are hence suitable for exploring the patterns of crisis reports and building damage and for analysing the relationships between them.
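For orientation, the L-transformation is commonly written in one of two equivalent conventions, shown below. Which of the two was used in this study is an assumption here; the later discussion of positive values and of deviations from CSR is consistent with the centred form. The only fact relied upon is that a homogeneous Poisson process has $K(r) = \pi r^2$.

```latex
% Uncentred form: equals r under complete spatial randomness (CSR)
L(r) = \sqrt{\frac{K(r)}{\pi}}
\qquad\Rightarrow\qquad
K(r) = \pi r^{2} \;\Longrightarrow\; L(r) = r

% Centred form: equals 0 under CSR, positive for clustering (or attraction
% in the cross case), negative for regularity (or repulsion)
L^{*}(r) = \sqrt{\frac{K(r)}{\pi}} - r
\qquad\Rightarrow\qquad
K(r) = \pi r^{2} \;\Longrightarrow\; L^{*}(r) = 0
```

The same transformation applies to $K_{ii}(r)$ and $K_{ij}(r)$, giving $L_{ii}(r)$ and $L_{ij}(r)$.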

Results of intra-type and inter-type correlations
The self-function $L_{ii}(r)$ and the cross-function $L_{ij}(r)$ were applied to the marked point dataset consisting of the geolocated crisis reports and the individual buildings assigned to damage grades 1, 3, 4, and 5. They were used to evaluate evidence for (i) clustering or dispersion of point patterns of the same type and (ii) attraction or repulsion between points of different types. This is usually performed by comparing the distributions of the observed values of $L(r)$ to theoretical values of a Poisson point process in which the events are distributed independently according to a uniform probability distribution over the region $W$ and do not exhibit any form of interaction. As with most spatial statistics, the $L(r)$ function requires tests of clearly identified hypotheses. The classical null hypotheses $H_0$ against which the $L(r)$ function is usually tested are independence or complete spatial randomness (CSR) (Dixon, 2002). As the theoretical distributions of the estimators are unknown, the corresponding confidence intervals are commonly estimated through Monte Carlo simulations of the specific null hypothesis $H_0$ (Diggle, 1983). For the analysis of intra-type correlations, we used the hypothesis of CSR, which indicates no clustering or dispersion of data points of the same type. For the analysis of inter-type correlations, we used the hypothesis of independence of populations, which corresponds to the absence of attraction or repulsion between data points of different types. All the analyses were implemented in R software using the spatstat (Baddeley and Turner, 2005) and ads (Pelissier and Goreaud, 2007) packages.
Figure 4 shows the results of the computation of the self-functions $L_{ii}$ used to study intra-type correlations for building damage grades 1 (Fig. 4a), 3 (Fig. 4b), 4 and 5 (Fig. 4c), and for crisis reports (Fig. 4d), under the hypothesis of CSR. These correlations describe the spatial pattern of each level of building damage as well as the spatial pattern of crisis reports.
The number of simulations was set to 1000 and the local confidence intervals were computed at significance level α = 0.1. Ripley's edge effect correction was applied when the sample circles overlap the boundary of the sampling window (Ripley, 1977). Due to edge effect correction, the maximum radius of the sample circles is usually set to be half the longer side for a rectangular sampling window (Goreaud and Pélissier, 1999). In our case, the study area is 9 × 9 km; therefore the $L(r)$ function was estimated for distances up to 4.5 km in 10 m increments of $r$.
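The Monte Carlo procedure described above can be sketched in a few lines. This is an illustrative Python sketch, not the spatstat or ads implementation used in the study: the function names `centred_l` and `csr_envelope` are hypothetical, the centred form $L(r) - r$ is assumed, no edge correction is applied (the study used Ripley's correction), and the envelope is pointwise at a single distance $r$. The example uses 99 simulations to keep it fast, whereas the study used 1000.

```python
import math
import random


def centred_l(points, r, area):
    """Naive centred L(r) - r for one pattern (no edge correction)."""
    n = len(points)
    count = 0
    for a in range(n):
        for b in range(a + 1, n):
            dx = points[a][0] - points[b][0]
            dy = points[a][1] - points[b][1]
            if math.hypot(dx, dy) <= r:
                count += 2  # each unordered pair counts for both points
    k = area * count / (n * n)
    return math.sqrt(k / math.pi) - r


def csr_envelope(n_points, side, r, n_sim=99, alpha=0.1, seed=0):
    """Pointwise (1 - alpha) envelope of L(r) - r under CSR in a square window."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, side), rng.uniform(0, side))
               for _ in range(n_points)]
        sims.append(centred_l(pts, r, side * side))
    sims.sort()
    lo_idx = math.floor(alpha / 2 * (n_sim - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (n_sim - 1))
    return sims[lo_idx], sims[hi_idx]


# Toy example: 50 points in a 10 x 10 window, tested at r = 1.
lower, upper = csr_envelope(50, 10.0, 1.0, n_sim=99, alpha=0.1, seed=1)
# A tightly clustered synthetic pattern should fall above the envelope.
clustered = [(0.0002 * k, 0.0) for k in range(50)]
observed = centred_l(clustered, 1.0, 100.0)
is_clustered = observed > upper
```

An observed value above the upper bound is read as clustering at that distance, a value below the lower bound as dispersion, which mirrors how the envelopes in Fig. 4 are interpreted.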
The values of the self-functions $L_{ii}$ are positive for all building damage grades (Fig. 4a, b, and c) and for crisis reports (Fig. 4d). In addition, the observed $L(r)$ values lie outside the confidence envelopes of the Monte Carlo simulations and the magnitudes of the deviations from CSR are high, which indicates that both small-scale and large-scale clustering patterns characterize the spatial distributions of crisis reports and all levels of building damage. These results suggest that building damage of the same type or grade tends to occur in the vicinity of each other. The same could be said about crisis reports, which tend to be sent from neighboring locations. Also noticeable in Fig. 4d is the nugget effect (more than one point at the same location) in the spatial pattern of crisis reports, which is due to many coincident locations in the data. This is related to limitations in the geotagging precision of the received messages, which is based mainly on street addresses. Messages with ambiguously or approximately defined addresses create crisis report points which seem to overlap. For instance, 17 different messages assigned to different categories were found at the location of Delmas 73.
Figure 5 represents the cross-L functions $L_{ij}$ used to study inter-type correlations under the hypothesis of independence of populations. This hypothesis leads to an interpretation of the spatial interaction between the two a priori different populations of crisis reports and building damage. Each building damage grade has been considered separately in this analysis, allowing the study of differences in the type (e.g. attraction or repulsion) and strength of the interactions depending on the observed damage grade. The results show that for interactions between crisis reports and EMS-98 damage grade 1 (Fig. 5a), the computed $L_{ij}$ lies outside the Monte Carlo simulation envelope (90 % confidence interval), indicating significant interactions between crisis reports and non-damaged buildings at mid-range distances (between 1 and 3.5 km). Conversely, for interactions between crisis reports and EMS-98 damage grade 3 (Fig. 5b), $L_{ij}$ lies inside the envelope, suggesting insignificant interactions between crisis reports and moderately affected buildings. When analyzing the interactions between crisis reports and heavily affected/destroyed buildings (EMS-98 damage grades 4 and 5), the following results were obtained (Fig. 5c): (i) for small-scale distances (r < 1 km approximately), the values of $L_{ij}$ are inside the confidence interval of $H_0$, indicating independence between the two populations; (ii) for large-scale distances (1 < r < 3 km approximately), the observed $L_{ij}$ crosses outside the upper bounds of the envelope, indicating an attraction between the two populations; (iii) for larger distances (r > 3 km approximately), the two populations are no longer correlated. A broadly similar behavior is observed when analyzing the symmetric $L_{ij}$ function (Fig. 5d) between building damage grades 4 and 5 and crisis reports. It is also important to note that the magnitude of the attraction between crisis reports and building damage grades 4 and 5 is much larger than the one observed between crisis reports and non-damaged buildings. This supports our initial hypothesis on the existence of a correlation between crisis reports and building damage. In fact, the results show that there is evidence of a statistically significant correlation between the spatial patterns of crisis reports and building damage at distances ranging between 1 km and approximately 3.5 km. This means that in circles of 1 to 3.5 km radius, the number of crisis reports surrounding a randomly chosen damaged building is greater than expected if the two patterns were completely independent. A possible interpretation of the attraction effect observed at the range of 1 to 3.5 km could be that people tend to move to safe areas before sending the reports. However, below 1 km, the two patterns show a slight but not significant tendency towards attraction, indicated by values of $L_{ij}$ close to the upper confidence boundary. The large difference between the overall densities of damaged buildings and crisis reports may be the reason behind the absence of a significant attraction at short range distances. Below 1 km, few crisis reports were registered with respect to the number of heavily affected and destroyed buildings.
The attraction between the patterns at the identified distances suggests (i) that the two distributions appear to have the same triggering event, which in this case corresponds to the earthquake, and (ii) that the observed patterns are the result of a pairwise interaction process. Thus, using the results of inter-type and intra-type correlations of crisis reports and heavily affected or destroyed buildings, it is attempted to model the spatial structure of building damage using the locations of crisis reports.

The multi-type Strauss model
The inter-type and intra-type interactions between the two point patterns of crisis reports and building damages have been quantitatively described in the previous section. In the current section, an attempt to model the intensity of building damage using the spatial relationships identified in the previous step is presented. Because we are mainly interested in modelling the interaction structure between crisis reports and building damages, the most appropriate spatial model corresponds to the family of Gibbs point processes, which allows interactions between points to be included (Renshaw and Särkkä, 2001). The Strauss process is a special case of Gibbs models that can be used to simulate a wide range of patterns from simple inhibition to clustering (Strauss, 1975). Its extension to multi-type marked point patterns is known as a multi-type Strauss model. This model is potentially suitable for deriving the conditional intensity of building damage based on the pairwise interactions between crisis reports and building damages. The conditional density of a Strauss process is
$$\lambda(u, x) = \beta(u)\, \gamma^{t(u, x)} \tag{4}$$
where $\beta(u)$ is the density at location $u$, $t(u, x)$ is the number of events of $x$ that lie within a distance $r$ of $u$, and the interaction parameter $\gamma$ controls the strength of interaction between points. For the special case $\gamma = 1$, the Strauss model reduces to the homogeneous Poisson process with constant intensity $\beta$ (first order term); the case $\gamma = 0$ corresponds to a simple inhibition process, whereas for $\gamma > 1$ the model produces a clustered process. For multi-type Strauss models, the second-order or pairwise interaction term $c(u, v)$, $u, v \in W$, between a point $u$ of type $m$ and a point $v$ of type $m'$ is given as:
$$c(u, v) = \begin{cases} \gamma_{m,m'} & \text{if } \lVert u - v \rVert \le r_{m,m'} \\ 1 & \text{otherwise} \end{cases} \tag{5}$$
where $r_{m,m'} > 0$ are interaction radii for type $m$ with type $m'$, and $\gamma_{m,m'} \ge 0$ are interaction parameters. Thus the conditional density defined in Eq. (4) for a multi-type Strauss process is defined as:
$$\lambda((u, i), x) = \beta_i(u) \prod_{j} \gamma_{i,j}^{\,t_{i,j}(u, x)} \tag{6}$$
where $\beta_i(u)$ is the first order term for type $i$ and $t_{i,j}(u, x)$ is the number of points in $x$, with mark equal to $j$, lying within a distance $r_{i,j}$ of the location $u$. The interaction radii and the interaction parameters must satisfy $r_{i,j} = r_{j,i}$ and $\gamma_{i,j} = \gamma_{j,i}$ (Baddeley et al., 2006a).
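To make the role of the counts $t_{i,j}(u, x)$ concrete, the conditional intensity of a multi-type Strauss process can be evaluated by direct counting. The sketch below is illustrative only and is not the spatstat implementation used in the study: the function name `conditional_intensity`, the dictionary layout for the parameters, and all numerical values in the example are hypothetical.

```python
import math


def conditional_intensity(u, mark, pattern, beta, gamma, radius):
    """Evaluate beta[mark] * prod_j gamma[mark][j] ** t_{mark,j}(u, x).

    u: (x, y) location at which a point of type `mark` is considered.
    pattern: list of (x, y, mark) tuples, the marked pattern x.
    beta: dict mark -> first order term.
    gamma, radius: nested dicts [mark_i][mark_j], assumed symmetric.
    """
    counts = {}
    for x, y, m in pattern:
        # t_{mark,m}: points of type m within radius[mark][m] of u
        if math.hypot(u[0] - x, u[1] - y) <= radius[mark][m]:
            counts[m] = counts.get(m, 0) + 1
    value = beta[mark]
    for m, t in counts.items():
        value *= gamma[mark][m] ** t
    return value


# Made-up toy pattern and parameters (not the fitted values of the study).
pattern = [(0.0, 0.0, "report"), (3.0, 0.0, "damage"), (100.0, 0.0, "damage")]
beta = {"report": 1.0, "damage": 2.0}
gamma = {"report": {"report": 1.0, "damage": 1.5},
         "damage": {"report": 1.5, "damage": 3.0}}
radius = {"report": {"report": 2.0, "damage": 5.0},
          "damage": {"report": 5.0, "damage": 2.0}}
lam = conditional_intensity((1.0, 0.0), "damage", pattern, beta, gamma, radius)
```

Here one report and one damaged building fall within their respective radii of the candidate location, so the result is 2.0 × 1.5 × 3.0 = 9.0. Setting every interaction parameter to 1 returns `beta[mark]`, the Poisson case.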

Fitting the multi-type Strauss model to crisis reports and building damage
To fit the stationary multi-type Strauss process to the marked point pattern, consisting of crisis reports and building damage grades 4 and 5, the maximum pseudo-likelihood method was used; the pseudo-likelihood was maximised using an extension of the Berman-Turner device (Baddeley and Turner, 2000). Interested readers may refer to Berman and Turner (1992) for details on the computational device developed for fitting Poisson models and to Baddeley and Turner (2000) for details on its adaptation to pseudo-likelihoods of general Gibbs point processes.
Fitting the stationary multi-type Strauss process requires the definition of interaction radii $r_{i,j}$. Unfortunately, the parameter $r_{i,j}$ cannot be estimated by the Berman-Turner device and its value should be specified a priori (Baddeley et al., 2006b). The most recommended approach is to determine it by inspecting the plots of the cross L-functions (Stoyan and Grabarnik, 1991). Hence, the results of the explorative statistical analysis (Sect. 3.2) based on inter-type and intra-type correlations were used to specify $r_{i,j}$ as follows:
$$r_{i,j} = \begin{pmatrix} 10 & 2000 \\ 2000 & 10 \end{pmatrix}$$
with $i$ corresponding to crisis reports and $j$ to building damage grades 4 and 5. The minimum interaction distance between crisis reports and building damage grades 4 and 5 was set to 2000 in order to accommodate the symmetry constraint of the model. For fitting the model, all of the 1645 locations of crisis reports were used, while only 1000 randomly sampled points, representing 3 % of the total damaged buildings, were selected out of the total 33 796 heavily affected/destroyed buildings. The sample size was determined following an analysis of the effect of the sampling procedure on the model graphical outputs. This set, corresponding to 3 % of the total number of damaged buildings, was found to be a good compromise between (i) the desired approximation and (ii) memory issues encountered when trying to fit the model with a larger sample corresponding to 10 % of the total damaged buildings.
The estimated first order term β was 0.106 × 10⁻⁶ and the estimated values of the interaction parameter $\gamma_{ij}$ between crisis reports and building damage grades 4 and 5 were obtained as:
γ (reports − reports) = 1.05
γ (reports − grade 4 and 5) = 1.05
γ (grade 4 and 5 − grade 4 and 5) = 7.37
Figure 6 shows the fitted multi-type Strauss model for building damage grades 4 and 5.
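As a worked illustration of how these estimates enter the conditional intensity of a multi-type Strauss process, consider a hypothetical candidate location $u$ for a grade 4 and 5 building. The neighbour counts below (2 crisis reports and 3 damaged buildings within their respective interaction radii) are invented for the example, and the single reported β is assumed to apply to the damage type.

```latex
\lambda\big((u, \text{grade 4 and 5}), x\big)
  = \beta \,
    \gamma_{\text{reports, grade 4 and 5}}^{\,2} \,
    \gamma_{\text{grade 4 and 5, grade 4 and 5}}^{\,3}
  = 0.106 \times 10^{-6} \times 1.05^{2} \times 7.37^{3}
  \approx 0.106 \times 10^{-6} \times 1.1025 \times 400.32
  \approx 4.68 \times 10^{-5}
```

With no neighbours within either radius, the value falls back to β. Because the damage-damage parameter (7.37) is much larger than the reports-damage parameter (1.05), each nearby damaged building raises the conditional intensity far more than each nearby report under these estimates.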

Prediction of building damage conditionally to SMS messages
Given the fitted multi-type Strauss model and the estimated parameters, it is possible to evaluate the fitted conditional density $\lambda((u, i), x)$ of building damage grades 4 and 5 at arbitrary new locations $u$. Note that $x$ is always taken to be the data pattern to which the model was fitted. For predicting the conditional intensity, a regular square grid of 100 by 100 cells of 90 m was simulated over the observation window of 9 × 9 km size. Figure 7a illustrates the result of predictions at new locations. It shows the values of the conditional density of building damage obtained by the model. It is possible to visually compare the patterns of the predicted conditional density to the observed pattern of building damage density calculated using the full set of heavily affected/destroyed buildings (Fig. 7b). The latter was computed using a convolution with an isotropic Gaussian kernel of standard deviation σ set to 500 (Diggle, 1985). The value of σ was determined empirically by searching for the output that best represented the overall damage pattern without compromising the sharpness of structural boundaries.
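The observed damage density used for comparison is a kernel-smoothed intensity evaluated on a regular grid. A minimal sketch of such an estimate is given below, assuming an isotropic Gaussian kernel evaluated at cell centres; the function name `kernel_intensity` is hypothetical, no edge correction is applied, and the example uses a coarse 30 × 30 grid with a single made-up point rather than the 100 × 100 grid and full building dataset of the study.

```python
import math


def kernel_intensity(points, side, n_cells, sigma):
    """Gaussian kernel intensity estimate on an n_cells x n_cells grid.

    points: list of (x, y) tuples inside a square window [0, side]^2.
    Returns a list of rows; each value is an intensity (points per unit
    area) evaluated at the centre of the corresponding cell.
    """
    cell = side / n_cells
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)  # 2-D Gaussian normalisation
    grid = []
    for row in range(n_cells):
        cy = (row + 0.5) * cell
        values = []
        for col in range(n_cells):
            cx = (col + 0.5) * cell
            total = 0.0
            for x, y in points:
                d2 = (cx - x) ** 2 + (cy - y) ** 2
                total += math.exp(-d2 / (2.0 * sigma ** 2))
            values.append(norm * total)
        grid.append(values)
    return grid


# Toy example: one point in the middle of a 9000 x 9000 window, sigma = 500.
grid = kernel_intensity([(4500.0, 4500.0)], side=9000.0, n_cells=30, sigma=500.0)
cell_area = (9000.0 / 30) ** 2
mass = sum(sum(row) for row in grid) * cell_area  # approximately 1 point
```

Summing the grid values times the cell area approximately recovers the number of points when the kernel mass lies well inside the window, which is a convenient sanity check when choosing σ and the cell size.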

Fig. 6. Fitted marked Gibbs point process. The triangles correspond to the locations of SMS messages used in the modelling phase.
A simple visual inspection of the results reveals a strong similarity between the patterns exhibited by the model and the actual local variations of the damage pattern. Although the predicted density values are much smaller than the observed ones (probably due to the choice of different cell sizes for representing the information and to the use of only 1000 points for the model-fitting stage), the damage pattern is well reproduced by the multi-type Strauss model. This demonstrates the usefulness of integrating the identified pairwise interactions into a spatial model for predicting the pattern of building damage.
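The observed density surface used in this comparison (Fig. 7b) is a Gaussian kernel estimate. A minimal sketch of such an estimate on the same 100 by 100 grid is given below. It assumes that σ is expressed in the same units as the coordinates and omits the edge correction discussed by Diggle (1985). The function name and the toy point pattern are hypothetical.

```python
import numpy as np

def gaussian_kernel_intensity(points, centres_x, centres_y, sigma):
    """Sum of isotropic Gaussian kernels centred on each point (no edge correction)."""
    gx, gy = np.meshgrid(centres_x, centres_y)
    surface = np.zeros(gx.shape)
    norm = 1.0 / (2.0 * np.pi * sigma ** 2)
    for px, py in np.asarray(points, dtype=float):
        sq_dist = (gx - px) ** 2 + (gy - py) ** 2
        surface += norm * np.exp(-sq_dist / (2.0 * sigma ** 2))
    return surface

# Cell centres of a 100 x 100 grid of 90 m cells.
centres = (np.arange(100) + 0.5) * 90.0

# Hypothetical locations of heavily affected/destroyed buildings.
rng = np.random.default_rng(1)
damaged = rng.uniform(0, 9000, size=(200, 2))

observed = gaussian_kernel_intensity(damaged, centres, centres, sigma=500.0)
```

Each kernel integrates to one, so the surface is expressed in points per unit area. Its absolute level therefore depends on how many points enter the estimate, which is one reason why surfaces built from different numbers of points are easier to compare by pattern than by value.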

Goodness of fit
The visual comparison of the patterns of predicted and observed building damage densities is only an informal validation of the fitted model. A more robust validation method is necessary to verify the potential and limitations of the selected model. One of the main drawbacks of the multi-type Strauss model is the absence of a standard validation method. Although summary statistics such as the K-function are intended primarily for exploratory purposes, they can also be used as a basis for statistical inference, especially in the case of a fitted multi-type Strauss model (Baddeley et al., 2005). A Monte Carlo goodness-of-fit test of the fitted multi-type Strauss model can be conducted by comparing the values of the K-function for the data with those from simulations of the model (Besag and Diggle, 1977). Thus, to validate the model, the upper and lower limits of the simulated K-function were computed using the Monte Carlo goodness-of-fit test for 100 realizations of the point pattern under the fitted multi-type Strauss model. Figure 8 shows the computed envelopes of the fitted model and the cross-K function computed for crisis reports and building damage grades 4 and 5. The figure shows that the observed cross-K function stays above the upper margin of the simulated envelope, which indicates that there is some variability in the spatial intensity of the data that the fitted model is unable to capture. The uncaptured variability may be due to a covariate effect involved in the formation of the point pattern, which needs to be considered in the modelling of the building damage pattern. Another reason for the deviation of the simulated cross-K function from the observed one could be related to the estimator of K, which is affected by spatial inhomogeneity as well as by spatial dependence between points. Thus, in practice, the use of the K-function in model criticism is restricted to cases where the fitted model is homogeneous and the data are still assumed to be homogeneous.
Until now, our analysis has been based on the assumption of homogeneity at both the exploratory and modelling stages. It would be interesting in a follow-up study to examine the assumption of an inhomogeneous trend in the marked point pattern. For that, further analyses are needed to verify the existence of a spatial trend and to identify the covariate responsible for the non-stationarity of the marked point pattern.
In addition to the above-mentioned issues, the resulting parameters of the fitted multi-type Strauss model may be considered as "invalid" or "undefined", in the sense that the corresponding probability density is not integrable. As noted by Kelly and Ripley (1976), the Strauss process is defined only for 0 < γ < 1, which means that the density is not integrable for γ > 1. The estimated values of the interaction parameter γ_ij between crisis reports and building damage grades 4 and 5 are all greater than 1. This happens because in the Berman-Turner device, the conditional intensity is treated as if it were the mean in a Poisson loglinear regression model. The latter model is well-defined for all values of the linear predictor. The spatstat R package that was used in this study […]
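The logic of the Monte Carlo envelope test described in this section can be sketched as follows. The sketch uses a cross-K estimator without edge correction and, as a stand-in for simulations of the fitted multi-type Strauss model, draws two independent uniform patterns. Simulating the fitted Gibbs model itself would require a Metropolis-Hastings sampler, which is not shown. All function names, point counts and radii are hypothetical.

```python
import numpy as np

def cross_k(a, b, radii, area):
    """Cross-K estimate between patterns a and b, without edge correction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Number of (a, b) pairs closer than each radius.
    counts = np.array([(dist <= r).sum() for r in radii])
    return area * counts / (len(a) * len(b))

def pointwise_envelope(simulate, radii, area, nsim=99):
    """Pointwise lower limit, mean and upper limit of cross-K over nsim simulations."""
    sims = np.array([cross_k(*simulate(), radii, area) for _ in range(nsim)])
    return sims.min(axis=0), sims.mean(axis=0), sims.max(axis=0)

side = 1000.0
radii = np.linspace(10.0, 100.0, 10)
rng = np.random.default_rng(2)

def simulate_independent():
    # Stand-in null model: two independent uniform patterns in the window.
    return rng.uniform(0, side, (50, 2)), rng.uniform(0, side, (80, 2))

lo, mmean, hi = pointwise_envelope(simulate_independent, radii, side ** 2)

# Toy "observed" pattern in which type-b points are scattered around type-a points.
obs_a = rng.uniform(0, side, (50, 2))
obs_b = obs_a[rng.integers(0, 50, 80)] + rng.normal(0, 15.0, (80, 2))
obs = cross_k(obs_a, obs_b, radii, side ** 2)

# Radii at which the observed curve leaves the envelope from above.
above = obs > hi
```

An observed curve that lies above `hi` over a range of radii indicates more cross-type pairs at those distances than the simulated model produces, which is how the departure in Fig. 8 is read.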
This study is the first to explore the relationship between geolocated crisis reports and structural damage derived from remote sensing data. Its main novelty lies in the use of methods for the exploration and modelling of spatial point processes and their application to the study of the building damage caused by the disastrous Haiti earthquake. The marked spatial point pattern analysis demonstrated the existence of strong clustering in the patterns of building damage and crisis reports. In addition, this analysis led to an understanding of the interactions between the geolocated crowdsourced reports and the heavily affected or destroyed buildings identified in aerial photos. It revealed a spatial dependency between these two distributions, which was used in a multi-type Strauss model for inferring the pattern of structural damage to buildings.
The main limitation of the approach is that it is data driven; hence the reliability of the results depends on the quality of the input data. In this study the crowdsourced reports were manually geotagged on the basis of sometimes ambiguously defined street addresses. The issues of spatial data accuracy, coupled with the limits inherent to the Strauss model (e.g. symmetry constraint, absence of a standard validation method), call for caution when transferring the approach to other disasters.
Further developments are currently being implemented in order to better understand the relationship between the locations of crowdsourced crisis reports and building damage:
1. A detailed analysis integrating the crisis reports' dates is being conducted in order to analyze the influence of the time component on the strength of the correlation between the crisis reports and building damage.
2. Another aspect that is being currently explored relates to the integration of the reports' categories (food, health, etc.) into the analysis with the purpose of evidencing a relationship between the type of message and the location of building damage.
3. The validation of the selected multi-type Strauss model showed the presence of a spatial trend or covariate effect that violates the assumption of homogeneity. Therefore, a study of the possible influence of different covariates derived from remote sensing data, such as the building density or the distribution of rubble, on the quality of the predictions is also being undertaken. The purpose is to introduce several explanatory variables into the model and to analyze their added value in terms of model improvement without overfitting.
4. The outcomes of the multi-type Strauss model must be compared to the outputs of more classical models that do not account for patterns of interaction among points, such as Poisson and Cox models in which intensities are directed by SMS locations. In the same way, the results of the Strauss model must be analyzed in comparison with the outputs of non-spatial stochastic models (e.g. Tobit model) in order to better evaluate the advantages and limitations of spatial point process models.
5. It is necessary to reproduce this kind of analysis for other disasters in order to better understand the complex relationship between SMS reports and damaged buildings and to better assess the added value of spatial modelling in predicting the intensity of damage to infrastructure.
These research directions would help to refine the analysis presented in this paper and to gain better insight into how people used the Ushahidi platform to call for aid, as well as how people moved after the earthquake.
The main outcome of this study is the finding that near real-time geolocated crisis reports can be used as early indicators of the patterns of structural damage caused by a high magnitude earthquake such as the 2010 Haiti earthquake. Most of all, it demonstrated that the timeliness of crowdsourced information could help to produce an overall damage pattern and to better organize the remote sensing efforts by focusing the damage assessment on the most affected areas.
Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the Joint Research Centre.

Fig. 1. Location of the study area in Port-au-Prince.

Fig. 2. Number of crisis reports recorded daily between 12 January and 30 March 2010.

Fig. 3. Spatial distribution of building damage and crisis reports within the observation window of approximately 9 × 9 km.

Fig. 4. Self functions L_ii with i = building damage grade 1 (a); i = building damage grade 3 (b); i = building damage grades 4 and 5 (c) and i = crisis reports (d), with envelopes computed under the hypothesis of independence of populations.

Fig. 5. Cross L-functions L_ij with i = crisis reports, j = damage grade 1 (a); i = crisis reports, j = damage grade 3 (b); i = crisis reports, j = damage grades 4 and 5; and cross L-functions L_ji with j = damage grades 4 and 5, i = crisis reports, with envelopes computed under the hypothesis of independence of populations. The red arrows indicate the magnitude of the significant attraction for a given distance.

Fig. 7. Predicted conditional density of building damage (a) compared to the actual building damage density (b).

Fig. 8. Simulated envelope for the fitted multi-type Strauss model for 99 realizations. Obs = observed values of the cross-K function; mmean = estimated theoretical value of the summary function, computed by averaging simulated values; hi = upper envelope of simulations; lo = lower envelope of simulations.