Evaluation of a preliminary satellite-based landslide hazard algorithm using global landslide inventories

Most landslide hazard assessment algorithms in common use are applied to small regions, where high-resolution, in situ observables are available. A preliminary global landslide hazard algorithm has been developed to estimate areas of potential landslide occurrence in near real-time by combining a calculation of landslide susceptibility with satellite-derived rainfall estimates to forecast areas with increased potential for landslide conditions. This paper presents a stochastic methodology to compare this new landslide hazard algorithm for rainfall-triggered landslides with a newly available inventory of global landslide events, in order to determine the predictive skill and limitations of such a global estimation technique. Additionally, we test the sensitivity of the global algorithm to its input observables, including precipitation, topography, land cover and soil variables. Our analysis indicates that the current algorithm is limited by issues related to both the surface-based susceptibility map and the temporal resolution of rainfall information, but shows skill in determining general geographic and seasonal distributions of landslides. We find that the global susceptibility model has inadequate performance in certain locations, due to improper weighting of surface observables in the susceptibility map. This suggests that the relative contributions of topographic slope and soil conditions to landslide susceptibility must be considered regionally. The current, initial forecast system, although showing some overall skill, must be improved considerably if it is to be used for hazard warning or detailed studies. Surface and remote sensing observations at higher spatial resolution, together with improved landslide event catalogues, are required if global landslide hazard forecasts are to become an operational reality.

Correspondence to: D. B. Kirschbaum (dbach@ldeo.columbia.edu)


Introduction
Landslides rank as one of the most pervasive hazards globally, yet estimates of landslide susceptibility and hazard assessment are limited primarily to small spatial scales and are constrained by data availability and spatial resolution (Crozier and Glade, 2006). Methodologies developed for time-dependent (dynamic) landslide hazard assessment at high spatial and temporal resolution are designed for regions with adequate surface data, near real-time precipitation monitoring networks, and high-quality event catalogues for calibration and validation. As a result, these methods do not scale up easily for regional or global assessments of time-dependent landslide hazard. Dilley et al. (2005) and Nadim et al. (2006) develop landslide susceptibility and risk maps at the global scale to represent hazard and population exposure, but these assessments are static. While these studies estimate spatial exposure, they do not provide a dynamic picture of landslide hazard. Hong et al. (2006, 2007a, b, c) have developed an algorithm that provides dynamic forecasts of landslide occurrence in near real-time by combining a calculation of landslide susceptibility with satellite-derived rainfall estimates to forecast areas with increased potential for slope instability conditions. The goal of this global algorithm is not necessarily to predict individual landslide events, but to identify, dynamically, locations that exhibit a high probability of landslide initiation.
The approach taken by Hong et al. (2006, 2007) is very challenging due to limitations in surface and precipitation data as well as the global scope at which dynamic landslide hazard assessment is evaluated. As a result, this landslide hazard algorithm remains an experimental first step and requires calibration, validation and improvement before it can be used for hazard warning. This paper evaluates the predictive capacity and accuracy of algorithm forecasts against a new database of global landslide occurrences considered over the same time period. We introduce a methodology for evaluating dynamic forecasts using global landslide inventories, and discuss the data limitations inherent in the inventories and forecast algorithm. We evaluate components of the algorithm to determine both relative and absolute predictive skill, and estimate the sensitivity of input and validation parameters. Potential improvements are discussed.

Algorithm description
The satellite-based global landslide hazard algorithm draws on several remote sensing observables to derive a landslide susceptibility map and rainfall intensity-duration relationship. All of the datasets, ranging in spatial resolution from 90 m to 1°, were aggregated or interpolated to 0.25° × 0.25° resolution for the susceptibility map using the median value within each grouping of pixels. Topography, slope, drainage density and other quantities were derived from a global digital elevation model from the Shuttle Radar Topography Mission (SRTM) (Rabus et al., 2003). Moderate Resolution Imaging Spectroradiometer (MODIS) land cover products were used to delineate land cover types using the algorithm outlined by Friedl et al. (2002) and were partitioned into general land cover classes according to Larsen and Torres-Sánchez (1998). Landslide susceptibility values were assigned to each cover type on a scale of increasing susceptibility to shallow landslides. Soil characteristics were obtained from FAO/UNESCO (2003) and Batjes (2000).
Six parameters were chosen to estimate susceptibility: slope, soil type, soil texture, elevation, land cover, and drainage density. The parameters were normalized globally and integrated using a weighted linear combination of variables. The weight for each variable was qualitatively assigned using information from previous landslide susceptibility studies, which specify the relative importance of each variable (Coe et al., 2004; Dai and Lee, 2002; Lee and Min, 2001; Sarkar and Kanungo, 2004; Larsen and Torres-Sánchez, 1998). The following weights were assigned: slope, 0.3; soil type, 0.2; soil texture, 0.2; elevation, 0.1; land cover, 0.1; and drainage density, 0.1. Weighted, normalized variables were combined to develop a global susceptibility map with a numerical index varying from 0 (water bodies and ice) to 5 (highest susceptibility) (Fig. 1). Since no global landslide validation data were available, Hong et al. (2007a) compared the susceptibility results with a North American study (Godt, 1997) and found a reasonable qualitative fit. Details are presented in Hong et al. (2007a).
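The weighted linear combination described above can be illustrated with a minimal sketch. The weights are those quoted in the text; the function names, the example pixel, and in particular the equal-width binning of the score into categories 1 to 5 are assumptions made for illustration, since the class breaks are not stated here.

```python
# Weights quoted in the text; they sum to 1.0.
WEIGHTS = {
    "slope": 0.3,
    "soil_type": 0.2,
    "soil_texture": 0.2,
    "elevation": 0.1,
    "land_cover": 0.1,
    "drainage_density": 0.1,
}


def susceptibility_score(normalized):
    """Weighted linear combination of variables already normalized to [0, 1]."""
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS)


def susceptibility_category(score, is_water_or_ice=False):
    """Map a [0, 1] score to categories 1-5; 0 is reserved for water and ice.

    ASSUMPTION: five equal-width bins. The actual class breaks are not
    given in the text.
    """
    if is_water_or_ice:
        return 0
    return min(5, int(score * 5) + 1)


# Hypothetical pixel: steep slope, moderately susceptible soils.
pixel = {
    "slope": 0.9,
    "soil_type": 0.6,
    "soil_texture": 0.6,
    "elevation": 0.5,
    "land_cover": 0.4,
    "drainage_density": 0.3,
}
score = susceptibility_score(pixel)
print(round(score, 2), susceptibility_category(score))
```

Because soil type and soil texture together carry a weight of 0.4, a pixel with high soil values can reach a high category even where slope is low, which is the behaviour examined later in the sensitivity analysis.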
The present study evaluates the average rainfall intensity needed to trigger a landslide over a specified duration using an empirically derived rainfall intensity-duration (I-D) threshold curve first established by Caine (1980). Subsequent thresholds have been calculated on the global, regional, and local scales, primarily for shallow landslides or debris flows. Hong et al. (2006) developed an I-D threshold using Tropical Rainfall Measuring Mission (TRMM) Multi-Satellite Precipitation Analysis (TMPA) Version 6 rainfall estimates (Huffman et al., 2007) for 74 landslide occurrences globally, most of which were shallow landsliding events. The curve used in the algorithm falls slightly below the curve defined by Caine (1980) and above the quasi-global curves outlined by Innes (1983), Crosta and Frattini (2001), and Guzzetti et al. (2008), all of whom consider shallow landslides and debris flows (Fig. 2). The TMPA rainfall provides 3-hourly coverage from 50° N to 50° S at 0.25° × 0.25° resolution from 1998 to the present. These values are considered at each pixel globally and rainfall accumulations are summed over 1, 3, and 7-day durations, updating every 3 h. The average rainfall intensity is calculated as a continuous variable over the specified durations, rather than defining specific rainfall events in time. The cumulative rainfall estimates are divided by the specified duration to obtain average rainfall intensity values, for a more direct comparison to previous studies.
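As an illustration of how a power-law I-D threshold is compared with an average intensity, the sketch below divides an accumulation by its window length and tests it against a curve of the form I = a D^(-b). The coefficients used are those commonly cited for Caine (1980) (a = 14.82, b = 0.39, with I in mm/h and D in hours) and serve only as a stand-in: the algorithm's own curve lies slightly below Caine's and its coefficients are not given in this section. Function names are hypothetical.

```python
def id_threshold_mm_per_h(duration_h, a, b):
    """Power-law intensity-duration threshold I = a * D**(-b), D in hours."""
    return a * duration_h ** (-b)


def mean_intensity_mm_per_h(accumulation_mm, duration_h):
    """Average intensity: accumulated rainfall divided by the window length."""
    return accumulation_mm / duration_h


def exceeds_threshold(accumulation_mm, duration_h, a, b):
    """True when the window's average intensity lies above the I-D curve."""
    intensity = mean_intensity_mm_per_h(accumulation_mm, duration_h)
    return intensity > id_threshold_mm_per_h(duration_h, a, b)


# ASSUMPTION: Caine (1980) coefficients as a stand-in for the algorithm's curve.
A, B = 14.82, 0.39

for days in (1, 3, 7):
    hours = 24 * days
    limit = id_threshold_mm_per_h(hours, A, B)
    # Threshold intensity (mm/h) and the equivalent accumulation (mm).
    print(days, round(limit, 2), round(limit * hours, 1))
```

The threshold intensity falls as the window lengthens, while the accumulation needed to exceed it rises, so a short intense burst can exceed the 1-day threshold yet fall below the 7-day one once averaged.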
To determine areas of landslide potential globally, the susceptibility map and rainfall accumulations are considered on the pixel scale at 3-h intervals. If a given pixel has a high Susceptibility Index (SI) value (Category 4 or 5) and the rainfall accumulation at the specified duration exceeds the corresponding I-D threshold value, the pixel is identified as having high landslide potential and a forecast is issued. There is no specification for landslide forecast magnitude or size due to the coarse resolution and global scale of this model and the limited number of surface and precipitation inputs used to define potentially susceptible landslide conditions globally. The flow chart for this algorithm framework is illustrated in Fig. 3. The method is operating in near real-time, with results updated every three hours at http://trmm.gsfc.nasa.gov/publications dir/potential landslide.html.
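The decision rule reduces to a conjunction of two tests per pixel, sketched below. The 80 mm/day value is the 1-day threshold quoted in the Hurricane Mitch case study of this paper; the pixel records and function name are hypothetical.

```python
def has_high_landslide_potential(si_category, accumulation_mm, threshold_mm):
    """Forecast rule: SI category 4 or 5 AND accumulation above the threshold."""
    return si_category >= 4 and accumulation_mm > threshold_mm


# 1-day threshold quoted in the Hurricane Mitch case study (80 mm/day).
THRESHOLD_1DAY_MM = 80.0

# Hypothetical pixels: susceptibility category and 1-day rainfall accumulation.
pixels = [
    {"id": "a", "si": 5, "rain_1day_mm": 120.0},  # susceptible and wet
    {"id": "b", "si": 3, "rain_1day_mm": 150.0},  # wet but low susceptibility
    {"id": "c", "si": 4, "rain_1day_mm": 40.0},   # susceptible but too dry
]

forecast_ids = [
    p["id"]
    for p in pixels
    if has_high_landslide_potential(p["si"], p["rain_1day_mm"], THRESHOLD_1DAY_MM)
]
print(forecast_ids)
```

Only pixel "a" satisfies both conditions. Pixel "b" shows why landslides in areas assigned a low susceptibility category can never be forecast, however intense the rainfall.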

Data description
One of the main factors limiting landslide hazard assessments on regional (country-level) or global scales is the lack of landslide inventories that can be used to validate landslide hazard models. A few studies have catalogued landslide occurrences at the national or global scale (Guzzetti et al., 1994; Petley et al., 2005). While they provide valuable insight into landslide distribution, the studies constrain the search criteria for cataloguing landslide occurrences spatially or by event impact. In order to evaluate the global landslide hazard algorithm, a global landslide inventory has been prepared for rainfall-triggered landslides reported in the media. Each landslide entry attempts to describe a landslide occurrence, which can be a single landslide or grouping of events triggered by the same extreme rainfall. In this paper, we refer to an individual landslide entry in the catalogue as a landslide event. The entry includes the date of occurrence, nominal location and geographic coordinates, type of trigger, landslide impacts, and qualitative descriptions of the relative size and location accuracy of the landslide.

Fig. 2. Global rainfall intensity-duration thresholds: (A) Caine (1980), (B) Hong et al. (2006), (C) Crosta and Frattini (2001), (D) Innes (1983), (E) Guzzetti et al. (2008). The 1, 3, and 7-day durations considered by the algorithm are shown on the graph.
The reports were extracted from journal articles, existing databases and online news media as well as government and relief aid organization reports. The study includes all types of rapidly occurring landslides, most of which are relatively shallow landslides or debris flows occurring within 10 m of the surface, although the actual depth of the landslide or type of movement is rarely given. The inventory has been prepared for the years 2003 and 2007 and is currently being prepared for 2008 and 2009. The resulting landslide inventories have 205 landslide reports for 2003 and 350 reports for 2007. The spatial distribution of the landslide inventories is illustrated in Fig. 1. A detailed analysis of the landslide inventory is given by Kirschbaum et al. (2009).

Inventory data limitations
The inventory is influenced by reporting biases, which affect the availability, reliability, and completeness of landslide reports. The landslide inventory reports are particularly limited when landslide events are associated with contemporaneous hazards such as tropical cyclones, or when landslides occur in remote areas where they are typically underreported. These issues are emphasized by studies that employ similar methodologies (Guzzetti et al., 1994; Petley et al., 2005, 2007; Castellanos Abella and van Westen, 2007; Chau et al., 2004). Due to reporting issues, the landslide inventory does not provide a comprehensive list of landslide occurrences globally, nor do we assume that deficiencies in landslide reporting are distributed randomly.
For example, population exposure can affect landslide reporting. We compare population density values at both the landslide inventory and algorithm forecast locations, plotting their distributions in Fig. 4. Results show that as population density increases, the percentage of landslide inventory events is consistently higher than the percentage of algorithm forecasts in the same population density bins. This suggests that landslide inventories are either failing to identify landslides in more rural locations or that the algorithm forecasts are not resolving potential landslide conditions in more populated areas. Another issue affecting the number of events in the inventory is the time frame in which the landslide occurrences are recorded. The inventory compilation project began in 2007, allowing landslide occurrences to be recorded on a daily basis, whereas archived media records and other database sources were used to develop the 2003 inventory. As a result, the 2007 inventory has almost twice as many events as the 2003 inventory. Many more small events were detected as well, which we discuss below. While we are not able to differentiate quantitatively the contribution of reporting biases from other variables affecting landslide frequency, such as climatic factors like ENSO, we attribute the variability in the size of each inventory to the availability of more media sources for the 2007 database.
To develop a general sense of the size and location accuracy of each landslide occurrence, two qualitative indices were developed. We assign a "confidence radius" to each catalogue entry, with the highest confidence assigned to landslides whose locations are known to within 5 km. A relative measure of the size or extent of each landslide occurrence was also defined using a similar ranking scale to differentiate very small events (one small hillslope) from landslides with large volumes and areal extents. In our analysis, we remove landslide entries with either a low location accuracy (location not known within 75 km) or a small size classification in an attempt to remove some of the reporting biases that may exist between regions. While these specifications do not greatly improve the landslide inventory, they are helpful in providing a consistent set of criteria that can be used to better express inventory limitations in the context of the algorithm evaluation.

Methodology for algorithm evaluation
The algorithm was run retrospectively for 2003 and 2007 using 1, 3 and 7-day temporal windows to obtain the number and location of algorithm forecasts for each year. Pixels occurring within a 1° radius and within one day of each other were grouped to represent a single forecast, providing a more realistic representation of the number of forecasts for each year. We assume that forecasted landslides represent all typologies, although shallow landslide events are more likely to be forecast because they were used to calibrate the rainfall I-D threshold relationship.
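One way to realize this grouping is single-linkage clustering with a union-find structure, sketched below. This is an illustrative assumption rather than the procedure actually used: the text does not say how chains of neighbouring pixels were handled, and the sketch measures the 1° radius as a plain Euclidean distance in latitude and longitude.

```python
import math


def group_forecast_pixels(pixels, max_deg=1.0, max_days=1):
    """Group forecast pixels into events; returns lists of pixel indices.

    pixels: list of (lat, lon, day) tuples.
    ASSUMPTIONS: single-linkage grouping (chains of neighbours merge) and
    Euclidean distance in degrees.
    """
    parent = list(range(len(pixels)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            lat_i, lon_i, day_i = pixels[i]
            lat_j, lon_j, day_j = pixels[j]
            close_in_space = math.hypot(lat_i - lat_j, lon_i - lon_j) <= max_deg
            close_in_time = abs(day_i - day_j) <= max_days
            if close_in_space and close_in_time:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(pixels)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical forecast pixels: (lat, lon, day of year).
pixels = [
    (10.0, 100.0, 5),
    (10.25, 100.25, 5),
    (10.5, 100.5, 6),
    (40.0, 20.0, 5),
    (10.0, 100.0, 20),
]
events = group_forecast_pixels(pixels)
print(len(events))
```

The first three pixels collapse into one forecast, while the distant pixel and the pixel that recurs two weeks later at the same location each remain separate, giving three forecasts from five pixels.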

Forecast evaluation techniques
This analysis compares the spatial and temporal distributions of algorithm forecasts and catalogued landslide events to determine the general accuracy and the potential limitations of this preliminary landslide forecast approach. Our comparison is constrained by the incompleteness of the landslide inventory and the relatively low spatial resolution of the algorithm forecasts. Nevertheless, the inter-comparison approach provides a useful framework for assessing the algorithm's operational potential and limitations in the landslide inventory data.
We evaluate the susceptibility map and rainfall intensity-duration threshold separately, comparing them with the landslide inventory spatially and temporally. The size of the algorithm forecast catalogues is approximately an order of magnitude larger than the corresponding landslide inventories, since there are many more areas that exhibit landslide potential conditions with no landslides than actual events, and the landslide inventories are incomplete. Since the ratio of algorithm forecasts to landslides is overly large, we must develop both absolute and relative skill indicators. This allows us to identify geographic areas which may not be well represented by the current susceptibility and rainfall information.
Geographic Information System (GIS) software was used to determine where and when the forecasts successfully resolve landslide occurrences. To account for uncertainty in the landslide inventory locations, each landslide occurrence was defined as a circle with a radius equal to its assigned confidence radius. We consider a landslide to be successfully resolved if the algorithm forecast intersects the landslide occurrence within 1 day before or after the landslide date, corresponding to our estimate of the temporal precision of the landslide inventory.
To evaluate the algorithm's relative skill, we calculated the number of algorithm forecasts and inventory landslides within 2° × 2° cells globally. The 2° resolution was chosen to clearly discriminate between algorithm forecast events, since forecast pixels within 1° could be considered as originating from the same event. We present the results for the 3-day temporal threshold. Forecast and landslide databases were normalized by the total database size. A Skill Ratio (SR) is defined as the normalized forecast density over the landslide inventory density at each pixel, written as:

$$\mathrm{SR}_j = \frac{N_{F,j}/N_F}{N_{L,j}/N_L}, \qquad N_{F,j} > 0,\; N_{L,j} > 0,$$

where $j$ represents the pixel indices, $N_{F,j}$ and $N_{L,j}$ are the numbers of forecasts and landslides in pixel $j$, $N_F$ is the number of forecasts and $N_L$ is the number of landslides globally. A pixel with SR > 1 has more forecasts than landslides, a pixel with SR < 1 has fewer forecasts compared to the density of landslides, and a pixel with SR ≈ 1 has a comparable number of forecasts and landslides. Areas with no reported landslides but a high density of algorithm forecasts (Type I Errors) are considered Over-forecasts, written as:

$$\mathrm{OF}_j = \frac{N_{F,j}}{N_F}, \qquad N_{L,j} = 0.$$

Areas with landslide occurrences and no forecasts (Type II Errors) are labeled as Missed Landslides, written as:

$$\mathrm{ML}_j = \frac{N_{L,j}}{N_L}, \qquad N_{F,j} = 0.$$

The three metrics were plotted on a global map for comparison. The precise meaning of the Skill Ratio and Over-forecast values is limited, since the landslide inventory is incomplete. Relative skill, however, highlights the spatial discrepancies between forecasts and observed occurrence, and suggests a more robust measure of validity.
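The three relative-skill metrics can be sketched as follows, under the assumption that each is built from per-cell counts normalized by the total database size, as the text describes. The cell indexing scheme, function names and sample coordinates are hypothetical.

```python
import math
from collections import Counter


def cell_index(lat, lon, size_deg=2.0):
    """Index of the size_deg x size_deg cell containing a point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def relative_skill(forecast_points, landslide_points, size_deg=2.0):
    """Return (skill_ratio, over_forecast, missed_landslide) dicts keyed by cell.

    Both inputs are non-empty lists of (lat, lon) tuples.
    """
    f_counts = Counter(cell_index(lat, lon, size_deg) for lat, lon in forecast_points)
    l_counts = Counter(cell_index(lat, lon, size_deg) for lat, lon in landslide_points)
    n_f, n_l = len(forecast_points), len(landslide_points)

    skill_ratio, over_forecast, missed = {}, {}, {}
    for cell in set(f_counts) | set(l_counts):
        f_density = f_counts[cell] / n_f  # normalized forecast density
        l_density = l_counts[cell] / n_l  # normalized landslide density
        if f_counts[cell] and l_counts[cell]:
            skill_ratio[cell] = f_density / l_density
        elif f_counts[cell]:
            over_forecast[cell] = f_density  # forecasts, no landslides
        else:
            missed[cell] = l_density  # landslides, no forecasts
    return skill_ratio, over_forecast, missed


# Hypothetical point sets.
forecasts = [(27.1, 85.3), (27.9, 85.9), (26.5, 85.0), (-10.5, -50.5)]
landslides = [(27.5, 85.5), (14.1, -87.2)]
sr, of, ml = relative_skill(forecasts, landslides)
print(sr, of, ml)
```

In this example three of four forecasts and one of two landslides share a cell, giving a Skill Ratio of 1.5 there; the remaining forecast falls in an Over-forecast cell and the remaining landslide in a Missed Landslide cell.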
The absolute skill of the algorithm is evaluated using a Probability of Detection (POD) value, which indicates the number of landslide occurrences that are successfully predicted by the algorithm forecasts over time (Panofsky and Brier, 1965). While this statistic suffers from the same inventory limitations as the relative skill metrics, it provides us with some indication of how the current algorithm performs in time. We define the statistic as:

$$\mathrm{POD} = \frac{N_L^{\mathrm{res}}}{N_L},$$

where $N_L^{\mathrm{res}}$ is the number of inventory landslides successfully resolved by an algorithm forecast and $N_L$ is the total number of landslides considered.
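A minimal sketch of the matching rule and the resulting POD is given below. It assumes that each forecast is represented by its pixel-centre point and that intersection is tested with a great-circle (haversine) distance against the landslide's confidence radius; the GIS procedure actually used may differ. All names and sample records are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_resolved(landslide, forecasts, max_day_offset=1):
    """True if any forecast lies inside the confidence circle within +/- 1 day.

    landslide: (lat, lon, day, radius_km); forecasts: list of (lat, lon, day).
    ASSUMPTION: forecasts are reduced to their pixel-centre points.
    """
    lat, lon, day, radius_km = landslide
    return any(
        abs(f_day - day) <= max_day_offset
        and haversine_km(lat, lon, f_lat, f_lon) <= radius_km
        for f_lat, f_lon, f_day in forecasts
    )


def probability_of_detection(landslides, forecasts):
    """Fraction of inventory landslides resolved by at least one forecast."""
    hits = sum(is_resolved(ls, forecasts) for ls in landslides)
    return hits / len(landslides)


# Hypothetical records: (lat, lon, day of year, confidence radius in km).
landslides = [(14.1, -87.2, 302, 25.0), (15.8, -86.8, 302, 5.0)]
forecasts = [(14.125, -87.125, 303)]
pod = probability_of_detection(landslides, forecasts)
print(pod)
```

The first landslide is resolved because a forecast falls inside its 25 km circle one day after the event; the second has no nearby forecast, so the POD is 0.5. Note that POD alone says nothing about over-forecasting, which is why it is paired with the relative-skill metrics.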

Evaluation of surface parameters
We evaluate the importance of the algorithm's constitutive variables by examining the sensitivity of the Over-forecast and Missed Landslide metrics to soil texture, land cover, and slope. These three surface variables have clearly identifiable effects on slope instability and are heavily weighted in the current susceptibility index (Nadim et al., 2006; Guinau et al., 2005; Wang and Sassa, 2005). Soil texture and soil type are closely related, so only soil texture is examined. The three surface variable grids were aggregated from 0.25° × 0.25° resolution to a 1° grid globally using the mean value from each set of cells. The maximum value within each 1° cell was used to represent slope. The soil texture and land cover grids were then individually normalized and values were extracted from each variable for landslide locations and algorithm forecast locations. We estimate the spatial sensitivity of algorithm over-forecasts to input variables by defining a Correlation Parameter (CP), which represents the product of normalized soil or land cover with the normalized Over-forecast metric, written as:

$$\mathrm{CP}_j = \widehat{(S|LC)}_j \times \widehat{\mathrm{OF}}_j,$$

where $S|LC$ represents either soil texture or land cover, $\mathrm{OF}$ is the Over-forecast metric, and the hat denotes a normalized quantity at pixel $j$. The CP can be used as a proxy to estimate the spatial coherence and sensitivity of each parameter affecting over-forecasted areas. The statistic is used to create a 1° global grid with a quantitative index ranging from 0 to 1, with 1 indicating pixels where the density of forecasts is highly sensitive to the surface variable.
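The Correlation Parameter is a cell-by-cell product of two normalized grids, as sketched below. Min-max scaling to [0, 1] is an assumption chosen so that the index spans 0 to 1 as described; the normalization actually applied is not specified in this section. Names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def correlation_parameter(surface_values, over_forecast_values):
    """Product of a normalized surface variable and the normalized OF metric.

    Both inputs are flat lists over the same 1-degree cells.
    ASSUMPTION: min-max normalization of each input.
    """
    surface = min_max_normalize(surface_values)
    over_forecast = min_max_normalize(over_forecast_values)
    return [s * of for s, of in zip(surface, over_forecast)]


# Hypothetical cells: soil texture class and Over-forecast density.
soil_texture = [1, 3, 5]
over_forecast = [0.0, 0.02, 0.04]
cp = correlation_parameter(soil_texture, over_forecast)
print(cp)
```

A cell scores near 1 only when it has both a high surface-variable value and a high over-forecast density, so the product highlights where a variable and over-forecasting coincide spatially rather than measuring a statistical correlation.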

Susceptibility map
According to the susceptibility map, 69% of the events for the 2003 landslide inventory and 58% of the 2007 inventory corresponded to SI values of Category 4 or 5 (Fig. 5). Between 27% and 29% of the landslide events for both years occurred in coastal regions (within 25 km of the coastline), with the majority of these events falling into low SI categories (62% for 2003 and 76% for 2007). Comparatively, only 30% and 19% of the landslides located inland corresponded to the low susceptibility values for 2003 and 2007, respectively (Table 1). The spatial resolution of the susceptibility map is primarily responsible for the discrepancy between coastal and inland values. In the Hong et al. (2007a) method, if the majority of a pixel in the susceptibility map is covered by water, it receives a value of 0. This limits the ability to resolve small but highly susceptible features, especially topography, and results in an underestimation of susceptibility in these regions. In addition, high population growth in coastal areas can contribute to increased landslide frequency, a factor that is not incorporated in the current susceptibility map, but which affects the landslide inventory database.

Rainfall threshold detection
We chose a subset of landslide occurrences with high confidence in their location (i.e., location known within 25 km) in order to estimate the relative success of the I-D threshold. A total of 161 events were chosen from the 2003 database and 309 from 2007. For each event, daily rainfall accumulation was calculated using TMPA rainfall data for the seven days prior to and the date of the reported event. Using the same 1, 3, and 7-day duration threshold windows considered in the algorithm, average rainfall intensity measurements were calculated in two ways: finding one average rainfall value for the three durations, with the landslide date serving as a fixed end point; and calculating the average rainfall intensity for the three durations using a running window end date over the 8 days of each record.
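The two ways of computing average intensity can be contrasted with a short sketch. It takes an 8-day record of daily totals ending on the landslide date; summarizing the running-window variant by its maximum is an assumption made here, on the reasoning that a threshold test only needs to know whether any window exceeds it. The record values are hypothetical.

```python
def fixed_end_intensity(daily_mm, duration_days):
    """Mean intensity (mm/day) over the window ending on the landslide date.

    daily_mm: daily totals in time order; the last entry is the landslide date.
    """
    return sum(daily_mm[-duration_days:]) / duration_days


def running_window_max_intensity(daily_mm, duration_days):
    """Largest mean intensity (mm/day) over any window of the given length.

    ASSUMPTION: the running-window approach is summarized by its maximum.
    """
    windows = (
        daily_mm[start:start + duration_days]
        for start in range(len(daily_mm) - duration_days + 1)
    )
    return max(sum(window) / duration_days for window in windows)


# Hypothetical 8-day record (mm/day): seven prior days plus the reported date.
record = [0, 5, 90, 40, 10, 0, 2, 12]

for days in (1, 3, 7):
    print(days, fixed_end_intensity(record, days), running_window_max_intensity(record, days))
```

In this record the heavy rain falls five days before the reported date, so the fixed-end 1-day intensity is only 12 mm/day while the running window finds 90 mm/day. The running window is therefore more forgiving of dating errors in the inventory.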
Rainfall accumulation for the landslide occurrences exceeded the I-D threshold values for only 28% of the 2003 landslide inventory and 17% of the 2007 events, suggesting that either the rainfall threshold is too high or the TMPA rainfall data is limited (Table 2). Many of the 2003 and 2007 landslide occurrences that were not resolved were associated with short-duration, high-intensity rainfall, typically exceeding the I-D threshold at sub-daily scales of 3 to 12 h. The failure of the I-D threshold to resolve 72% (2003) and 83% (2007) of the landslide occurrences suggests that either the threshold is too high for the temporal durations considered, the temporal windows are too long, the satellite data cannot appropriately resolve surface rainfall intensities, the landslide was incorrectly located or dated, or the landslide was triggered by conditions other than rainfall. It is likely that multiple factors contribute to these unresolved landslide occurrences.

Relative skill
We compared the algorithm forecasts with the landslide inventories according to the percentage of events by month and by 10° latitude bands (Fig. 6). The monthly distributions for both years indicate similar seasonal patterns, with a peak in landslide occurrences and forecasts in the Northern Hemisphere summer (June through September), which is consistent with the trend suggested in Petley et al. (2005). We also consider the latitudinal characteristics of the landslide occurrences and algorithm forecasts to determine if the two datasets have the same distribution in the Northern and Southern Hemispheres. Figure 6 indicates that the forecasts and landslides follow similar distributions for the Northern Hemisphere, with the largest percentage of events in both databases occurring within the 20-30° N latitude band. However, the same distribution does not apply in the Southern Hemisphere, where there are very few landslide reports in proportion to the percentage of forecasts. We attribute this discrepancy to landslide reporting biases or non-English media in Central and Southern Africa and South America, as well as over-forecasting in certain locations. There is likely a strong geographic bias in this statement given the small size of the validation dataset.
The Skill Ratio, Over-forecast, and Missed Landslide metrics are used to illustrate where the algorithm is over-estimating, under-estimating, or correctly resolving landslide activity and susceptibility. The three statistics were plotted on one map for 2003 (Fig. 7a) and 2007 (Fig. 7b). A positive Skill Ratio, depicted in green, indicates areas with both landslide reports and forecasts. Over-forecasted areas are shown in red, and blue areas denote pixels where landslides have been identified without corresponding forecasts.

Absolute skill
POD values are calculated at 1, 3, and 7-day temporal windows by considering the landslide occurrences that are accurately detected by the algorithm forecasts for the entire landslide population as well as for the landslide events occurring only in high susceptibility index categories (Table 3). The 1-day threshold window resolves the largest percentage of the events for both years (32% for 2003 and 22% for 2007), and the percentage decreases as the temporal window increases. However, the number of forecasts increases from approximately 1100 to 5000 per year as the temporal window decreases from 7 days to 1 day, resulting in larger concentrations of over-forecasted pixels. We attribute this large number of forecasts at shorter temporal thresholds to more frequent high-intensity, short-duration events as well as an overestimation of high susceptibility values, rather than too low an I-D threshold. This conclusion is problematic because the I-D threshold fails to resolve a large majority of the landslide inventory events. This is based on the results of Sect. 4.2.

Surface variables and resolution
We address the question of what is driving the most significant biases in the susceptibility map by comparing soil texture, land cover and slope at landslide locations and algorithm forecast locations. The Correlation Parameter metric is used to spatially evaluate these trends. Since the data used to derive soil texture globally closely mirror the soil type data, when the products are considered together, areas with high soil susceptibility values have twice the influence on the susceptibility index. Both have a weight of 0.2 in the susceptibility map. We extract the normalized soil texture values for forecast locations and landslide locations and find that soil texture values tend to be higher in forecast locations compared to landslide areas (Fig. 8a). This implies that soil texture is biasing the susceptibility map without adding utility, and its inclusion in the susceptibility index should be re-evaluated or de-emphasized. This result is not surprising given the coarse spatial resolution of soil data. Figure 8b highlights the spatial Correlation Parameter for soil texture within Eastern Asia and Oceania. The highest correlations over the Indian peninsula and portions of Southeastern China indicate that soil texture is likely the primary source of over-forecasting. This can significantly bias the susceptibility index in areas where the slope values (weighted as 0.3) are relatively low.

Nat. Hazards Earth Syst. Sci., 9, 673-686, 2009 (www.nat-hazards-earth-syst-sci.net/9/673/2009/)
For the land cover data, the percentage of normalized land cover pixels above 0.7 is much larger for catalogued events than for forecasts, indicating that land cover may play a more influential role in the Susceptibility Index than its current weight of 0.1 suggests (Fig. 8c). Figure 8d illustrates the Correlation Parameter for land cover, with values exhibiting a distribution similar to that for soil texture but with slightly smaller values.
Slope values were extracted at landslide and algorithm forecast locations and their distributions are plotted in Fig. 9. The percentage of slope values above 5 degrees is significantly higher for the landslide locations compared to the forecast locations. The over-forecasted regions in India and portions of Brazil are significant contributors to the distribution of low slope values, which are dominated by high soil type and soil texture values. When slope is aggregated to 0.25° × 0.25° resolution using the median value for each set of 90-m pixels, the maximum slope resolved in the 0.25° × 0.25° product globally is 21 degrees, a clear underestimation of the highest slope values. A more appropriate methodology is needed to accurately compute or characterize slope values for aggregated pixels.
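The effect of the aggregation statistic on resolved slope can be seen with a small sketch. The fine-resolution slopes below are hypothetical and stand for the 90-m pixels inside one coarse cell; the function name is an assumption.

```python
import statistics


def aggregate_slope(fine_slopes_deg):
    """Summarize fine-resolution slopes (degrees) inside one coarse pixel."""
    return {
        "median": statistics.median(fine_slopes_deg),
        "mean": statistics.mean(fine_slopes_deg),
        "max": max(fine_slopes_deg),
    }


# Hypothetical coarse cell: 70% gentle valley floor, 30% steep escarpment.
fine_slopes = [2.0] * 70 + [35.0] * 30
summary = aggregate_slope(fine_slopes)
print(summary)
```

The median reports 2 degrees for a cell in which nearly a third of the terrain slopes at 35 degrees, so the steep, landslide-prone part of the cell disappears from the aggregated product. The maximum preserves it but describes only the most extreme fine pixel, which is why neither statistic alone characterizes an aggregated pixel well.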
Spatial resolution and uncertainty or errors in the surface data products limit the accurate characterization of susceptibility conditions. Land cover information offers near-homogeneous global coverage; however, the classification of pixels into distinct categories can be subjective when estimating general landslide susceptibility conditions. The soil data have the most significant sources of uncertainty because the datasets are compiled from a set of global sampling sites and modeled to 0.5° or 1.0° resolution. The sampling sites are not uniformly distributed and result in data gaps that are typically undocumented. The spatial resolution and identification of surface parameters including topography, soil properties, land cover and hydrological variables in mountainous areas can also be challenging due to the complexity and variability of the terrain (Thompson et al., 2001).

Rainfall sensitivity
In order to evaluate how rainfall observations affect algorithm performance, we applied the algorithm to a case study of Hurricane Mitch in Central America (landfall 29 October 1998, Category 1 on the Saffir-Simpson scale). Landslide inventory data were compiled in affected areas in Honduras, Nicaragua, El Salvador, and Guatemala (USGS, 2007). The mapped areas were compared with the algorithm forecasts using 1-day temporal windows for 10 days before and after the storm's landfall. Approximately 60-70% of the mapped land area corresponded to high susceptibility values. The algorithm forecasts intersect the mapped regions in parts of Nicaragua, Honduras, and a small portion of Guatemala; however, large portions of El Salvador, Guatemala and Western Honduras did not receive forecasts. This is likely due to the poor detection of high-intensity, short-duration rainfall events, which were frequent during Hurricane Mitch (Fig. 10). We extracted National Climatic Data Center (NCDC) rainfall gauge information from the only two available stations in Honduras and compared the daily totals to TMPA rainfall over the same time period and area, shown in Fig. 10. At both stations, the daily rainfall accumulation was over two times higher than the satellite rainfall estimates. For the northern rainfall gauge at La Ceiba station, the daily rainfall estimate exceeded the 1-day threshold (80 mm/day) and a forecast was issued. However, at Tegucigalpa station in Southern Honduras, satellite rainfall did not exceed the 1-day threshold and resulted in a missed landslide. There are a number of issues in comparing rainfall gauge data and satellite rainfall estimations that affect the interpretation of these results, including the inherent difficulty in comparing point sources with spatially averaged data, and heterogeneous spatial coverage (Fisher, 2004). However, this example provides insight into some of the underlying discrepancies between in situ and remotely sensed rainfall estimations.

Fig. 10. The algorithm was run using a 1-day temporal window and forecasts were plotted according to the cumulative rainfall. Forecasts are compared to mapped landslide areas (hatched boxes). Graphs at the right compare daily rainfall accumulation estimates for two available NCDC rainfall gauges in Honduras and TMPA satellite data.
One reason for the discrepancy between satellite and gauge rainfall is that TMPA satellites are unable to continually monitor the rainfall intensities in a storm, but rather provide a snapshot of the rainfall field at a minimum of 3 h intervals due to the sampling frequency (Huffman et al., 2007). In addition, the 0.25° × 0.25° spatial resolution and 1-day temporal window limit the ability to resolve high-intensity, short-duration events such as fast-moving tropical systems. Dealing with extremes in radar rainfall data is an ongoing topic of research in remote sensing, and the degree to which we can better resolve such extremes will affect the forecasting capabilities of these systems. Shallow orographic rainfall is also not well resolved in the current TMPA rainfall data due to the difficulties in resolving warm rain processes over land as well as the underestimation of mountainous relief, leading to significant underestimations of rainfall (Barros and Lang, 2003; Adler et al., 2000). This is likely what occurred in the mapped areas in Guatemala, where complex topography is common and there were no forecasts issued (Bucknam et al., 2001). Antecedent soil moisture is also a key variable unaccounted for in the triggering relationship that can be used to identify susceptible surface conditions dynamically (Glade et al., 2000).
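The consequence of a satellite underestimate for the 1-day trigger can be illustrated with a minimal sketch. It assumes only the rule stated for the algorithm (a forecast is issued when a pixel has a susceptibility index of 4 or 5 and the daily accumulation exceeds the 80 mm/day threshold). The function name and the rainfall totals are hypothetical and are not taken from the NCDC or TMPA records.

```python
ONE_DAY_THRESHOLD_MM = 80.0   # 1-day threshold quoted in the text
HIGH_SUSCEPTIBILITY = (4, 5)  # index values treated as high susceptibility


def forecast_issued(susceptibility_index, daily_rain_mm):
    """Return True when both trigger conditions hold for one pixel and day."""
    return (susceptibility_index in HIGH_SUSCEPTIBILITY
            and daily_rain_mm > ONE_DAY_THRESHOLD_MM)


# Hypothetical totals: the gauge reads twice the satellite estimate.
gauge_mm = 120.0
satellite_mm = gauge_mm / 2.0

print(forecast_issued(5, gauge_mm))      # gauge total exceeds the threshold
print(forecast_issued(5, satellite_mm))  # satellite estimate does not
```

With these illustrative numbers, a two-fold underestimate is enough to turn a triggering day into a missed event even though the pixel is highly susceptible.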

Discussion
The goal of this analysis is to evaluate a proposed global landslide hazard assessment algorithm using global landslide inventory data in order to determine what contributes to significant uncertainties, and how key input observations interrelate. Considering landslide hazard globally is an extremely challenging task that requires large generalizations of the physical mechanisms underlying slope instability as well as the rainfall-triggering processes. As a result, there are many factors limiting our ability to evaluate quantitatively the algorithm's performance, including the consideration of surface variables and rainfall data at the same coarse spatial resolution globally and the troubling incompleteness of the landslide catalogues. Despite these limitations, the analysis suggests that the algorithm demonstrates some predictive skill in resolving the landslide inventory events. Although a robust test is vitiated by limitations in the underlying variables, we suggest that the statistical framework we have developed is useful for assessing the performance of global forecasting methods as well as their limitations. The proposed method for dynamic landslide hazard forecasting has these limitations: 1. Improper identification of susceptible areas results in Over-forecasting biases; 3. Landslide inventory incompleteness for global and regional calibration and validation.
The present susceptibility map demonstrates some skill in resolving the potential location of landslide events; however, there are many regions where susceptibility is overestimated compared to what is realistic. The weighted linear combination approach used to integrate the parameters in the susceptibility map employs a subjective categorization of weights, which serves to bias the susceptibility values regionally. Instead, a susceptibility evaluation should be done using regression and sensitivity analysis to more appropriately relate surface observables to susceptibility. The spatial resolution of surface variables at 0.25° × 0.25° in the susceptibility map results in a loss of important information, particularly for slope. When 90-m slopes are aggregated to coarser spatial resolutions, particularly using the median value, small but important surface features, including coastlines and small islands, are obscured. Probability distributions and modeled slope histograms can be used as a guide to determine the scale at which to approach the issue of slope resolution. Building on the current data inputs and methodologies, a new susceptibility map can be derived at a higher spatial resolution using a separate weighting scheme to better resolve these features.
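The loss of small features under median aggregation can be shown with a minimal sketch on a synthetic grid. The 4 x 4 slope values, the helper name `aggregate` and the use of the maximum as a contrasting statistic are all assumptions made for illustration; they are not the aggregation procedure or data used for the susceptibility map.

```python
from statistics import median


def aggregate(grid, block, reducer):
    """Aggregate a square grid into block x block cells using reducer."""
    size = len(grid)
    out = []
    for r in range(0, size, block):
        row = []
        for c in range(0, size, block):
            cell = [grid[i][j]
                    for i in range(r, min(r + block, size))
                    for j in range(c, min(c + block, size))]
            row.append(reducer(cell))
        out.append(row)
    return out


# Hypothetical fine-resolution slopes (degrees): flat terrain or ocean (0)
# with a small steep island occupying two cells.
fine_slopes = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 30, 35],
    [0, 0, 0, 0],
]

coarse_median = aggregate(fine_slopes, 4, median)
coarse_max = aggregate(fine_slopes, 4, max)
print(coarse_median)  # the steep cells do not influence the median
print(coarse_max)     # the maximum retains the steepest value
```

In this toy case the coarse pixel has a median slope of zero even though it contains slopes of 30 to 35 degrees, which is the kind of feature a coastal or island landslide would depend on.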
The relative skill metrics, Skill Ratio, Over-forecast density, and Missed landslide density, are useful for spatially evaluating algorithm performance and highlighting areas in which the algorithm requires improvements. The western and central portion of the Himalayan Arc, Western India, areas in Southeast China and parts of the Philippines and Indonesia have a positive Skill Ratio, suggesting that there are both landslides and algorithm forecasts in these regions; however, for some areas, particularly in Northern India and Nepal, a Skill Ratio below 1 indicates that there are comparatively more landslides than forecasts. This is likely a result of satellite data underestimating rainfall at high altitudes during the monsoon season, since the susceptibility values are generally high in these regions (Barros et al., 2000). Many of the landslide occurrences in this region were also reported along major highways. While we removed all landslides that were clearly anthropogenically triggered, such as those in mines, removing events along roads would exclude too much of our inventory. This suggests that anthropogenic modification of the surface plays a sizeable role in increasing surface instability, which has not been resolved in the current susceptibility map. The high Over-forecast densities in the Indian peninsula, Brazil, Eastern China, and Central Africa appear to come from over-emphasizing soils information and under-weighting slope, which result in high susceptibility index values. Lastly, missed landslides have the highest density in the Central United States, portions of the Caribbean, and parts of Western Asia including Afghanistan and Pakistan. We attribute this to low susceptibility index values in coastal areas, poorly resolved rainfall accumulations, or enhanced instability from other factors such as snow melt, seismic activity, and anthropogenic influence.
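The three relative skill metrics can be sketched per grid cell as follows. The sketch follows the description in the Fig. 7 caption (Skill Ratio as normalized forecast density over landslide density; Over-forecast and Missed landslide densities normalized by their maxima), but the exact normalization used here, the function name and the counts are assumptions for illustration only.

```python
def classify_cells(forecasts, landslides):
    """Label each grid cell with one relative skill metric.

    forecasts and landslides map a cell id to a count. Normalising each
    count by its maximum over all cells is an ASSUMPTION of this sketch.
    """
    max_f = max(forecasts.values()) or 1
    max_l = max(landslides.values()) or 1
    result = {}
    for cell in forecasts:
        f = forecasts[cell] / max_f
        l = landslides.get(cell, 0) / max_l
        if f > 0 and l > 0:
            result[cell] = ("skill_ratio", f / l)    # forecasts and landslides
        elif f > 0:
            result[cell] = ("over_forecast", f)      # forecasts, no landslides
        elif l > 0:
            result[cell] = ("missed_landslide", l)   # landslides, no forecasts
        else:
            result[cell] = ("none", 0.0)
    return result


# Hypothetical counts for four cells.
forecasts = {"A": 10, "B": 40, "C": 0, "D": 0}
landslides = {"A": 4, "B": 0, "C": 2, "D": 0}

result = classify_cells(forecasts, landslides)
for cell, (label, value) in sorted(result.items()):
    print(cell, label, value)
```

Cell A in this example has a Skill Ratio below 1, that is, comparatively more landslides than forecasts, which is the situation described for Northern India and Nepal.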
Although it is difficult to differentiate the physical susceptibility conditions from biases in landslide reporting, the temporal and latitudinal distribution of forecasts can provide insight into this issue. Both the landslide events and algorithm forecasts exhibit consistent seasonal and spatial patterns for 2003 and 2007; however, the proportion of algorithm forecasts to landslides is much higher for the Southern Hemisphere tropics and sub-tropics, compared to other areas. The relative skill evaluation indicates that over-forecasting is a problem in Brazil and Central Africa; however, the landslide reporting may be a larger contributor to the uncertainty due to non-English reporting of landslide events and limited transmission of event information to media sources.
Since 60-70% of the landslides fall within high susceptibility locations, the low POD values (ranging from 19 to 33%) imply that the rainfall threshold relationship is vastly under-estimating rainfall accumulation. While the rainfall intensity-duration threshold may be the most straightforward way to estimate potential landslide triggering conditions, considering the I-D threshold curve at the global scale may not be the ideal way to resolve these events, particularly among different climatic zones (Guzzetti et al., 2008). In addition, the minimum 1-day threshold is unable to successfully identify events with short-duration, high-intensity rainfall triggers. The limited sampling frequency of the current TMPA satellite data at sub-daily scales precludes the use of rainfall accumulations at less than one day. Specifying I-D thresholds regionally may help to resolve the rainfall extremes and could be enhanced with antecedent soil moisture information to provide more system memory.
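The reasoning that links the susceptibility fraction to the POD can be made explicit with a short worked example. It assumes the standard contingency-table form POD = hits / (hits + misses); the counts are hypothetical and were chosen only to fall inside the ranges quoted above.

```python
def probability_of_detection(hits, misses):
    """POD = hits / (hits + misses), the standard contingency-table form."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical inventory of 100 rainfall-triggered landslides.
n_landslides = 100
n_high_susceptibility = 65  # inside the 60-70% range quoted in the text
n_forecast = 25             # of those, rainfall also exceeded the threshold

pod = probability_of_detection(n_forecast, n_landslides - n_forecast)
upper_bound = n_high_susceptibility / n_landslides  # best POD the map allows
rain_hit_rate = n_forecast / n_high_susceptibility  # skill of the rain trigger

print(f"POD = {pod:.2f}")
print(f"POD ceiling set by susceptibility = {upper_bound:.2f}")
print(f"Fraction of susceptible events triggered = {rain_hit_rate:.2f}")
```

Because a forecast requires both conditions, the susceptibility map caps the achievable POD at the fraction of landslides in high susceptibility pixels; the gap between that ceiling and the observed POD is the part attributable to the rainfall threshold.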
The landslide inventory data represent one of the largest uncertainties in our algorithm evaluation because without a complete dataset, we are unable to pinpoint the contribution of each variable's uncertainty to the total estimation of algorithm accuracy. For our analysis, we require information on all available rainfall-triggered landslide occurrences at the global scale and therefore cannot limit the scope of our analysis at this point. The use of regional data from specific events, such as Hurricane Mitch, can offer more comprehensive and quantitative information to evaluate algorithm performance. This can provide key insight into the surface variables affecting susceptibility in these regions.

Conclusions
This analysis suggests that the first version of this landslide hazard algorithm cannot be used as an operational tool; however, it provides insight into the interrelationship of the susceptibility map, rainfall threshold, and landslide inventory at the global scale. The spatial resolution of the susceptibility map greatly inhibits the ability to resolve small features, particularly in complex topography and coastal areas.
Furthermore, the methodology used to relate the surface variables in the susceptibility map does not accurately represent surface conditions globally. The susceptibility map should consider data products at higher spatial resolutions using a more physically-based approach to better resolve the quantitative relationship between surface variables and their contribution to surface instability. Limitations in satellite estimation of rainfall, particularly in resolving peak rainfall accumulations, influence the detection of rainfall-triggered landslide events and consequently decrease the effectiveness of the intensity-duration curve. Since the accuracy and resolution of the TMPA rainfall data are not likely to improve significantly in the near future, rainfall-triggering information should be enhanced with additional proxies for hydrologic instability within the algorithm framework. At present, the algorithm's temporal memory is only as long as the duration windows considered in the rainfall threshold. Introducing a variable such as antecedent soil moisture could extend the memory of the system and better resolve locations of potential instability, particularly following heavy rainfall events. Intensity-duration and rainfall accumulation relationships may also be considered regionally to remove some of the climatological bias between regions and allow for more effective evaluation of the pre-conditions and intensities contributing to landslide triggering in separate climatologies.
While the global landslide hazard algorithm framework may eventually be useful for forecasting landslide hazard conditions at the global scale, at present the approach can be effective for understanding the relationship between landslide-controlling variables. Spatial and temporal data resolution and accuracy serve as the most significant limiting factors of this analysis and in many cases may be improved upon by considering higher resolution products. The incompleteness of the global landslide catalogue remains a significant limiting factor for validating any algorithm at the global level. We suggest that a comprehensive global catalogue be a high-priority goal of in situ and remote sensing observational programs. Ensemble regional inventories available from event-based mapping initiatives may help to fill in gaps where global information is lacking. Considering the algorithm framework regionally would present a useful way to test global data sets with more comprehensive landslide inventory data. Future versions of the algorithm, incorporating the suggested changes, could make this approach more valuable for discerning areas of potential landslide activity and allow the research community to consider issues of landslide hazard, risk and vulnerability in a broader context.

Fig. 1. Global landslide susceptibility index data from Hong et al. (2007a), plotted with the 2003 and 2007 landslide inventory data. Susceptibility Index values 4 and 5 (yellow and red) denote areas that are considered as high susceptibility in the algorithm. The exact definition of the susceptibility index values is presented in Hong et al. (2007a).
D. B. Kirschbaum et al.: Evaluation of a landslide algorithm using global inventories

Fig. 3. Schematic of the Global Landslide Hazard Algorithm. The algorithm is composed of two components: a landslide susceptibility map composed of the listed surface parameters, and an intensity-duration rainfall threshold with TMPA satellite rainfall data. When a pixel on the Susceptibility Map has a value of 4 or 5 and the rainfall accumulation exceeds the intensity-duration threshold value, the pixel is denoted as having high landslide potential and a forecast is issued. The map on the bottom illustrates how the global forecasts are represented on the algorithm website (http://trmm.gsfc.nasa.gov/publicationsdir/potential landslide.html). Yellow circles denote areas where forecasts have been issued in one time slice. Each area can be zoomed in to observe individual highlighted pixels. This example shows the 3-day duration threshold.

Fig. 5. Frequency of 2003 and 2007 landslide inventory events for each susceptibility index category. The susceptibility index categories are shown in Fig. 1 and described in Hong et al. (2007a).

Fig. 6. Frequency of landslide occurrences and algorithm forecasts spatially and temporally for 2003 (left) and 2007 (right); (a) and (b) show the percentage of landslide events and algorithm forecasts by month; (c) and (d) show the percentage of landslide events and algorithm forecasts by latitude.

Fig. 7. Global map of the Skill Ratio, Over-forecasts, and Missed Landslide statistics for (a) 2003 and (b) 2007. Green represents the skill ratio, defined as the normalized forecast density over the landslide density. Red denotes areas with a high density of forecasts but no landslides, and blue indicates pixels with landslide reports and no algorithm forecast. Both the Over-forecasts and Missed-landslide areas are normalized by the maximum density values.

Fig. 8. Comparison of landslide event and algorithm forecast databases for soil texture (left) and land cover (right): (a) normalized soil texture values considered at algorithm forecast locations and landslide event locations; (b) spatial distribution of the Correlation Parameter between Over-forecasted areas and soil texture values; (c) normalized land cover values extracted for algorithm forecast locations and landslide event locations; (d) Correlation Parameter for Over-forecasted areas and land cover values.

Fig. 9. Example of slope distribution for the 2003 algorithm forecasts (black) and landslide occurrences (grey) using the 0.25° × 0.25° slope product. Results are divided into 1 degree bins.

Fig. 10. Example of algorithm performance for Hurricane Mitch in October to November 1998. Map shows the path and intensity of the storm according to the Saffir-Simpson scale. The algorithm was run using a 1-day temporal window and forecasts were plotted according to the cumulative rainfall. Forecasts are compared to mapped landslide areas (hatched boxes). Graphs at the right compare daily rainfall accumulation estimates for two available NCDC rainfall gauges in Honduras and TMPA satellite data.

Table 1. Percentage of landslide occurrences in each Susceptibility Index category for coastal and inland event locations. Results indicate that landslides located inland have higher susceptibility values compared to landslide events located in coastal areas.

Table 2. Evaluation of rainfall intensity-duration thresholds at 3 temporal windows, shown as the proportion of landslides resolved at each temporal duration. The two columns for each year show the percentage of landslide events resolved when the landslide date is fixed and when the intensity-duration threshold values are considered in a variable window within 8 days of the landslide event.