A method for semiautomated landslide detection and mapping, with the ability
to separate source and run-out areas, is presented in this paper. It combines
object-based image analysis and a support vector machine classifier and is
tested using a GeoEye-1 multispectral image, sensed 3 days after a major
damaging landslide event that occurred on Madeira Island (20 February 2010),
and a pre-event lidar digital terrain model. The testing is developed in
a 15
Landslides are complex mass movements that occur on hill slopes due to the action of gravity; they play an important role in the evolution of landforms, while constituting a serious natural hazard in many regions throughout the world. Landslides can involve flowing, sliding, toppling, or falling and are commonly associated with a trigger: slope failures generally occur within minutes after an earthquake, hours to days after a snowmelt, and days to weeks after an intense rainfall (Guzzetti et al., 2012; Malamud et al., 2004). Urban expansion into hilly or mountainous regions results in more people being exposed to the hazard, thus increasing landslide risk. Nowadays, landslides claim thousands of lives every year and result in extensive infrastructure and property damage (Malamud et al., 2004; Yang and Chen, 2010; Holbling et al., 2012). Landslide susceptibility and hazard assessment are important tools in land-use planning, in particular to avoid urban expansion into vulnerable areas, thus reducing future economic and human losses. Past landslides are one of the best indicators of future landslide activity, and mapping landslides is therefore an essential step in hazard assessment (Bucknam et al., 2001; Lahousse et al., 2011; Aksoy et al., 2012; Guzzetti et al., 2012).
An inventory map records the location and, when known, the date of occurrence and type of landslides that have left discernible traces in an area (Malamud et al., 2004). A landslide-event inventory consists of all the slope failures associated with a single trigger such as an earthquake, rainstorm, or snowmelt and is useful to determine the residual risk in the aftermath of the event, as a guide for emergency and recovery efforts, and to validate landslide susceptibility and hazard models (Malamud et al., 2004; Barlow et al., 2006; Guzzetti et al., 2012; Mondini et al., 2013, 2014). Immediately after the event, individual landslides are easy to recognize, even in the case of small and shallow landslides such as soil slips or debris flows, and detailed mapping carried out shortly after the landslide event leads in general to a significantly complete inventory (Malamud et al., 2004). Nevertheless, landslide inventories are generally incomplete, both in the area covered and in the time period investigated, which is a serious drawback for landslide hazard studies (Malamud et al., 2004; Van Westen et al., 2006).
Landslide inventories have traditionally been derived by visual interpretation of aerial photographs and field surveys. The latter can lead to comprehensive and precise landslide inventories, but they are often hindered by cost and logistical constraints, particularly over large or remote areas (Yang and Chen, 2010). Interpretation of stereoscopic aerial photography (whereby the interpreter detects and classifies landslides based on experience and on the analysis of changes in the form, shape, position, or appearance of the topographic surface) remains a common method to recognize landslides, despite being an empirical and subjective technique (Guzzetti et al., 2012). Aerial photographs, which can accurately depict the distribution and contours of landslides in a region, are seldom available in a timely manner, thus restricting the ability to prepare event and seasonal inventory maps repeatedly and for large areas (Martha et al., 2010; Aksoy et al., 2012; Guzzetti et al., 2012). In this context, and if the persistence of cloud cover is not an issue, satellite imagery emerges as a uniquely reliable tool for timely mapping of landslides and damage assessment over large and inaccessible areas (Barlow et al., 2006; Joyce et al., 2008; Aksoy et al., 2012; Holbling et al., 2012; Xu et al., 2015).
Visual interpretation of satellite imagery remains extremely demanding in terms of resources and time, especially when dealing with numerous multi-scale landslides affecting wide areas, such as rainfall-induced shallow landslides. Applying automated methods can contribute to more efficient landslide mapping and updating of existing inventories, and in recent years the number and variety of approaches has rapidly increased (Guzzetti et al., 2012; Holbling et al., 2012; Marc and Hovius, 2015). Landslides can display highly heterogeneous sizes, demanding information with higher spatial resolutions in order to produce complete event inventories. Very high-resolution (VHR) multispectral images, acquired by space-borne sensors with sub-metric precision, such as Ikonos, QuickBird, GeoEye, or WorldView, are increasingly considered the best option for landslide mapping (Van Westen et al., 2008), but these new levels of spatial detail present new challenges to state-of-the-art image analysis tools (Kurtz et al., 2014).
In recent years, several semiautomated methods have been developed to tackle such difficulties, using specific classification schemes that target single post-event optical images (Cheng et al., 2013; Moosavi et al., 2014) or, when suitable pre-event data are available, exploit pre- and post-event image changes (e.g., Lu et al., 2011; Mondini et al., 2011a, b). In the latter case, great care has to be taken in the co-registration and radiometric correction procedures. Ideally, pre-event and post-event images should be acquired at the same time of the year and with similar view angle and solar illumination, but this is often not feasible (Guzzetti et al., 2012).
Semiautomated approaches to landslide mapping can be classed, according to
the type of image element used, as “pixel based” (e.g., Chang et al., 2007;
Yang and Chen, 2010; Chini et al., 2011; Cheng et al., 2013; Mondini et al.,
2013, 2011a, b) or “object based” (e.g., Aksoy et al., 2012; Holbling
et al., 2012, 2015; Lacroix et al., 2013; Lahousse et al., 2011; Lu et al.,
2011; Martha et al., 2010, 2011, 2012, 2013; Stumpf et al., 2011, 2014; Van
Den Eeckhaut et al., 2012). When applied to very high spatial resolution
images, pixel-based methods often exhibit a “salt and pepper” appearance
(Van Westen et al., 2008; Guzzetti et al., 2012) which requires image
post-processing. The “object-oriented” approach, however, groups
image pixels into homogeneous objects, with shape, size, neighboring, and
textural features in addition to spectral information (Aksoy et al., 2012).
With both approaches, supervised and unsupervised classification schemes have
been adopted, based on algorithms such as maximum likelihood (Nichol et al.,
2005; Borghuis et al., 2007; Danneels et al., 2007),
Location of Madeira Island, in the North Atlantic, with the most affected basins during the 2010 event superimposed on a DTM. Our study area is delimited by the rectangle located over Funchal basins. HB: hydrographic basin. Adapted from Lira et al. (2013).
The separation of the landslide-affected region into source, transport, and deposition areas is important to support post-event mitigation actions, since sediments deposited by landslides are likely to become source materials in subsequent events (Mondini et al., 2011a; Lira et al., 2013). More generally, the assessment of the volume of sediments produced, displaced, and deposited by landslides is important for susceptibility and hazard evaluation (Guzzetti et al., 2009). Mondini et al. (2011a, 2013) developed semiautomated pixel-based approaches to map landslides into source and run-out areas (defined as the union of transport and depositional areas), using a single post-event image and pre- and post-event image changes, respectively. Recently, Holbling et al. (2015) developed an object-based approach for semiautomated landslide change detection, with the ability to separate landslide sources from debris flow/sediment transport areas. Van Den Eeckhaut et al. (2012) also separate landslide source area and run-out but, in contrast to the previous authors, they use lidar data instead of optical imagery.
In this work we develop and test a methodology for semiautomated landslide
recognition and mapping of landslide source and run-out areas. The method
combines object-based image analysis and an SVM supervised
learning algorithm, and it is tested with information from VHR multispectral
imagery acquired after a landslide event, together with a pre-event high-resolution (4
Madeira Island (Fig. 1), with a population of 250 000 inhabitants, has a long record of flash floods, with at least 30 flash flood events of significant intensity registered since the beginning of the 19th century (Baioni, 2011; Lira et al., 2013). These flash floods usually last a few hours, during which a large amount of sediment is transported downstream very rapidly and with very high energy. A large part of the transported material is supplied by shallow landslides triggered upstream by the heavy precipitation (Lira et al., 2013). The combination of rough and steep terrain (mean slope angle 37 %) with intense rainfall provides the conditions for frequent and widespread landslides on Madeira.
On 20 February 2010 an extreme rainfall event followed a prolonged precipitation period (Luna et al., 2011; Couto et al., 2012; Fragoso et al., 2012; Teixeira et al., 2014). In the first hours of the morning, rainfall values reached more than double the monthly average, triggering landslides and exceptionally strong flash floods that severely affected the municipalities of Funchal (home to half of the island's population) and Ribeira Brava (Lira et al., 2013).
Details of GeoEye-1 pan-sharpened RGB image acquired over Madeira 3 days after the flash floods.
Our study area (15
A landslide reference data set was prepared to train and evaluate the semiautomated method, by thorough revision of a previous inventory produced by Lira et al. (2013), based on delimitation of contours on the post-event GeoEye-1 image and assisted by visual interpretation of the orthophotos. After the revision of its contours, the landslides were classified according to the type of movement (Varnes, 1978; Cruden and Varnes, 1996; Zêzere, 2002), resorting to several data sources besides the imagery, namely field work reports and photographic evidence. The large majority of mass movements were shallow translational slides and debris flows. The latter are the fastest and most destructive type of landslide and important sources of sediment to channel networks; most begin as translational slides that liquefy (Gabet and Mudd, 2006; Sidle and Ochiai, 2006). For the drainage basins of the Funchal and Ribeira Brava municipalities, 3207 shallow translational slides, 795 debris flows, and 59 rotational slides were inventoried. Furthermore, source and run-out areas were mapped separately inside the disturbed region. In the case of shallow translational slides, the separation was based on the darker appearance of the source areas (corresponding to freshly uncovered deep soil) which contrasted with the brighter run-out areas characterized by disturbed and/or bent vegetation and superficial layers of soil. For the debris flow tracks, darker, fresher-looking areas were interpreted as scoured sectors acting as important sources of material. This is expected to occur in the steepest sectors of the drainage network (e.g., Schurch et al., 2011; Theule et al., 2014; Tiranti et al., 2015), which is the case of our study area. The distinction between source and run-out areas was easier for fresh translational slides than for debris flows, but in both cases the process was affected by considerable uncertainty. 
Given the absence of anomalous precipitation in the months following the event (IPMA, 2010), we were able to minimize the mapping errors to some extent by re-interpreting the same landslides on the orthophotos acquired a few months later (May, 2010), when all loose material had been washed away from run-out areas. Often the landslides are composed of a shallow translational slide that further develops into debris flows (as reported and illustrated for Madeira Island in Lira et al., 2013, Fig. 2). In such cases we divided the source area into primary and secondary sources: the former category corresponds to the shallow translational slide and the latter to scoured areas within debris flow tracks. Secondary sources were also separated as seemingly fresh slides occurring inside shallow translational run-out regions.
The methodology followed is schematically represented in Fig. 3. The
pre-processing consisted of fusing the 0.5
Diagram of the landslide mapping methodology.
The partition of the pan-sharpened and orthorectified GeoEye-1 image into
objects was achieved with ENVI's feature extraction module, which performs
image segmentation followed by merging of the segmented regions. Image
segmentation involved computation of a gradient image for each of the (R, G,
B, NIR) fused bands using a Sobel edge detection operator (Sobel, 1968),
followed by conversion to a single-band map, where each pixel retains the
maximum gradient across the bands. A watershed algorithm (Beucher and
Lantuéjoul, 1979; Roerdink and Meijster, 2001) is then applied, flooding
the map starting with the lowest gradient values (the most uniform part of
the objects) to the highest gradient values (the edges). A selectable scale
parameter can modify the gradient map, thus controlling the minimum contrast
of the object edges (Roerdink and Meijster, 2001; Jin, 2012;
Our goals in the segmentation/merging procedure were, first, to outline the landslide source and run-out areas, including the smallest of them, with as much accuracy as possible; and second, to avoid the division of these features into many objects, so as not to obscure their recognizable landslide shape. In a sense, we searched for a balance between over-segmentation and under-segmentation – bearing in mind that the former is always preferable (Martha et al., 2010) – in an attempt to capture the geometric attributes of the landslide source and run-out areas (such as elongation or roundness) with minimum loss of accuracy in the delineation of contours. We tested several scale (5, 10, 20, 30, 40, 50, 60, and 70) and merge (30, 50, 70, 85, 90, and 95) parameters and superposed the final segmented regions onto the landslide inventory (solely in the training area) to find, by visual interpretation, the most appropriate combination for our case study (which turned out to be scale value 40 and merge value 90). Figure 4 shows details of the comparison between the inventory landslide delineation and our final choice for image partition. In the absence of an inventory map, the comparison can be made directly with the pan-sharpened RGB GeoEye-1 image also shown in the figure.
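As an illustration of the gradient step that precedes the watershed, the following minimal sketch computes a Sobel gradient magnitude for each band and keeps the per-pixel maximum across bands. It is not ENVI's implementation: the function names, the nested-list image representation, and the zeroed image border are assumptions made for this example, and the watershed flooding, scale parameter, and region merging are not reproduced.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(band):
    """Gradient magnitude of one band (list of rows).

    Border pixels are left at 0 (an assumption of this sketch).
    """
    rows, cols = len(band), len(band[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = band[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            out[r][c] = math.hypot(gx, gy)
    return out


def max_gradient_map(bands):
    """Single-band map keeping, per pixel, the largest gradient across bands."""
    grads = [sobel_magnitude(band) for band in bands]
    rows, cols = len(grads[0]), len(grads[0][0])
    return [
        [max(g[r][c] for g in grads) for c in range(cols)]
        for r in range(rows)
    ]


# Toy example: one band with a vertical edge, one perfectly uniform band.
band_a = [[0, 0, 10, 10] for _ in range(4)]
band_b = [[5] * 4 for _ in range(4)]
gradient = max_gradient_map([band_a, band_b])
```

In this toy case the uniform band contributes zero gradient everywhere, so the combined map is driven entirely by the edge in the first band. A watershed would then flood such a map from its lowest values upward.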
Details of GeoEye-1 pan-sharpened RGB image, showing the partition segments resulting from the chosen segmentation/merging parameters: scale 40, merge 90; white contour is the inventory landslide delineation; black contour is the image partition.
Layers used in classification.
From (i) the four pan-sharpened bands of GeoEye-1, (ii) a calculated
vegetation index (NDVI), and (iii) three topographic indexes derived from
a 4
The information from the nine input layers in Table 1 was used to compute the
full set of spectral, textural, and spatial features available on ENVI's
feature extraction (as described in
Features used for object-based learning.
GeoEye-1 pan-sharpened RGB image over the study area
(15
Classification was based on the SVM algorithm (Cortes and Vapnik, 1995; Vapnik, 1995), a supervised non-parametric statistical learning technique which separates the data set into groups or classes in a way consistent with the training examples. SVMs are gaining popularity in the remote sensing field, including landslide mapping (Van Den Eeckhaut et al., 2012; Moosavi et al., 2014), due to their ability to handle data with unknown statistical distributions and small training sets, as is often the case in this field (Mountrakis et al., 2011). SVMs are binary classifiers whose aim is to find the decision region boundary that separates the data set characteristics or features into two regions in the feature space. The SVM chooses the boundary (optimal hyperplane) with the maximum safety margin to the closest training features (termed support vectors), hence maximizing the margin between the classes. The linearization of the decision boundary is achieved through the use of kernel functions which map the training data into a higher-dimensional space in which the two classes can be linearly separated by a hyperplane.
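For reference, the maximum-margin idea described above is commonly written as the soft-margin optimization problem below. The notation is introduced here for illustration and is not taken from the original text: $\mathbf{x}_i$ are training feature vectors with labels $y_i \in \{-1, +1\}$, $\phi$ is the (implicit) mapping to the higher-dimensional space, $\xi_i$ are slack variables, and $C$ is a penalty weight. Identifying the RBF kernel width $\gamma$ with the "sigma" parameter reported for the software is an assumption.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i \left( \mathbf{w} \cdot \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0 ,

K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j),
\qquad
K_{\mathrm{RBF}}(\mathbf{x}_i, \mathbf{x}_j)
  = \exp\!\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2} \right),
  \quad \gamma > 0 .
```

Minimizing $\lVert \mathbf{w} \rVert$ is equivalent to maximizing the margin $2 / \lVert \mathbf{w} \rVert$ between the two classes, and the kernel $K$ lets the boundary be computed without evaluating $\phi$ explicitly.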
As referred before, the study area (with 15
We conducted several tests to choose the optimal SVM kernel function and set the corresponding parameters, namely the penalty parameter (which controls how examples located on the wrong side of the decision boundary are penalized) and the sigma function parameter (which defines the radius of influence of each training sample). Within the ENVI image processing environment, the SVM classifier was tested using different kernel functions: linear, polynomial (degrees 2–6), sigmoid, and radial basis function (RBF). This was first done in an expeditious manner, by visual inspection of the match between results and reference data in the training areas, and led to the choice of an RBF function with sigma value 0.03 and penalty parameter 100. These tests were further refined, following Yang (2011), through a sensitivity analysis of the SVM parameters, using quantitative measures of the match between results and reference data in the validation areas. For RBF and degree 2 polynomial kernels, we explored sigma values between 0.01 and 0.30, fixing ENVI's penalty parameter at its default value of 100. Furthermore, for the RBF kernel we tested the SVM with penalty values in the range from 1 to 1000 and sigma values of 0.03, 0.04, and 0.2. Finally, the SVM was tested for the degree 2 polynomial kernel, with sigma values of 0.03 and 0.04 and penalty values in the range from 100 to 250. The best prediction accuracy was achieved with the RBF kernel, with sigma value 0.03 and penalty parameter 200, but the results varied very little with respect to those of our expeditious test. The classification confidence threshold was set to 90 %, i.e., objects with less than 90 % confidence in each class were set to unclassified.
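The confidence-threshold rule at the end of this paragraph can be sketched as follows. This is a minimal illustration rather than the software's actual procedure: the function name, the dictionary layout of per-class confidences, and the example values are hypothetical.

```python
def label_objects(confidences, threshold=0.90, unclassified="unclassified"):
    """Give each object its most confident class, or `unclassified`.

    `confidences` maps an object id to a dict {class name: confidence in 0..1}.
    An object whose best confidence is below `threshold` stays unclassified.
    """
    labels = {}
    for obj_id, per_class in confidences.items():
        best_class = max(per_class, key=per_class.get)
        if per_class[best_class] >= threshold:
            labels[obj_id] = best_class
        else:
            labels[obj_id] = unclassified
    return labels


# Hypothetical confidences for three segmented objects.
example = {
    1: {"landslide": 0.97, "other": 0.03},
    2: {"landslide": 0.60, "other": 0.40},
    3: {"landslide": 0.05, "other": 0.95},
}
labels = label_objects(example)
```

Here object 2 remains unclassified because neither class reaches the 90 % threshold, whereas objects 1 and 3 receive their most confident label.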
To address our multi-class classification problem (landslide source; landslide run-out; all others) we started by using two “one-against-all” approaches, in which a landslide-against-all and a source-against-all classifier were designed to derive the classes “landslide”, “landslide source”, and “all others”; subsequently, the class “landslide run-out” was obtained from spatial subtraction of classes “landslide” and “landslide source”. The feature information used by the SVM classifier was described in Sect. 3.2 and is listed in Tables 1 and 2. The features associated with landslide/non-landslide examples or source/non-source examples are plotted by the algorithm into their corresponding feature spaces for definition of the decision boundaries. Then the entire image is classified: for each one-against-all approach, and for each object on the image, the SVM maps its spectral, spatial, and textural characteristics into the feature space and, depending on which side of the decision boundary it plots, classifies it accordingly. The set of features used by the “landslide-against-all” classifier was the same as that employed by the “source-against-all” classifier. However, from qualitative analysis of the feature maps produced for each of the (over 80) spectral, textural, and geometric attributes, we can state that spectral measures were the primary discriminant for source areas, followed by geometric attributes (e.g., elongation) and texture. The DTM used was acquired before the 2010 landslide event, so it does not represent the surface changes that occurred during it. Nevertheless, it provides unique geomorphic features (such as slope, aspect, and curvature) for each segmented object, assisting the classifier decision. In our case, this information proved very useful in reducing the ambiguity presented by objects with similar spectral characteristics located in flat areas.
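The combination of the two one-against-all results by subtraction can be sketched with simple set operations on object identifiers. The function name and the toy identifiers are hypothetical, and the treatment of an object flagged as source but not as landslide (it is kept as source here) is an assumption of this sketch, not something stated in the text.

```python
def combine_one_against_all(all_ids, landslide_ids, source_ids):
    """Merge two binary classifications into three classes.

    Run-out is obtained by subtraction: objects classified as landslide
    that were not classified as source.
    """
    landslide = set(landslide_ids)
    source = set(source_ids)
    labels = {}
    for obj_id in all_ids:
        if obj_id in source:
            # Assumption: a source label wins even without a landslide label.
            labels[obj_id] = "landslide source"
        elif obj_id in landslide:
            labels[obj_id] = "landslide run-out"
        else:
            labels[obj_id] = "all others"
    return labels


# Toy example with six segmented objects.
result = combine_one_against_all(
    all_ids=range(1, 7),
    landslide_ids={1, 2, 3, 4},
    source_ids={1, 2, 5},
)
```

In this example objects 3 and 4 become run-out, because they belong to the landslide class but not to the source class.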
Accuracy measures for landslide recognition (landslide-against-all classifier) and separation of landslide source and run-out (source-against-all classifier), for different validation regions (VAL 1 and VAL 2). Two cases are distinguished, depending on whether reference data source areas are primary sources only or include secondary sources (seemingly fresh slides occurring inside the run-out area).
acc: overall accuracy (%);
For assessment of the accuracy of the match between the classified image and
the reference data, we compared the classification results in the validation
areas with the landslide inventory map (Lahousse et al., 2011; Holbling
et al., 2012) and followed standard metrics derived from the error or
confusion matrix built from the two data sets (Congalton, 1991; Foody, 2002).
In assessing the accuracy of results, we have separated the
landslide/not-landslide classification from the source/run-out/not-landslide
classification. In the latter case, we weighted equally the three possible
classification errors or, in other words, we did not consider it a lesser
mistake to misclassify as source or run-out an object belonging to a
landslide than to give that same label to an object outside the
landslide. The accuracy metrics computed were the overall accuracy (acc),
which gives the proportion of area correctly classified, expressed as
a percentage; the kappa index of agreement (
Details of the previous figure, showing the comparison between
classified landslides (yellow fill) and inventory reference data (red
contours) in the validation regions 1
Classified landslide sources (red fill) and run-out (yellow
fill) compared to inventory reference data (red contours delineate
the landslide; blue contours delineate the source area). In the
examples shown, source areas are defined either as primary sources only
(in
Following the methodology already described, landslide classification maps
were produced for the 15
Figure 5 presents, for the overall study area, the object-based image classification of landslides, using the SVM machine-learning algorithm with the RBF kernel, as described before. To illustrate the performance of the approach, the semiautomatically recognized landslides are compared, in terms of overlapping area, to the inventory reference data (yellow fill and red contours in Fig. 5, respectively). We observe a remarkably accurate semiautomated depiction of the landslide areas, both in the training and validation regions. Landslides not successfully detected are located in areas obscured by shadows, an unavoidable hindrance in this approach.
In Fig. 6 the detailed mapping of the landslide areas is presented for the validation regions 1 and 2, which did not contribute examples for learning. The figure also summarizes the accuracy metrics computed from the error matrix built for each of the validation areas, yielding good results for both of them, with commission errors below 26 % and omission errors below 24 % (see also Table 3). Note that we did not try to apply post-processing filters to exclude very small objects falsely identified as landslides, although such a procedure would likely have reduced the commission errors.
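For readers unfamiliar with error-matrix metrics, the sketch below computes overall accuracy, the kappa index, and commission and omission errors from a two-class error matrix using their standard definitions (Congalton, 1991). The function name, argument names, and the example areas are hypothetical, and the cell values may be areas or pixel counts; the sketch assumes that both classes are non-empty.

```python
def error_matrix_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 error matrix.

    tp: landslide in both map and reference; fp: landslide in map only;
    fn: landslide in reference only; tn: non-landslide in both.
    """
    total = tp + fp + fn + tn
    overall = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "commission_error": fp / (tp + fp),  # mapped landslide that is wrong
        "omission_error": fn / (tp + fn),    # reference landslide that is missed
    }


# Hypothetical areas (arbitrary units), not values from this study.
metrics = error_matrix_metrics(tp=40, fp=10, fn=10, tn=40)
```

With these made-up numbers the overall accuracy is 0.8, the chance agreement is 0.5, and the kappa index is therefore 0.6; both error rates equal 0.2.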
Validation area 2, which contains a poorly illuminated slope, displays somewhat poorer accuracy in the classification of the overall area affected by a landslide. However, this problem is overcome by the source-against-all classification results (Sect. 4.2), which performed well in landslide recognition: 61 out of 63 landslides were detected in this region, which compares with 20 out of 22 landslides in validation region 1 (see Fig. 7a and b). These values correspond to the detection of 95 % of the landslide scars in the validation areas.
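The 95 % figure follows directly from pooling the counts of the two validation regions:

```latex
\frac{61 + 20}{63 + 22} = \frac{81}{85} \approx 0.953,
\qquad
\frac{61}{63} \approx 0.968,
\qquad
\frac{20}{22} \approx 0.909 .
```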
Figure 7 displays the results of the semiautomated mapping of landslide source and landslide run-out (transport plus deposition) in the validation areas 1 and 2, using again the object-based and SVM machine-learning approaches described in Sect. 3.
In Fig. 8 a detail of the classification is shown to allow comparison with the landslide characteristics that can be visually recognized in the pan-sharpened GeoEye image.
Inspection of Figs. 7 and 8 shows a good performance of the classifier in the internal mapping of source and run-out landslide areas, particularly on the sunnier east-facing slopes. In the less illuminated areas the classifier is able to map the source areas accurately but performs poorly in mapping the landslide run-out (Fig. 7). Accuracy measures were again computed for each of the validation areas, by comparison with source and run-out areas in reference data. Two cases need to be distinguished, depending on whether reference data source areas are primary sources only (example in Fig. 7b) or include secondary sources (seemingly fresh slides occurring inside the run-out region; Figs. 7a and 8).
Detail of classification of landslide sources (red fill,
Table 3 lists the computed overall accuracies, kappa indexes, and FNRs/FPRs for both validation areas and for both definitions of source area. FNR and FPR values fall below 38 and 78 %, respectively, for all situations (the results in Fig. 7 correspond to the best cases). The overall accuracy seems hindered, on the one hand, by the difficulties of the classifier in mapping the run-outs in poorly illuminated areas and, on the other hand, by the subjectivity of reference data description as primary or secondary sources of sediment.
We present a method for semiautomated landslide recognition
and mapping of landslide source and run-out area, suitable for VHR remote
sensing images of rain-induced landslide events. The approach combines
object-based image analysis and an SVM supervised learning
algorithm, and it was tested with a GeoEye-1 multispectral image (0.5
At present, one of the main limitations of the proposed methodology is its poor performance in the automated mapping of landslide transport and deposition areas on poorly illuminated slopes, a problem that might be overcome using multiple satellite acquisition geometries. Another source of uncertainty results from the subjectivity associated with the definition of the landslide source areas, made mostly from visual inspection of post-event satellite images and orthophotos. These difficulties may have constrained the quantitative assessment of the classifier results. This task was particularly challenging in the case of debris flows, in which the high-energy transport area seemed to contain secondary sources of sediment supply (see examples in Fig. 8), with the same spectral and textural image characteristics as the primary sources. Another limitation is the subjectivity of the trial-and-error procedure used to select the segmentation parameters. Such an expert-driven approach was used to minimize over-segmentation, in order to capture the geometrical attributes of the landslides, which proved to be relevant in the classification, but we cannot exclude that automated methods for objective determination of segmentation parameters (see, e.g., Dragut et al., 2010, 2014; Martha et al., 2011; Gao et al., 2011) would yield better results. Similarly, the use of objective automated feature selection methods (e.g., Stumpf and Kerle, 2011) could further improve the effectiveness of our method or the accuracy of the results.
The method proposed here has the potential to increase promptness and cost effectiveness in the production of inventories following a landslide event, when a VHR post-event optical image and a pre-event digital elevation model are both available. It also supports an approximate spatial quantification of the amount of sediments produced and transported during a landslide event, information that can be crucial in emergency response situations and is clearly important for landslide susceptibility and hazard assessment (Guzzetti et al., 2009), contributing in particular to support post-event mitigation actions, such as sediment control measures (Lira et al., 2013).
This contribution was developed in the frame of the project AULIS (PTDC/ECM/116611/2010) funded by Fundação para a Ciência e a Tecnologia (FCT) in Portugal. Sandra Heleno acknowledges FCT for her research grant SFRH/BPD/84796/2012.
Edited by: P. Reichenbach
Reviewed by: three anonymous referees