The morphology of landslides is influenced by the slide/flow of
the material downslope. Usually, the distance of the movement of the
material is greater than the width of the displaced material (especially for
flows, but also the majority of slides); the resulting landslides have a
greater length than width. In some specific geomorphologic environments
(monoclinic regions, with cuesta landforms type) or as is the case for some types
of landslides (translational slides, bank failures, complex landslides), for
the majority of landslides, the distance of the movement of the displaced
material can be smaller than its width; thus the landslides have a smaller
length than width. When working with landslide inventories containing both
types of landslides presented above, the analysis of the length and width of
the landslides computed using usual geographic information
system techniques (like bounding boxes) can be flawed. To overcome this flaw, I present an
algorithm which uses both the geometry of the landslide polygon minimum
oriented bounding box and a digital elevation model of the landslide
topography for identifying the long vs. wide landslides. I tested the
proposed algorithm for a landslide inventory which covers 131.1 km

Landslides are a natural phenomenon which has an important impact on human society and environment (Geertsema et al., 2009; Kjekstad and Highland, 2009; Petley, 2012). The topographic imprint of these geomorphologic processes is given by the slide/flow of the material downslope, and it is highly recognizable (Pike, 1988; Berti et al., 2013; Rossi et al., 2013; Marchesini et al., 2014), especially in high-resolution digital elevation models (DEMs) (Schulz, 2004; McKean and Roering, 2004; Van den Eeckhaut et al., 2005; Glenn et al., 2006; Schulz, 2007; Kasai et al., 2009; Jaboyedoff et al., 2012; Berti at al., 2013). In the last decades, big improvements have been made in landslide delineation (Cararra and Merenda, 1976; Brunsden, 1993; Keaton and DeGraff, 1996; Ardizzone et al., 2002, 2007; Guzzetti et al., 2012) and landslide inventories' analysis (Malamud et al., 2004; Guzzetti et al., 2012; Van den Eeckhaut et al., 2012). A landslide inventory is usually a collection of polygon shapes, which represent the boundary of a single event or a complex landslide, generated by multiple events (Van Westen, 1993; Guzzetti et al., 2012). Landslides are delineated manually, nowadays digitally in a GIS environment (Carrara et al., 1991; Van Westen, 1993; Van den Eeckhaut and Hervás, 2012), but in several environments (forested areas with frequent debris or earth flows, triggered by earthquakes or typhoons) where certain geospatial data are available (high-resolution imagery, light detector and ranging; lidar DEM), semi-automatic procedures were used (Petley et al., 2002; Barlow et al., 2003, 2006; Nichol and Wong, 2005; Martin and Franklin, 2005; Borghuis et al., 2007; Van den Eeckhaut et al., 2009, 2011; Martha et al., 2010, 2012; Mondini et al., 2011; Lyons et al., 2014). Since these automatic methods can delineate a large number of landslides, an automatic method for length and width estimation can be helpful for analyzing the resulted inventories.

The sliding/flowing of the material downslope produces a scarp and a displaced mass, which are usually longer than they are wide (Fig. 1d). While generalized for flows (Fig. 1d) (Cruden and Varnes, 1996), this mechanism is also quite frequent for rotational (Fig. 1a) and translational slides, falls, topples and spreads (Cruden and Varnes, 1996). I will call this type of landslide “long” landslides. Long landslides appear either as single events or as complex landslides (Figs. 1a, d, 2).

While not characteristic of flows, falls and rotational slides, cases when the length of the displacement does not exceed the width of the displaced mass are relatively common for translational slides and spreads (Fig. 1c, e). This type of landslide appears in almost every landslide inventory, in different proportions (such cases are visible in landslide inventory snapshots from Dewitte and Demoulin, 2005; Schulz, 2007; Galli et al., 2008; Baldo et al., 2009; Hattanji et al., 2009; Guzzetti et al., 2012; Van den Eeckhaut et al., 2012; Mondini et al., 2013). I will call this type of landslide “wide” landslides (Fig. 1b, c, e).

Schematic drawings of long and wide cases of landslides:

Geomorphologic cases of long (L) and wide (W) types of landslides from the study area. The top panel is a 3-D view Google Earth image; the bottom panel is a 3-D view of the lidar DEM shading. Both views show the landslide polygons overlaid in red.

Wide landslides most commonly appear along gully and river banks (Fig. 1b, e), but also on hillslopes (Fig. 1c), especially in regions with monoclinic structure along the scarps of cuesta landforms. Such a situation is found in the Moldavian Plateau (Niculiţǎ, 2015; Mǎrgǎrint and Niculiţǎ, 2016), where the monoclinic structure favors cuesta escarpments, which extend in width for long distances without being fragmented (Figs. 2, 3).

In both scenarios of landslide delineation (manual or automatic), the landslide length and width can be computed automatically in a GIS environment, using the dimensions of the minimum bounding box of the polygons to represent the landslides (Taylor and Malamud, 2012). While for long landslides, the length of the bounding box is a reasonable approximation of the length of the landslide, for wide landslides, the length is the width of the bounding box.

To overcome the flaw mentioned above, which can bias the analysis of landslide length and width (Taylor and Malamud, 2012; Taylor et al., 2015; Niculiţǎ, 2015; Mǎrgǎrint and Niculiţǎ, 2016), an automatic algorithm for the estimation of landslide flow direction is proposed. The algorithm uses the processing of the oriented bounding box, and the elevation and slope length computed from the DEM, for deriving the length along the landslide slide/flow direction and the associated width. The method is validated in a landslide inventory in which the length and width characteristics were assessed during landslide mapping.

For testing the proposed method, I have used a landslide inventory (Figs. 2,
3), created and described by Niculiţǎ and Mǎrgǎrint (2015), for
a small area (131.1 km

The proposed method was implemented using R software application (R Core Team, 2015) and several user-contributed packages (Fig. 4). For more details, the script code is available (please see Sect. 6).

Oriented bounding boxes are in fact minimum-area enclosing rectangles for convex polygons (Fig. 5). Besides the rectangular minimum bounding box (whose sides are parallel to the coordinate axes), the oriented minimum bounding box can be defined as the bounding box of which sides are along the orientation of the polygon (Freeman and Shapira, 1975). The rotating calipers algorithm (Toussaint, 1983) is used to create these bounding boxes in the majority of GIS implementations. Its implementation in the R shotGroups package (Wollschläger, 2016) was used in the present approach, through the getMinBBox function. This function uses the coordinates of the polygon vertices, stored in an object of the sp package, called SpatialPolygonsDataFrame (Bivand et al., 2013), which is created from a shapefile containing the landslide polygons and a unique ordered sequence of auto-increment IDs, by import with the rgdal package (Fig. 4). Further, sp package functions are used to manipulate the bounding boxes.

The study area and the landslide inventory overlaid on shaded lidar DEM. The upper-left panel is an inset showing the position of the study area in Romania; below the inset, a set of 3-D views of the study area landforms from north and south is displayed. The red boxes from the map on the right indicate the insets from Figs. 6 and 7.

Workflow of the proposed algorithm implemented as an R script.

The getMinBBox function creates the

The coordinates of the corner points were used to compute the midpoints of
the sides (MP1 to MP4), using the Midpoint Formula, in such a way that MP1
and MP3 are on the long sides, and MP2 and MP4 are on the short sides of the
minimum oriented bounding box (Fig. 5 left). Using these four midpoints, the
algorithm defines the length of the minimum oriented bounding box (line
connecting MP2 and MP4) and the width of the minimum oriented bounding box
(line connecting MP1 and MP3) (Fig. 5). Using the elevations of the four
corners (MP1 to MP4), taken from the DEM, the square root of the elevation
difference between MP1 to MP3 and MP2 to Mp4 raised to the power of 2 (for
obtaining only positive values) is computed. These values are used to
classify the landslides as long or wide according to the following rules.

The landslides which have a larger elevation difference along the length of the minimum oriented bounding box (line connecting MP2 and MP4) than along the width (line connecting MP1 and MP3) are classified as long.

The landslides which have a larger elevation difference along the width of the minimum oriented bounding box (line connecting MP1 and MP3) than along the length (line connecting MP2 and MP4) are classified as wide.

Geometry of the minimum oriented bounding box. The left panel shows the geometry of the oriented bounding box; the center panel shows the geometry of a long landslide bounding box; the right panel shows the geometry of a wide landslide box.

For these cases, a further step of processing is required. This step involves the cropping of the DEM for every landslide polygon, the hydrological preprocessing to enforce drainage flow routing and the computation of the slope length (Fig. 7). The slope length estimation requires the use of the RSAGA (Brenning and Bangs, 2016) R package, which is linked to SAGA GIS (Conrad et al., 2015). The DEM is preprocessed with the functions Sink Drainage Route Detection and Sink Removal from the Terrain Analysis – Preprocessing module library. The hydrological enforcement is done using the Deepen Drainage Routes method, and the flow path length was computed using the function Flow Path Length from the Terrain Analysis – Hydrology module library. Both flow-routing methods implemented in this tool, Deterministic 8 (O'Callaghan and Mark, 1984) and multiple flow direction (Freeman, 1991; Quinn et al., 1991), are computed (Fig. 6). The maximum slope length for every landslide polygon is computed using the function Grid Statistics for Polygons from Shapes – Grid Tools tool library.

Using the following rules, only the long landslides class is reclassified.

The landslides which have a length greater than the slope length are reclassified as being wide.

The landslides which have a length smaller than the slope length remain classified as being long.

Three-dimensional views of different types of long vs. wide landslide
classifications:

Three-dimensional view of a long landslide (left) and the flow lengths for it (right) (MFD is the flow length computed with the Multiple Flow Direction algorithm, and D8 is the flow length computed with the Deterministic 8 algorithm).

Plots of the difference between the real length/width,
length/width of the oriented bounding box and length/width estimated with
the proposed algorithm:

The confusion matrix for the classification performed in the two steps of the algorithm and the associated statistics.

The results of the proposed algorithm need to be validated against the reality identified in the landslide inventory, to prove its efficiency. The long and wide type was assessed during landslide delineation. Beside field validation, the use of 3-D views of lidar DEM shading and the oriented bounding box (similar to the 3-D views from Figs. 6 and 7) allowed correct identification of the direction of slide/flow and the comparison of length and width. Since the long vs. wide type was correctly assessed, I call the length and width derived from the oriented bounding box along the slide/flow direction the correct length and width (Figs. 5, 8). The width and length computed using the oriented bounding box without taking into consideration the slide/flow direction are called bounding box length and width (Figs. 5, 8). These two variables will be biased if the wide landslide type appears in a landslide inventory. The length and width obtained using the proposed algorithms are called estimated length and width because the algorithm fails to classify certain types of wide landslides (false positive – type I error) and certain types of long landslides (false negatives – type II error) correctly (Table 1). I argue that although some landslides are misclassified, from a pure morphometric point of view, the impact is minimized by the algorithm. This can be seen in Fig. 8. All four plots from this figure show that estimated length and width are closer to the correct length and width than the bounding box length and width. These plots show the scatter (Fig. 8a, b) of the difference between the correct landslide dimensions and estimated/bounding box dimensions and the frequency (Fig. 8c, d) of landslide dimensions. From the scatter plots it appears that bounding box dimensions are bigger than the correct dimensions since length is switched with width, the biggest impact being registered for elongated/elliptic landslides (flows). In the case of round landslides (rotational and translational slides), switching the length with width has a smaller impact. The histograms show the impact of changing the length with the width better and that the results of the proposed algorithm resemble the shape of the real dimensions' distributions with fidelity. The use of the dimensions of the bounding box instead has a huge impact not essentially on the shape, but on the magnitude of the distribution (since the wide type of landslides is dominant at 61 %; see Table 1).

While the interpretation of the differences of the dimensions is instructive and demonstrates the quality of the proposed algorithm in producing results which can be used in geomorphometric analyses, the assessment of the accuracy to predict the correct class is carried out using the confusion matrix (Stehman, 1997; Powers, 2011) and the Receiver Operating Characteristic curve (ROC) graphs (Fawcett, 2006; Powers, 2011). The confusion matrix is presented in Table 1, and its graphic synthesis is presented in the folded plots from Fig. 9.

In the first step of the proposed method, out of 1327 landslides (of which 518 – 39 % – were long and 809 – 61 % – were wide), 642 landslides (48.4 %) are classified as long, of which 478 (36 %) are indeed long (true positives – TP). The other 40 (3 %) long landslides are misclassified as wide (false negative – FN – type II error). Although wide, 164 (12.4 %) landslides are misclassified as long (false positives – FP – type I error). The other 645 (48.6 %) wide landslides are correctly classified (true negative – TN).

The wide landslides wrongly classified as long in the first step (FP – type I error) are mainly developed along the banks of incised rivers and gullies, landslide scarps, landslide toes and cuesta scarps, in situations where there is a greater difference in elevation between the flanks than between the scarp and the toe (Fig. 5c). The long landslides misclassified as wide (FN – type II error) are almost round, rhombic, hexagonal or elongated landslides, which are diagonal to the minimum enclosed bounding box and the downslope direction of the hillslope, which in general is gently sloped (Fig. 5a). This diagonal position generates a difference of elevation along the width, which is greater than the difference along the length.

Overall, the first step has an accuracy of 0.85 (Table 1). The ROC curve is
used to visually assess the performance of a classification (Fawcett, 2006).
The curve plots the true positive rate (benefits, sensitivity) vs. the
false positive rates (costs, specificity) of the classification (Fawcett,
2006; Powers, 2011). The diagonal line is equivalent with randomly guessing
the outcome of the classification (Fawcett, 2006; Powers, 2011). In this
first step of the classification, the TP and FP rates are higher, showing
that the long type is better classified. Overall, the area under the ROC
curve (AUC; see Fig. 10) shows that 84.3 % landslides are correctly
classified. The 95 % confidence interval was computed using 2000
stratified bootstrap replicates in the pROC sp package (Robin et al., 2011).
This interval is in the proximity of the ROC curve, the range of the AUC
being

In the second step, the best performance was acquired by the multiple flow algorithm for computing slope length. This algorithm computes flow distance which is more similar to hillslope length because the flow is modeled as a diffusion process, while the Deterministic 8 algorithm creates a pattern of interrupted flows, although the DEM was preprocessed hydrologically (Figs. 6, 7). Using D8, in the second step, 1309 landslides are classified as wide and only 18 as long. The multiple flow direction algorithm classifies 837 landslides as wide and 478 as long.

Four folded plots of the confusion matrices for the two steps of the algorithm (TP denotes true positive, TN true negative, FP false positive and FN false negative).

In the second step of the proposed method, 685 landslides (51.6 %) are classified as long, of which 425 (32 %) are indeed long (true positives - TP). The other 93 (7 %) long landslides are misclassified as wide (false negative – FN – type II error). Although wide, 65 (4.9 %) landslides are misclassified as long (false positives – FP – type I error). The other 744 (56.1 %) wide landslides are correctly classified (true negative – TN).

The majority of the wide landslides classified as long in the second step (FP – type I error) are almost round, rhombic or hexagonal, and diagonal to the downslope direction (Fig. 6), and only a few are developed along the gully banks or scarps.

The long landslides classified as wide (FN – type II error) are related to areas where the slope length computed by the flow algorithm fails to model the length of the hillslope, and in step two, their length is greater than the slope length (Fig. 6). These landslides have gullies or mounds, which interrupt the diffusion of the flow, and the flow algorithms do not route the flow from the crown to the toe of the landslide (Fig. 7).

Overall, the second step has an accuracy of 0.88 (Table 1). All the
statistics associated with the confusion matrix (Table 1) show that in the
second step, the classification performs slightly better. In this second
step, the ROC curve shows that the TP rate is lower, and the FP rate is higher,
meaning that the wide type is better classified. Overall, 87.8 %
landslides are correctly classified (AUC value – Fig. 10). The confidence
interval is lower,

Although the proposed algorithm performs well, any inconsistency between the shape of the earth portrayed by the DEM and the delineated landslide morphology will give misclassification cases. This scenario is to be expected from inventories where the landslides are delineated from high-resolution aerial/satellite imagery, but the DEM does not have the same resolution, a frequent scenario in the case of automatically delineated landslides. Normally if all the data have the same resolution, the proposed algorithm can provide good results. We also tested the SRTM 1 elevation data with the proposed method for the landslide inventory, and the AUC values decreased to 86.6 %.

ROC curves for the two steps of the algorithm, their 95 % confidence interval and the AUC values.

The quality of the landslide inventory is also of great importance since any
geomorphologic topological inconsistency can introduce wrong identification
cases such as the following:

amalgamation of landslides by crossing channels;

shifting of landslides and touching channels;

landslides spread across several hillslopes.

Inconsistencies of the type described above are to be expected in a substantial proportion for landslide inventories based on topographic maps and metric aerial imagery/DEM, their proportion decreasing in landslide inventories based on centimeter resolution aerial imagery/DEM. The proposed methodology could also be used for checking landslide inventories for geomorphological topological inconsistencies.

The main challenges of the method are as follows:

the inability to separate wide from long landslides based on the flow (mainly related to the false positive cases, long landslides classified as wide);

the failure to separate long landslides from wide landslides because of their diagonal position (mainly related to the false negative cases, wide landslides classified as long).

Possibilities to resolve the failure to distinguish between long and wide landslides because of their diagonal position are related to the computation of the bounding box. Using a 3-D approach in computing the bounding box, from a 3-D vector of the landslide boundary (vertex of the polygon with elevation altitude from the DEM), could probably resolve this, but research is needed. For resolving the flow length estimation, responsible for the type II error, better methods for slope length estimation are needed.

The proposed method can discern between long and wide landslides in 87.8 % of cases. The misclassified cases have a specific geometry and geomorphologic topology, and the algorithm fails to identify them in the right class. The majority of these landslides are round, rhombic or hexagonal, and oriented diagonally to the downslope direction of the hillslope. The slide/flow direction of these landslides is diagonal to the bounding box. Although the algorithms fail to recognize them as long or wide, the errors are minimal when their lengths and widths are used because the difference between them is not big. Of course, if the direction of flow is needed (as aspect value), the algorithm fails to identify the right direction. The minority of misclassified landslides is represented by the elongated landslides, which are either long (similar to flows) or wide (developed along gully banks and landslide scarps). For these landslides, the proposed method fails because the slope length estimation does not exceed the oriented bounding box length, a situation which requires further investigations of the possibilities of using other methods to estimate slope length. Overall, the presented approach managed to identify the majority of the long vs. wide landslides from a landslide inventory and represents a starting point in deriving algorithms for automatic geomorphometric analysis of landslides.

The R stat code of the script which implements the algorithm is available in a Zenodo
repository (

Data associated with the present work consist of a DEM and a landslide
inventory. The landslide inventory is available in a Zenodo
repository (

The authors declare that they have no conflict of interest.

The author gratefully acknowledges partial support from the European Social Fund in Romania, under the responsibility of the Managing Authority for the Sectoral Operational Program for Human Resources Development 2007–2013 (grant POSDRU/159/1.5/S/133391). I am grateful to Prut-Bârlad Water Administration who provided me with the lidar data. I have used the computational facilities given by the infrastructure provided through the POSCCE-O 2.2.1, SMIS-CSNR 13984-901, no. 257/28.09.2010 project, CERNESIM (L4). I thank the three anonymous reviewers for their valuable and insightful comments and suggestions, which greatly improved the article. Edited by: H. Mitasova Reviewed by: three anonymous referees