Landslide hazard models aim at mitigating landslide impact by providing probabilistic forecasting, and the accuracy of these models hinges on landslide databases for model training and testing. Landslide databases at times lack information on the underlying triggering mechanism, making these inventories almost unusable in hazard models. We developed a unique Python-based library, Landsifier, that contains three different machine-learning frameworks for assessing the likely triggering mechanisms of individual landslides or entire inventories based on landslide geometry. Two of these methods only use the 2D landslide planforms, and the third utilizes the 3D shape of landslides relying on an underlying digital elevation model (DEM). The base method extracts geometric properties of landslide polygons as a feature space for the shallow learner – random forest (RF). An alternative method relies on landslide planform images as an input for the deep learning algorithm – convolutional neural network (CNN). The last framework extracts topological properties of 3D landslides through topological data analysis (TDA) and then feeds these properties as a feature space to the random forest classifier. We tested all three interchangeable methods on several inventories with known triggers spread over the Japanese archipelago. To demonstrate the effectiveness of the developed methods, we used two testing configurations. The first configuration merges all the available data for the

Landslides are gravitational movements of rock and debris that pose a severe threat to the human environment

Landslide planforms are used to estimate the mobilized landslide volume, for example, estimating the potential sediment budget of a large-landslide-triggering event

Landslides with the same trigger morphologically cluster, for example, covering narrowly the available statistical variability of hillslope angles in a study region

Our preliminary model

This study also introduces a new Python library, Landsifier, that classifies the trigger of landslides, individually or as a whole, in an inventory where the landslide source mechanism is undocumented. Landsifier is the first-ever library built for estimating likely triggers of mapped landslides; the methods used in this library to find landslides' triggers are new. Two of these methods are introduced in this paper for the first time, while the third was published in our preliminary work

The seven landslide inventories used in this work are spread over Japan, and their geographical locations are shown on the country's map at the center of the figure. Panels

In this work, we used seven landslide inventories spread over the Japanese archipelago (Fig.

The Geospatial Information Authority of Japan (GSI) is the source of the Hokkaido Eastern Iburi earthquake (September 2018), Fukuoka rainfall (July 2017), and Saka rainfall (July 2018) inventories. The source of the other two coseismic inventories – Iwata and Niigata – is the global repository created by

The TDA-based method uses elevation data to obtain the 3D shapes of landslides from their 2D planforms. We use the

Sample landslide planforms from all six known triggered inventories.

In our preliminary study

In the first method, we used the geometric properties of 2D landslide polygons for the classification. We explored several geometric properties of landslide polygons (e.g., Fig.
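As a rough illustration of what such a feature space could look like, the sketch below computes a few simple geometric descriptors of a polygon using only the Python standard library. The particular descriptors (area, perimeter, area-to-perimeter ratio, and the ratio of polygon area to convex-hull area), the function names, and the toy coordinates are assumptions chosen for illustration; they are not necessarily the features or the API used in Landsifier.

```python
import math


def polygon_area(coords):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(coords):
    """Sum of edge lengths, closing the ring back to the first vertex."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def geometric_features(coords):
    """Illustrative (assumed) feature vector for one landslide planform."""
    area = polygon_area(coords)
    perimeter = polygon_perimeter(coords)
    hull_area = polygon_area(convex_hull(coords))
    return {
        "area": area,
        "perimeter": perimeter,
        "area_perimeter_ratio": area / perimeter,
        "convexity": area / hull_area,  # 1.0 for a convex planform
    }


# Hypothetical L-shaped planform (coordinates in metres).
planform = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (1.0, 1.0), (1.0, 3.0), (0.0, 3.0)]
features = geometric_features(planform)
```

A dictionary of this kind, computed for every polygon in an inventory, could then be stacked into a table and passed to a classifier.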

In

Sample 3D landslides from six known triggered inventories.

In the second method, we used the 3D shapes of landslides by incorporating the elevation data of the landslide regions. We extracted geometrical and topological properties of a landslide's 3D shape using topological data analysis (TDA) and then used these properties as a feature space for the machine-learning algorithm – random forest (described in Sect. S1). The topological properties of the landslide's 3D shape extracted using the DEM provide additional insights into the landslide triggers, which might further improve the accuracy of the landslide trigger classification. We converted the 2D landslide polygons to 3D landslide polygons using interpolation of 30 m elevation data (DEM) around the bounding box of landslides. We took only the elevation data within the landslide polygons to preserve the geometric shape of the landslides (Fig.
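The conversion from a 2D planform to a 3D point cloud can be sketched as follows. This is a minimal illustration under stated assumptions, not the Landsifier implementation: the DEM is assumed to be a nested list on a regular grid with its origin at (0, 0) and rows increasing with y, the interpolation is assumed to be bilinear, and all function names, the sampling step, and the toy DEM are hypothetical.

```python
def point_in_polygon(x, y, coords):
    """Ray-casting test: True if (x, y) lies inside the simple polygon."""
    inside = False
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bilinear(dem, cell, x, y):
    """Bilinearly interpolate a gridded DEM (rows = y, columns = x) at (x, y)."""
    col, row = x / cell, y / cell
    c0 = min(int(col), len(dem[0]) - 2)
    r0 = min(int(row), len(dem) - 2)
    tx, ty = col - c0, row - r0
    near = dem[r0][c0] * (1 - tx) + dem[r0][c0 + 1] * tx
    far = dem[r0 + 1][c0] * (1 - tx) + dem[r0 + 1][c0 + 1] * tx
    return near * (1 - ty) + far * ty


def polygon_to_3d(coords, dem, cell, step):
    """Sample the bounding box on a regular grid and keep only interior points."""
    xs = [p[0] for p in coords]
    ys = [p[1] for p in coords]
    points = []
    y = min(ys) + step / 2
    while y < max(ys):
        x = min(xs) + step / 2
        while x < max(xs):
            if point_in_polygon(x, y, coords):
                points.append((x, y, bilinear(dem, cell, x, y)))
            x += step
        y += step
    return points


# Toy 30 m DEM: a planar slope z = 100 + 0.5 x + 0.2 y sampled on 4 x 4 nodes.
cell = 30.0
dem = [[100 + 0.5 * (c * cell) + 0.2 * (r * cell) for c in range(4)] for r in range(4)]
triangle = [(10.0, 10.0), (70.0, 10.0), (10.0, 70.0)]  # hypothetical planform
cloud = polygon_to_3d(triangle, dem, cell, step=10.0)
```

Discarding the samples outside the polygon is what keeps the planform outline visible in the resulting point cloud.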

Topological data analysis (TDA) provides a gamut of metrics to quantify the multidimensional shape of data by applying techniques of algebraic topology

An example of using persistent homology: the data points are sampled from a noisy circle.

Generally, in TDA, one constructs a simplicial complex by the Vietoris–Rips complex method, where one chooses a parameter

Homology measures particular structures present in the data, providing valuable information about the geometrical and topological properties of the data. For example, zero-dimensional homology captures connected components or clusters, one-dimensional homology measures loops, and two-dimensional homology measures voids
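For the zero-dimensional case, the idea can be sketched in a few lines of standard-library Python. In a Vietoris–Rips filtration every point starts as its own connected component, and a component dies at the scale where an edge first merges it into another one, so the death values can be found by processing edges in order of length with a union-find structure. The function name, the toy point set, and the convention of reporting the scale as the full pairwise distance are assumptions made for illustration; higher-dimensional homology (loops and voids) needs a dedicated TDA package and is not covered here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Zero-dimensional persistence pairs (birth, death) of a Vietoris-Rips filtration."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j       # two components merge ...
            pairs.append((0.0, length))   # ... so one of them dies at this scale
    pairs.append((0.0, math.inf))         # the last component never dies
    return pairs


# Two well-separated toy clusters.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
pairs = h0_persistence(points)
deaths = sorted(death for _, death in pairs)
```

In this toy example, three components die at a short scale while one survives much longer, which is how the two clusters show up in the persistence pairs.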

The above-mentioned topological features can be explained using two objects – one the set of the

Sample input images for the image-based classification.

Betti- and persistence-landscape-curve-based features are calculated from the
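To make the notion of a Betti curve concrete, a minimal sketch follows, assuming a persistence diagram is already available as a list of (birth, death) pairs: the curve counts how many topological features are alive at each scale. The function name, the half-open interval convention, and the toy diagram are illustrative assumptions rather than the library's implementation.

```python
def betti_curve(pairs, scales):
    """Number of persistence intervals [birth, death) that contain each scale."""
    return [sum(1 for birth, death in pairs if birth <= s < death) for s in scales]


# Hypothetical persistence diagram and evaluation scales.
diagram = [(0.0, 0.4), (0.0, 1.5), (0.2, 0.9), (0.5, 2.0)]
scales = [0.0, 0.25, 0.5, 1.0, 1.75]
curve = betti_curve(diagram, scales)
```

Evaluating the curve on a fixed set of scales gives a fixed-length vector, which is convenient as classifier input.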

The heat-kernel-based feature is calculated using the

In the third method, we used landslide planform images as input to convolutional neural networks (CNN) for the classification. We converted landslide polygons into binary images in a way that preserves the relative shape and structure of the polygons (Fig.
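One way to rasterize a polygon into a binary image without distorting its shape is sketched below. It is an illustration under stated assumptions: the image covers a fixed square window centred on the polygon's bounding box, a pixel is set to 1 when its centre lies inside the polygon, and the window size, image resolution, and function names are hypothetical rather than taken from Landsifier.

```python
def point_in_polygon(x, y, coords):
    """Ray-casting test: True if (x, y) lies inside the simple polygon."""
    inside = False
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_binary_image(coords, extent, size):
    """Rasterize a polygon into a size x size grid covering extent x extent metres.

    The window is centred on the polygon's bounding box and uses the same
    pixel size in x and y, so the planform is neither stretched nor rescaled.
    """
    xs = [p[0] for p in coords]
    ys = [p[1] for p in coords]
    centre_x = (min(xs) + max(xs)) / 2
    centre_y = (min(ys) + max(ys)) / 2
    pixel = extent / size
    x_left = centre_x - extent / 2
    y_top = centre_y + extent / 2

    image = []
    for row in range(size):  # row 0 is the top of the image
        y = y_top - (row + 0.5) * pixel
        image.append([
            1 if point_in_polygon(x_left + (col + 0.5) * pixel, y, coords) else 0
            for col in range(size)
        ])
    return image


# Hypothetical 40 m x 20 m rectangular planform in an 80 m window at 10 m pixels.
rectangle = [(0.0, 0.0), (40.0, 0.0), (40.0, 20.0), (0.0, 20.0)]
image = polygon_to_binary_image(rectangle, extent=80.0, size=8)
```

With a fixed window, polygons larger than the window would be clipped, so such cases would have to be filtered out or handled separately.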

Convolutional neural networks (CNNs) are a class of artificial neural networks that are effective for various applications, such as image classification and object detection

The figure shows the convolutional neural network (CNN) architecture used in the image-based method. The input of CNN is a binary-scale landslide image, and the output of CNN is the probability of a landslide image belonging to an earthquake- or rainfall-induced class.

Convolutional layers are the fundamental component of CNNs; they use kernels (matrices of learnable parameters) to perform convolution operations on the input. The output of a convolution operation is called a feature map, which learns the feature representation of the input data
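The convolution operation itself can be illustrated with a short pure-Python sketch. It implements the sliding-window product that deep-learning frameworks call convolution (strictly, a cross-correlation) without padding or stride. The kernel values here are fixed by hand for illustration, whereas in a CNN they are learned during training; the function name and the toy image are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Toy binary image with a vertical boundary between background (0) and shape (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a left-to-right 0 -> 1 transition.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
```

The feature map is non-zero only where the window straddles the boundary, which is the sense in which a kernel picks out a local pattern in the input.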

Activation functions in CNNs capture the non-linear relationship between the input data and their output class. We used rectified linear unit (ReLU) activation for the hidden layer neuron activation functions as past studies have shown that ReLU activation improves classification results and learning speed

Fully connected (FC) layers act as the classification stage of a CNN and come after the convolutional layers. These layers are fully connected, which means each neuron in one FC layer is connected to every neuron in the next FC layer
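A forward pass through two fully connected layers, with ReLU on the hidden layer and a softmax on the two-class output, can be sketched as follows. The layer sizes, weights, and input values are hypothetical and chosen only to make the arithmetic easy to follow; the use of a softmax output is likewise an assumption, and none of this is the trained network described in this paper.

```python
import math


def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(values):
    """Convert raw scores to probabilities that sum to one."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened feature vector coming out of the convolutional stage.
features = [0.5, -1.0, 2.0]

# Hidden FC layer (3 inputs -> 2 neurons) followed by ReLU.
hidden = relu(dense(features, [[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]], [0.0, 0.1]))

# Output FC layer (2 neurons -> 2 classes) followed by softmax.
probabilities = softmax(dense(hidden, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]))
```

The two output values can be read as the probabilities of the two trigger classes, with the larger one giving the predicted class.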

We used two different testing configurations to evaluate the efficacy of our methods. Finding the triggers of individual landslides irrespective of their inventories is the first testing configuration. Here, we combined all the known trigger landslides from all six known triggered inventories and then split the combined landslide data into various training and testing sets following the

Combining all the landslide inventories with known triggers leads to 26 501 samples (

For the second split configuration, we trained the random forest classifier on five inventories and tested it on the sixth inventory. For earthquake-triggered inventories the method achieved a classification accuracy of
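The second split configuration amounts to a leave-one-inventory-out scheme, which can be sketched as below. The helper name is hypothetical, and the sketch only enumerates which inventories go into training and testing; fitting the classifier on each split is left out.

```python
def leave_one_inventory_out(inventories):
    """Yield (training_names, test_name) pairs, holding out each inventory once."""
    names = list(inventories)
    for held_out in names:
        training = [name for name in names if name != held_out]
        yield training, held_out


# The six inventories with known triggers named in this paper.
inventories = ["Hokkaido", "Iwata", "Niigata", "Kumamoto", "Fukuoka", "Saka"]
splits = list(leave_one_inventory_out(inventories))
```

Because the held-out inventory comes from a different event than the training data, this configuration tests how well a model transfers to an unseen region rather than how well it interpolates within known ones.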

In the first test and training set split configuration, as in Sect.

In the second split configuration, this method achieves an accuracy above 90 % for the Iwata, Niigata, Kumamoto, and Saka inventories. For the Hokkaido and Fukuoka regions, the method achieves a classification accuracy above 80 % (see Fig.

The topological-feature-based method (second method) accuracies for all the six known triggered inventories. The model is trained on five inventories in each case and tested on the sixth inventory. The

As explained above in Sect.

For the second split configuration, the method achieved an accuracy above 80 % for the Saka region (

One of the main aims of this paper is to introduce Landsifier, a Python library we built to provide the landslide research community with a user-friendly computational package to implement the methods described above. At the moment, we have made the code available on the corresponding author's GitHub:

The geometric properties of landslides can provide information about their trigger

The landslide data quality depends on the data-acquiring technique; e.g., landslide data obtained using aerial or satellite images are of much higher quality than the data acquired via field campaigns. Geologists collect landslide data during field campaigns, and, naturally, such inventories tend to under-represent the smaller landslides and to cover mainly the larger landslides

Landslides are 3D shapes; thus, using 3D shapes of landslides instead of 2D could provide additional information related to the landslide morphology. Consequently, a 3D-landslide-shape-based method might improve classification accuracy, especially in regions without proper training and testing data of similar quality. We use TDA, a method rooted in algebraic topology, to compute topological features of a landslide's 3D shape to classify landslide triggers. The TDA-based method extracts topological information along with geometric information of the landslide shape, whereas the geometric-feature-based method and likely the image-based method use only geometric information of the landslide shape for landslide classification. We expect the TDA-based method to provide the best landslide trigger classification results. In Table

The table shows landslide classification results using the three methods. The model is trained on each possible combination of five inventories and tested on the sixth inventory.

We applied each method to classify landslide triggers in the Kumamoto unspecified inventory, which has an undocumented trigger, to demonstrate the real-world application of the Landsifier library. Out of 612 landslides in the inventory, the geometric-feature-based method and the topological-feature-based method classified 604 and 612 landslides as earthquake-triggered, respectively. In comparison, the image-based method uses 164 landslides after removing landslides having width and length greater than 180 m (see Sect.

Considering the above discussions, in future work, we plan to explore further the sensitivity of our trigger classification methods to spatial autocorrelations. We will also examine the influences of landslide size distributions on each method. Specifically, we plan to classify the trigger of large landslides (area

The landslide-triggering mechanism is crucial information for developing landslide hazard models; e.g., a landslide hazard model for extreme rainfall incidents requires landslide inventories related to rainfall events only. However, modern automated landslide mappers for continuous monitoring and historical landslide inventories rarely report the landslide-triggering mechanism. Missing triggers in the landslide inventories decrease their efficacy for landslide hazard models. In this work, we developed a Python library, Landsifier, containing three methods for landslide trigger classification by exploiting landslide planforms and 3D shapes. To develop the first two of these methods, we combined geometric and topological features with machine learning, and in the third method, we used deep learning. The latter two methods are new; i.e., we are reporting them here for the first time.

We use seven landslide inventories spread over the Japanese archipelago. Six of these seven inventories have known triggers, while the seventh inventory has a missing trigger. We applied each method to all possible sets of five training inventories and one testing inventory using six known triggered inventories. Moreover, we took different training and testing sets of landslides by mixing all known triggered landslide inventories following the

The Python-based Landsifier library provides the landslide research community with a user-friendly computational package to implement the methods described above. Two of the three methods included in the library are new and introduced here for the first time, while the third method is published in our previous work. To the best of our knowledge, Landsifier is the first Python tool developed for landslide trigger classification, and no such tool exists in other programming languages. We anticipate that the landslide research community will find the Landsifier library helpful in finding the trigger mechanism of inventories or individual landslides. The presented methods and the library could be deployed in any region of the world with adequate training data from areas with similar climatic and tectonic features. The Landsifier library also contains useful functions such as finding geometric properties of landslide polygons, downloading DEMs corresponding to an inventory region, and converting landslide polygons to landslide 3D shapes; these elements could be useful for the landslide research community.

Furthermore, the methods in the Landsifier library are easy to use as they require only shapefiles of landslide polygons as input. Landsifier is modular software; we hope the landslide community will further improve the offered tool and expand the available functions for new applications such as classifying landslide types, assessing landslide-prone regions, and other possible uses listed in the discussion section. At the moment, we have made the code available on the corresponding author's GitHub:

The source code is, and future updates will be, available in the Zenodo repository (

The landslide inventories used in this paper are publicly available from the Geospatial Information Authority (GSI) and the National Research Institute for Earth Science and Disaster Resilience (NIED). The 30 m SRTM DEM data used is also publicly available from NASA and downloadable via

The supplement related to this article is available online at:

All authors contributed to the writing and reviewing of the manuscript. KR developed the code. NM and UO interpreted the results and supervised the work.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the special issue “Advances in machine learning for natural hazards risk assessment”. It is not associated with a conference.

This project is supported by the Co-PREPARE project (no. 57553291) and the German Academic Exchange Service (DAAD). Kamal Rana acknowledges support from the Rochester Institute of Technology's (RIT) Steven M. Wear Endowed Graduate Fellowship, and Nishant Malik acknowledges support through RIT's FEAD grant. Ugur Ozturk acknowledges funding from the research focus point Earth and Environmental Systems of the University of Potsdam.

This research has been supported by the Deutscher Akademischer Austauschdienst (Co-PREPARE project, grant no. 57553291), the Rochester Institute of Technology (Steven M. Wear Endowed Graduate Fellowship), and the Rochester Institute of Technology (RIT's FEAD grant).

This paper was edited by Vitor Silva and reviewed by Luigi Lombardo and one anonymous referee.