13 Apr 2023
Status: this preprint is currently under review for the journal NHESS.

Non-landslide sampling and ensemble learning techniques to improve landslide susceptibility mapping

Chao Zhou, Yue Wang, Ying Cao, Ramesh P. Singh, Bayes Ahmed, Mahdi Motagh, Yang Wang, and Ling Chen

Abstract. In recent years, catastrophic landslides have occurred across the globe, causing loss of life and damaging infrastructure, everyday life, and livelihoods. Landslide susceptibility maps (LSM) are essential for minimizing the impact of landslides and for issuing early warnings. To improve the accuracy of LSM, this study addresses two problems, the random selection of non-landslide samples and the low accuracy of individual machine learning (ML) classifiers, by constraining non-landslide sampling and coupling ensemble learning with ML. The Zigui-Badong section of the Three Gorges Reservoir area (TGRA), China, was used as a case study. Twelve influencing factors were selected as model inputs, and the relationship between each causal factor and the spatial development of landslides was analyzed quantitatively. A total of 179 landslides were identified. About 70 % of the landslide pixels were randomly selected for training, and the remaining 30 % were used for validation. A Logistic Regression (LR) model was applied to produce an initial susceptibility map, and the non-landslide samples were selected within the area classified as low susceptibility. Subsequently, two ML classifiers (the Classification and Regression Tree, CART, and the Multi-Layer Perceptron, MLP) and four coupling models (CART-Bagging, CART-Boosting, MLP-Bagging, and MLP-Boosting) were used for LSM. Finally, receiver operating characteristic (ROC) curves and statistical analysis were used for accuracy assessment. The results show that elevation and distance to rivers were the main causal factors of landslide development in the study area. The LR-MLP model reached an accuracy of approximately 0.901, higher than that of LR-CART (0.889). LR-MLP-Boosting performed best, with an accuracy of 0.986, followed by LR-CART-Boosting (0.981), LR-MLP-Bagging (0.978), and LR-CART-Bagging (0.973). These accuracies are higher than those of the NO-CART, NO-MLP, NO-CART-Bagging, NO-CART-Boosting, NO-MLP-Bagging, and NO-MLP-Boosting models. All four ensemble models outperformed their corresponding base classifiers, and Boosting outperformed Bagging. Overall, combining ensemble learning with ML effectively improved the accuracy of LSM, and the LR model effectively constrained the selection range of non-landslide samples and improved the quality of sample selection. These results show promise for mapping landslide-susceptible locations, which will support monitoring and early warning of landslides.
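The sampling workflow described in the abstract (initial LR map, non-landslide samples drawn from the low-susceptibility area, 70/30 split) can be illustrated with a minimal sketch. This is not the authors' implementation. The two synthetic factors, the gradient-descent LR fit, the use of randomly drawn background cells to train the initial LR model, the choice of the lowest 30 % of cells as the "low-susceptibility" class, and all function names are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def susceptibility(w, b, x):
    """Predicted landslide probability for one cell's factor vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = susceptibility(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def low_susceptibility_cells(cells, w, b, percent=30):
    """Return the `percent` of cells with the lowest LR susceptibility (assumed class rule)."""
    ranked = sorted(cells, key=lambda c: susceptibility(w, b, c))
    return ranked[: len(ranked) * percent // 100]


# Synthetic stand-ins for two standardized factors (hypothetical data, not from the study).
rng = random.Random(42)
landslides = [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(100)]
cells = [[rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)] for _ in range(1000)]

# Step 1: initial LR model, trained on landslides vs. randomly drawn background cells.
background = rng.sample(cells, 100)
w, b = fit_logistic(landslides + background, [1] * 100 + [0] * 100)

# Step 2: restrict non-landslide sampling to the low-susceptibility cells.
low = low_susceptibility_cells(cells, w, b, percent=30)
non_landslides = rng.sample(low, 100)

# Step 3: random 70/30 split of the labelled samples for training and validation.
samples = [(x, 1) for x in landslides] + [(x, 0) for x in non_landslides]
rng.shuffle(samples)
cut = len(samples) * 7 // 10
train, validation = samples[:cut], samples[cut:]
print(len(low), len(non_landslides), len(train), len(validation))  # 300 100 140 60
```

The resulting `train` and `validation` lists would then feed whichever classifier or ensemble is being evaluated; the point of the sketch is only that negatives come from cells the initial model already ranks as unlikely to fail, rather than from the whole study area.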

Chao Zhou et al.

Status: open (until 10 Jun 2023)


Total article views: 344 (including HTML, PDF, and XML)
  • HTML: 290
  • PDF: 48
  • XML: 6
  • Total: 344
  • BibTeX: 3
  • EndNote: 3
Views and downloads, including cumulative counts, are calculated since 13 Apr 2023.

Viewed (geographical distribution)

Total article views: 342 (including HTML, PDF, and XML), of which 342 have a defined geography and 0 are of unknown origin.
Latest update: 29 May 2023
Short summary
We found that elevation (< 240 m) and distance to rivers (< 300 m) are important causal factors for landslides. LR-MLP-Boosting achieves the highest prediction accuracy. The coupling models outperform the corresponding single models, and the Boosting algorithm performs better than the Bagging algorithm. High-quality non-landslide samples enhance the accuracy of LSM, and they can be obtained effectively by using the LR model to constrain their selection range.
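The accuracies compared above come from ROC analysis. Assuming they are areas under the ROC curve (the page does not say so explicitly), the following sketch shows how such a value can be computed from validation labels and predicted susceptibilities. The function name and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive sample scores higher than a random negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy validation set: 1 = landslide, 0 = non-landslide (hypothetical values).
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.8, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.833
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; a rank-based computation would be the usual choice for a full raster of validation pixels.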