the Creative Commons Attribution 4.0 License.
Non-landslide sampling and ensemble learning techniques to improve landslide susceptibility mapping
Abstract. In recent years, several catastrophic landslide events have been observed around the globe, causing loss of life and damaging infrastructure, everyday life and livelihoods. To minimize the impact of landslides and issue early warnings, landslide susceptibility maps (LSM) are essential. Aiming to improve the accuracy of LSM, this study addressed the random selection of non-landslide samples and the low accuracy of individual classifiers by coupling ensemble learning with machine learning (ML) techniques. The Zigui-Badong section of the Three Gorges Reservoir area (TGRA) in China was taken as a case study. Twelve influencing factors were selected as inputs for modelling, and the relationship between each causal factor and landslide spatial development was quantitatively analyzed. A total of 179 landslides were identified in the present study. About 70 % of the landslide pixels were randomly selected for training, and the remaining 30 % were used for validation. The Logistic Regression (LR) model was applied to produce an initial susceptibility map, and the non-landslide samples were selected within the classified low-susceptibility area. Subsequently, two ML classifiers (the Classification and Regression Tree, CART, and the Multi-Layer Perceptron, MLP) and four coupling models (CART-Bagging, CART-Boosting, MLP-Bagging, and MLP-Boosting) were used for LSM. Finally, the receiver operating characteristic (ROC) curve and statistical analysis were applied for accuracy assessment. The results show that elevation and distance to rivers were the main causal factors of landslide development in the study area. The modelling accuracy of LR-MLP was approximately 0.901, which is higher than that of LR-CART (0.889). The LR-MLP-Boosting performed best with an accuracy of 0.986, followed by LR-CART-Boosting (0.981), LR-MLP-Bagging (0.978), and LR-CART-Bagging (0.973).
Accuracy improved compared with the NO-CART, NO-MLP, NO-CART-Bagging, NO-CART-Boosting, NO-MLP-Bagging, and NO-MLP-Boosting models. All four ensemble models outperformed their corresponding base classifiers, and Boosting outperformed Bagging. Overall, the combination of ensemble learning and ML effectively improved the accuracy of LSM. The LR model can effectively constrain the selection range of non-landslide samples and enhance the quality of sample selection. Our results show promise for mapping landslide-susceptible locations, which will help monitoring for landslide early warning.
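The sampling step described in the abstract (an initial LR susceptibility map, then non-landslide samples drawn only from the low-susceptibility area) might look roughly like the sketch below. It is not the authors' implementation: the two synthetic features, the gradient-descent LR, the function names, and the rule that "low susceptibility" means the lowest 20 % of predicted values are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted susceptibility of one cell, in [0, 1]."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def sample_non_landslides(cells, w, b, n_samples, low_fraction=0.2, seed=0):
    """Draw non-landslide cells only from the lowest-susceptibility cells.

    ASSUMPTION: the low-susceptibility area is taken as the lowest
    `low_fraction` of LR scores; the source does not state its rule.
    """
    ranked = sorted(cells, key=lambda c: predict_proba(w, b, c))
    low = ranked[: max(1, int(len(ranked) * low_fraction))]
    rng = random.Random(seed)
    return rng.sample(low, min(n_samples, len(low)))


# Synthetic stand-in data (hypothetical, two standardized factors per cell).
rng = random.Random(42)
landslides = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(50)]
unlabelled = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(500)]

# Step 1: initial LR map from landslides vs. randomly chosen background cells.
initial_negatives = rng.sample(unlabelled, 50)
w, b = fit_logistic(landslides + initial_negatives, [1] * 50 + [0] * 50)

# Step 2: re-draw the non-landslide samples inside the low-susceptibility area.
non_landslides = sample_non_landslides(unlabelled, w, b, n_samples=50)
print(len(non_landslides), "non-landslide samples selected")
```

The re-drawn samples would then replace the random negatives when training the downstream classifiers. The quantile cut-off is a design choice in this sketch and strongly affects how separable the two classes appear.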
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on nhess-2023-44', Anonymous Referee #1, 07 Jun 2023
The aim of this study is to improve landslide susceptibility mapping in a region of the Three Gorges Reservoir area of China. To achieve this, the authors rely on an inventory of 179 landslides. This inventory is then used to feed machine learning models and compare the methods. The authors show that, in terms of accuracy, model outputs are rather similar. They conclude that their analysis will help with early warning.
Find below my comments.
Overall goal of the research:
Studies comparing a range of statistical methods are very common (e.g. Reichenbach et al., 2018; Merghadi et al., 2020). Furthermore, the justification for using these specific modelling approaches is poorly addressed. If one key motivation of the study is this quantitative objective, one would expect that, for example, the effect of spatial resolution on the relevance of the predictors is better discussed, that the spatial correlation and the association between each predictor variable and the occurrence of landslides are better discussed (better tested), that the sensitivity of the models is tested (various cal/val partitioning, various combinations of predictors, etc.), and that the information representing the landslides (pixel, trigger zone, etc.) is at least discussed. Here, none of those issues are actually addressed. In addition, the models are calibrated and validated with 179 landslides. This is not a large dataset, and models are clearly sensitive to the size of the inventory (e.g. Steger et al., 2016; Depicker et al., 2020); such an issue would need at least a discussion.
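A sensitivity test over cal/val partitioning of the kind mentioned here could be sketched as follows: the 70/30 split is repeated many times and the spread of validation AUC is reported instead of a single value. Everything in the sketch is an illustrative assumption, including the synthetic two-factor data, the nearest-centroid stand-in model (used in place of CART or MLP to keep the example short), and the function names.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_model(train_X, train_y):
    """Stand-in classifier: higher score = closer to the landslide centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c1 = centroid([x for x, y in zip(train_X, train_y) if y == 1])
    c0 = centroid([x for x, y in zip(train_X, train_y) if y == 0])

    def score(x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        return d0 - d1

    return score


def repeated_split_auc(X, y, n_repeats=30, train_fraction=0.7, seed=0):
    """Validation AUC over repeated stratified calibration/validation splits."""
    rng = random.Random(seed)
    groups = [[i for i, l in enumerate(y) if l == c] for c in (1, 0)]
    aucs = []
    for _ in range(n_repeats):
        train, test = [], []
        for group in groups:  # split each class separately (stratified)
            shuffled = group[:]
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * train_fraction)
            train += shuffled[:cut]
            test += shuffled[cut:]
        model = centroid_model([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([model(X[i]) for i in test], [y[i] for i in test]))
    return aucs


# Synthetic stand-in inventory: 179 landslide and 179 non-landslide samples.
rng = random.Random(1)
X = [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(179)]
X += [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(179)]
y = [1] * 179 + [0] * 179

aucs = repeated_split_auc(X, y)
print(f"AUC mean={statistics.mean(aucs):.3f} sd={statistics.stdev(aucs):.3f} "
      f"min={min(aucs):.3f} max={max(aucs):.3f}")
```

If the differences between models are smaller than the spread across partitions, a ranking of the models based on a single split carries little weight.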
Landslide inventory:
It is clear that the landslide inventory contains processes of various sizes and ages, which implies that the predisposing and triggering factors can differ (rainfall- or earthquake-triggered landslides can be associated with different environmental conditions: Sidle and Bogaard, 2016; Fan et al., 2019; Jones et al., 2021). Furthermore, there is the role of the lake water level. In addition, for landslides along roads and in built environments, most of the conditions at their origin (or reactivation) have been strongly altered with respect to the information that can be extracted from ancillary data. For example, the 30 m spatial resolution of the DEM used in this analysis does not allow road cuts in the hillslopes to be captured.
Susceptibility analysis:
The susceptibility analysis suffers from a lack of process understanding. Several processes are addressed at the same time, and nothing is said about it. Processes that could be modelled separately are put together, and the depth distinction is not discussed (Sidle and Bogaard, 2016), although deep-seated landslides are not (or only in a very limited way) influenced by vegetation cover. The use of slope gradient in the analysis is another questionable example, especially in the context of man-made landforms.
The inventory contains landslides from historical records. However, in this research, only the current LULC is used, and nothing about recent changes in this factor is considered. For the shallow landslides, one could expect that they are sensitive not only to the current LULC pattern, but also to its recent changes (see e.g. Sidle and Bogaard, 2016).
There is no information on how a landslide is processed in the analysis: is it one pixel, the whole trigger zone, or the whole landslide body? For example, one cannot model source and runout zones together (Emberson et al., 2021).
For the largest landslides, was the pre-failure slope topography considered? This also has implications for the susceptibility analysis (for example, Steger et al., 2020).
Discussion of the susceptibility results is poor:
In such an analysis, one would expect that the association between each predictor variable and the occurrence of landslides is better discussed (better tested), and that the information representing the landslides (pixel, trigger zone, etc.) is at least discussed. Unfortunately, there is no real discussion framed around the understanding of the processes that are analyzed.
This study, like many others, shows that in the end the differences between models (in terms of quantitative prediction performance) are rather marginal. None of the modelling approaches really stands out. It has been shown that high AUC values can be achieved with unrealistic landslide susceptibility maps (Steger et al., 2016). In such a context, the strongest effort should be put on the plausibility of the maps in line with the quality of the independent predictors (with regard to, among others, resolution and scale issues) and their relevance for understanding the processes at play. Here, such a discussion is not present at all.
The authors overrate the relevance of their assessment when, for example, they say that it has the potential to be used for an early warning system. How could such a model, with such a limited understanding of the landslide processes, be used by stakeholders at this spatial scale?
To conclude, in this research modelling-based recipes appear to be applied in a rather blind way, without providing information that is detailed enough to prove the reliability of the results. One could argue that, despite missing the point of applying methods in a relevant and robust way, such a study would bring a better understanding of the processes at play. Such case studies are still of high value. However, there is very little in the present research that allows us to say that we have learned something really new and of relevant interest for the international scientific community, especially with respect to the large body of literature that exists on landslide research in the region.
I hope that these comments will be useful to the authors to improve their analysis.
References
Note that here I provide a list of work that I think could be useful for the authors to reflect on how to improve their research. This list is not exhaustive.
Depicker, A., Jacobs, L., Delvaux, D., Havenith, H.B., Mateso, J.C.M., Govers, G., Dewitte, O., 2020. The added value of a regional landslide susceptibility assessment: The western branch of the East African Rift. Geomorphology 353, 106886.
Emberson, R.A., Kirschbaum, D.B., Stanley, T., 2021. Landslide Hazard and Exposure Modelling in Data-Poor Regions: The Example of the Rohingya Refugee Camps in Bangladesh. Earth’s Future 9, 1–22. doi:10.1029/2020EF001666
Fan, X., Scaringi, G., Korup, O., West, A.J., van Westen, C.J., Tanyas, H., Hovius, N., Hales, T.C., Jibson, R.W., Allstadt, K.E., Zhang, L., 2019. Earthquake-induced chains of geologic hazards: Patterns, mechanisms, and impacts. Reviews of Geophysics 57(2), 421–503.
Jones, J.N., Boulton, S.J., Bennett, G.L., Stokes, M., 2021. 30-year record of Himalaya mass-wasting reveals landscape perturbations by extreme events. Nature Communications 12, 6701. doi:10.1038/s41467-021-26964-8
Merghadi, A., Yunus, A.P., Dou, J., Whiteley, J., ThaiPham, B., Tien Bui, D., Avtar, R., Abderrahmane, B., 2020. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth-Science Reviews 207, 103225.
Reichenbach, P., Rossi, M., Malamud, B.D., Mihir, M., Guzzetti, F., 2018. A review of statistically-based landslide susceptibility models. Earth-Science Reviews 180, 60–91. doi:10.1016/j.earscirev.2018.03.001
Sidle, R.C., Bogaard, T.A., 2016. Dynamic earth system and ecological controls of rainfall-initiated landslides. Earth-Science Reviews 159, 275–291. doi:10.1016/j.earscirev.2016.05.013
Steger, S., Brenning, A., Bell, R., Petschko, H., Glade, T., 2016. Exploring discrepancies between quantitative validation results and the geomorphic plausibility of statistical landslide susceptibility maps. Geomorphology 262, 8–23. doi:10.1016/j.geomorph.2016.03.015
Steger, S., Mair, V., Kofler, C., Pittore, M., Zebisch, M., Schneiderbauer, S., 2021. Correlation does not imply geomorphic causation in data-driven landslide susceptibility modelling – Benefits of exploring landslide data collection effects. Science of the Total Environment 776, 145935. doi:10.1016/j.scitotenv.2021.145935
Steger, S., Schmaltz, E., Glade, T., 2020. The (f)utility to account for pre-failure topography in data-driven landslide susceptibility modelling. Geomorphology 354, 107041. doi:10.1016/j.geomorph.2020.107041
Citation: https://doi.org/10.5194/nhess-2023-44-RC1
EC1: 'Comment on nhess-2023-44', Marc van den Homberg, 03 Nov 2023
We thank the authors for their submission and the reviewer for his review. We have decided to close the discussion and we do not recommend a resubmission.
Citation: https://doi.org/10.5194/nhess-2023-44-EC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
672 | 209 | 51 | 932 | 43 | 49 |
Cited
2 citations as recorded by Crossref.
- M. Sinčić et al.: A Comprehensive Comparison of Stable and Unstable Area Sampling Strategies in Large-Scale Landslide Susceptibility Models Using Machine Learning Methods. doi:10.3390/rs16162923
- S. Tun: GIS-based landslide susceptibility assessment using random forest and support vector machine models: A case study for Chin State, Myanmar. doi:10.13168/AGG.2024.0019