Modeling and evaluation of the susceptibility to landslide events using machine learning algorithms in the province of Chañaral, Atacama region, Chile
Abstract. Landslides represent one of the main geological hazards, especially in Chile. The main purpose of this study is to evaluate the application of machine learning algorithms (Support Vector Machine, Random Forest, XGBoost and Logistic Regression) and compare their results for modeling landslide susceptibility in the province of Chañaral, III region, Chile. A total of 86 landslide sites are identified using various sources, plus another 86 non-landslide sites; these are randomly divided, and a cross-validation process is then applied to calculate the accuracy of the models. From 23 conditioning factors, 12 are chosen based on the information gain ratio (IGR). Subsequently, 5 factors are excluded by the correlation criterion, and 2 of the retained factors that have not been used before in the literature (normalized difference glacier index and enhanced vegetation index) are included. The performance of the models is evaluated through the area under the ROC curve (AUC). To study the statistical behavior of the models, the non-parametric Friedman test is performed to compare the algorithms' performance, together with the Nemenyi test for pairwise comparisons. Of the algorithms used, RF (AUC = 0.9095) and SVM (AUC = 0.9089) have the highest accuracy values measured by AUC and can be used for the same purpose in other geographic areas with similar characteristics. The findings of this investigation have the potential to assist in land use planning, landslide risk reduction, and informed decision making in the surrounding zones.
Status: closed
RC1: 'Comment on nhess-2023-72', Anonymous Referee #1, 12 Jul 2023
This study evaluates the effectiveness of machine learning algorithms in predicting landslide susceptibility in the Chañaral province of Chile. The researchers used SVM, RF, XGBoost, and logistic regression algorithms and compared their results. They identified 86 landslide sites and randomly selected another 86 non-landslide sites for analysis. Overall, the research highlights the potential of machine learning in landslide susceptibility modeling.
I believe the work must be rejected for the following reasons:
- This study faces limitations due to the relatively small dataset consisting of only 86 landslide sites and 86 non-landslide sites. This sample size is not sufficient for effective training, tuning, and evaluation of the machine learning models, especially considering the complexity introduced by the 25 variables.
- The research lacks clarity regarding its novelty. The study does not explicitly highlight any novel approaches, methodologies, or findings that distinguish it from existing research.
Citation: https://doi.org/10.5194/nhess-2023-72-RC1
AC1: 'Reply on RC1', Francisco Parra, 14 Jul 2023
Thank you for your feedback. Here are our answers to the previous two statements.
First Statement
In terms of disaster risk management, for a predictive model to be useful in practice in a given territory in the absence of wide sensor networks, the ability to produce reasonable results from small training datasets is of paramount importance.
Most of the existing works in the literature use datasets of a size similar to the one presented in our paper, since for this type of study the cost and difficulty of obtaining the data are very high. Therefore, the analysis performed relies on data science techniques for small datasets, such as repeated cross-validation, which helps decrease the models' overfitting bias.
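To make this concrete, here is a minimal sketch of such a repeated cross-validation in R with the mlr3 ecosystem (mlr3 is mentioned later in this discussion); the data frame `dat`, its `landslide` column and the choice of a random forest learner are assumptions for the example, not the exact code of the paper:

```r
library(mlr3)          # core framework
library(mlr3learners)  # provides classif.ranger (random forest via the ranger package)

# assumed: `dat` contains the 7 conditioning factors plus a factor column `landslide` ("yes"/"no")
task       <- TaskClassif$new(id = "susceptibility", backend = dat,
                              target = "landslide", positive = "yes")
learner    <- lrn("classif.ranger", predict_type = "prob")
resampling <- rsmp("repeated_cv", repeats = 10, folds = 10)

rr <- resample(task, learner, resampling)
rr$score(msr("classif.auc"))      # per-iteration AUC values: a distribution, not a single split
rr$aggregate(msr("classif.auc"))  # mean AUC over all repetitions
```

Because every repetition reshuffles the folds, the per-iteration scores give a distribution of AUC values rather than a single estimate from one arbitrary split.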
Furthermore, the models built in this work used only 7 variables (the initial 25 variables were proposed as candidates, and most were then discarded using techniques derived from information theory and correlation). Therefore, we believe that the analysis performed with this dataset effectively supports the training, tuning and evaluation of the machine learning models. Examples of such work are as follows:
- In (Achu, 2023), 82 locations and 11 conditioning factors were used.
- In (Al-Najjar, 2021) 156 locations and 16 conditioning factors were used.
- In (Al-Najjar, 2021b), 35 locations and 11 conditioning factors were used.
- In (Alqadhi, 2021), 50 locations and 12 conditioning factors were used.
- In (Arabameri, 2020), 249 locations and 16 conditioning factors were used.
- In (Arabameri, 2022), 240 locations and 18 conditioning factors were used.
- In (Abedini, 2018), 60 locations and 8 conditioning factors were used.
- In (Chen, 2017), 288 locations and 18 conditioning factors were used.
- In (Chen, 2020) 209 locations and 16 conditioning factors were used.
- In (Deng, 2022) 155 locations and 12 conditioning factors were used.
- In (Hong, 2018), 237 locations and 15 conditioning factors were used.
- In (Hong, 2020), 79 locations and 14 conditioning factors were used.
- In (Hu, 2021), 114 locations and 10 conditioning factors were used.
- In (Hussain, 2022) 94 locations and 9 conditioning factors were used.
- In (Mehrabi, 2021) 92 locations and 13 conditioning factors were used.
- In (Nhu, 2020), 152 locations and 17 conditioning factors were used.
- In (Nirbhav, 2023) 54 locations and 9 conditioning factors were used.
- In (Nurwatik, 2022) 176 locations and 12 conditioning factors were used.
- In (Pham, 2020), 167 locations and 12 conditioning factors were used.
- In (Sahin, 2020) 105 locations and 15 conditioning factors were used.
- In (Saha, 2021) 91 locations and 21 conditioning factors were used.
- In (Shahabi, 2022) 64 locations and 14 conditioning factors were used.
- In (Tsangaratos, 2017), 112 locations and 11 conditioning factors were used.
- In (Vasu, 2016), 163 locations and 13 conditioning factors were used.
- In (Wu, 2020) 171 locations and 11 conditioning factors were used.
- In (Youssef, 2022), 243 locations and 12 conditioning factors were used.
In our case, only 86 locations and only 7 conditioning factors were used (which reduces the complexity of the model).
Second statement
From our point of view, this study presents three novelties that could distinguish it from the rest of the literature.
- The approach in this work for the evaluation of the models is a repeated cross-validation, unlike the vast majority of works in the literature, which split the dataset at an arbitrary percentage and compute the metrics only once, instead of building a statistical distribution as is done in the experiments section of our article. This provides a more complete verification of the results obtained.
- The factors used in the construction of the model proposed in our article come exclusively from satellite images and digital elevation models, unlike other studies, which rely on more data-intensive sources of information that are harder for disaster risk management analysts to apply in practice. This approach has the advantage that it can enable systems that generate susceptibility maps from periodically updated satellite images, which can contribute to a susceptibility monitoring system that technical agencies in the disaster area could implement.
- In this work, two factors derived from satellite images are used to build the model: EVI (Enhanced vegetation index) and NDGI (normalized difference glacier index). The novelty is that they have not been used before in the literature, thus providing new factors that can be considered for the construction of future models. In addition, it is shown that a factor such as the VD (valley depth index) is highly relevant for areas similar to the one studied, with narrow and steep valleys.
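For reference, both indices are simple band combinations; here is a minimal sketch of how they could be derived from a Landsat scene in R with the terra package (the layer names, the reflectance scaling and the standard coefficients are assumptions for the example, not the exact processing used in the paper):

```r
library(terra)

# assumed: `ls9` is a SpatRaster of Landsat-9 surface reflectance with layers
# named "blue", "green", "red" and "nir", already scaled to reflectance (0-1)
evi  <- 2.5 * (ls9$nir - ls9$red) / (ls9$nir + 6 * ls9$red - 7.5 * ls9$blue + 1)  # enhanced vegetation index
ndgi <- (ls9$green - ls9$red) / (ls9$green + ls9$red)                             # normalized difference glacier index
names(evi)  <- "EVI"
names(ndgi) <- "NDGI"
```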
References
- Achu, A. L., Thomas, J., Aju, C. D., Remani, P. K., & Gopinath, G. (2023). Performance evaluation of machine learning and statistical techniques for modelling landslide susceptibility with limited field data. _Earth Science Informatics_, _16_(1), 1025-1039.
- Al-Najjar, H. A., & Pradhan, B. (2021). Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks. _Geoscience Frontiers_, _12_(2), 625-637.
- Al-Najjar, H. A., Pradhan, B., Kalantar, B., Sameen, M. I., Santosh, M., & Alamri, A. (2021). Landslide susceptibility modeling: An integrated novel method based on machine learning feature transformation. _Remote Sensing_, _13_(16), 3281.
- Alqadhi, S., Mallick, J., Talukdar, S., Bindajam, A. A., Saha, T. K., Ahmed, M., & Khan, R. A. (2022). Combining logistic regression-based hybrid optimized machine learning algorithms with sensitivity analysis to achieve robust landslide susceptibility mapping. _Geocarto International_, _37_(25), 9518-9543.
- Arabameri, A., Saha, S., Roy, J., Chen, W., Blaschke, T., & Tien Bui, D. (2020). Landslide susceptibility evaluation and management using different machine learning methods in the Gallicash River Watershed, Iran. _Remote Sensing_, _12_(3), 475.
- Arabameri, A., Chandra Pal, S., Rezaie, F., Chakrabortty, R., Saha, A., Blaschke, T., ... & Thi Ngo, P. T. (2022). Decision tree based ensemble machine learning approaches for landslide susceptibility mapping. _Geocarto International_, _37_(16), 4594-4627.
- Abedini, M., Ghasemian, B., Shirzadi, A., & Bui, D. T. (2019). A comparative study of support vector machine and logistic model tree classifiers for shallow landslide susceptibility modeling. _Environmental Earth Sciences_, _78_, 1-15.
- Chen, W., Shirzadi, A., Shahabi, H., Ahmad, B. B., Zhang, S., Hong, H., & Zhang, N. (2017). A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. _Geomatics, Natural Hazards and Risk_, _8_(2), 1955-1977.
- Chen, W., & Li, Y. (2020). GIS-based evaluation of landslide susceptibility using hybrid computational intelligence models. _Catena_, _195_, 104777.
- Deng, N., Li, Y., Ma, J., Shahabi, H., Hashim, M., de Oliveira, G., & Chaeikar, S. S. (2022). A comparative study for landslide susceptibility assessment using machine learning algorithms based on grid unit and slope unit. _Frontiers in Environmental Science_, _10_, 1009433.
- Hong, H., Liu, J., Bui, D. T., Pradhan, B., Acharya, T. D., Pham, B. T., ... & Ahmad, B. B. (2018). Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). _Catena_, _163_, 399-413.
- Hong, H., Liu, J., & Zhu, A. X. (2020). Modeling landslide susceptibility using LogitBoost alternating decision trees and forest by penalizing attributes with the bagging ensemble. _Science of the total environment_, _718_, 137231.
- Hu, H., Wang, C., Liang, Z., Gao, R., & Li, B. (2021). Exploring Complementary Models Consisting of Machine Learning Algorithms for Landslide Susceptibility Mapping. _ISPRS International Journal of Geo-Information_, _10_(10), 639.
- Hussain, M. A., Chen, Z., Wang, R., Shah, S. U., Shoaib, M., Ali, N., ... & Ma, C. (2022). Landslide susceptibility mapping using machine learning algorithm. _Civ. Eng. J_, _8_, 209-224.
- Mehrabi, M. (2021). Landslide susceptibility zonation using statistical and machine learning approaches in Northern Lecco, Italy. _Natural Hazards_, 1-37.
- Nhu, V. H., Mohammadi, A., Shahabi, H., Ahmad, B. B., Al-Ansari, N., Shirzadi, A., ... & Nguyen, H. (2020). Landslide susceptibility mapping using machine learning algorithms and remote sensing data in a tropical environment. _International journal of environmental research and public health_, _17_(14), 4933.
- Nirbhav, Malik, A., Maheshwar, Prasad, M., Saini, A., & Long, N. T. (2023). A comparative study of different machine learning models for landslide susceptibility prediction: a case study of Kullu-to-Rohtang pass transport corridor, India. _Environmental Earth Sciences_, _82_(7), 167.
- Nurwatik, N., Ummah, M. H., Cahyono, A. B., Darminto, M. R., & Hong, J. H. (2022). A Comparison Study of Landslide Susceptibility Spatial Modeling Using Machine Learning. _ISPRS International Journal of Geo-Information_, _11_(12), 602.
- Pham, B. T., Nguyen-Thoi, T., Qi, C., Van Phong, T., Dou, J., Ho, L. S., ... & Prakash, I. (2020). Coupling RBF neural network with ensemble learning techniques for landslide susceptibility mapping. _Catena_, _195_, 104805.
- Saha, S., Roy, J., Hembram, T. K., Pradhan, B., Dikshit, A., Abdul Maulud, K. N., & Alamri, A. M. (2021). Comparison between deep learning and tree-based machine learning approaches for landslide susceptibility mapping. _Water_, _13_(19), 2664.
- Sahin, E. K. (2020). Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. _SN Applied Sciences_, _2_(7), 1308.
- Shahabi, H., Ahmadi, R., Alizadeh, M., Hashim, M., Al-Ansari, N., Shirzadi, A., ... & Ariffin, E. H. (2023). Landslide Susceptibility Mapping in a Mountainous Area Using Machine Learning Algorithms. _Remote Sensing_, _15_(12), 3112.
- Tsangaratos, P., Ilia, I., Hong, H., Chen, W., & Xu, C. (2017). Applying Information Theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. _Landslides_, _14_, 1091-1111.
- Vasu, N. N., & Lee, S. R. (2016). A hybrid feature selection algorithm integrating an extreme learning machine for landslide susceptibility modeling of Mt. Woomyeon, South Korea. _Geomorphology_, _263_, 50-70.
- Wu, Y., Ke, Y., Chen, Z., Liang, S., Zhao, H., & Hong, H. (2020). Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping. _Catena_, _187_, 104396.
- Youssef, A. M., & Pourghasemi, H. R. (2021). Landslide susceptibility mapping using machine learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi Arabia. _Geoscience Frontiers_, _12_(2), 639-655.
Citation: https://doi.org/10.5194/nhess-2023-72-AC1
-
RC2: 'Comment on nhess-2023-72', Anonymous Referee #2, 13 Jul 2023
The manuscript aims to assess landslide susceptibility using a limited database and machine learning approach. Previous studies have identified a gap in landslide research throughout South America (https://doi.org/10.5194/nhess-15-1821-2015), and this manuscript seeks to contribute to filling that gap by providing alternatives to landslide susceptibility. I encourage the authors to include studies over the Andes region that support their study.
The contribution can potentially become an important resource for the landslide community. However, I believe the manuscript is poorly written, exhibiting shortcomings and over-explanation throughout the document. Additionally, the authors claim to identify novelty without effectively addressing any knowledge gaps. I have identified several major comments the manuscript should address to improve its effectiveness and readiness for future review. I have the following concerns that the authors must correct/modify immediately.
The document is oversized, showing irrelevant information about the data. Moreover, no new science is found. Preliminary studies in the Andes have tried to evaluate susceptibility changes under different precipitation scenarios (https://doi.org/10.3390/w15142514 and https://doi.org/10.1016/j.jsames.2022.103824). I encourage the authors to review what new knowledge their results could contribute to the Central Andes.
Mainly, the introduction lacks clarity regarding the research question and fails to adequately address the research gap. It is unclear whether the authors focus on Rainfall-Induced Landslides or Earthquake-Induced Landslides. This distinction becomes crucial in supporting the use of different data in the machine learning models.
The zone in question is characterized by a limited landslide catalog (please state that you consider earthquake or rainfall induced landslides, not both), which is a common challenge in the Central and Southern Andes region. Previous studies have successfully employed data augmentation techniques to overcome these limitations (https://doi.org/10.1007/s10346-022-01981-w and https://doi.org/10.3390/w15142514). By utilizing data augmentation (DA), it is possible to significantly enhance the limited landslide dataset and provide valuable support for machine learning models. Furthermore, DA effectively addresses class imbalance issues by generating additional samples from underrepresented classes, leading to improved performance and accuracy in classification tasks. I strongly urge the authors to investigate strategies for implementing data augmentation in the study zone to evaluate changes in susceptibility.
Section 2 appears to be poorly organized. The geology of the study area is insufficiently covered, so I recommend including additional information to provide a comprehensive overview. Furthermore, the database section lacks cohesion and is scattered across different subsections. I suggest merging sections 2.6 and 2.7 into a single subsection that covers each machine learning model in detail.
The results of the study are concerning, particularly regarding the weight assigned to "slope". It is widely understood that slope plays a crucial role in landslide generation. I strongly recommend re-evaluating the VD variable and its relationship with the actual location of landslide emplacement. It is suspected that many landslides are occurring in hillslope areas within the basin, which could introduce a bias in the results. To address this, the authors should consider removing "slope" from the initial model and assess the impact on the new results.
After conducting a quick review of the Andes region, it is evident that the results of the study suggest a superior performance compared to similar studies that utilized logistic regression (e.g., https://doi.org/10.5194/nhess-22-2169-2022 and https://doi.org/10.1007/s11069-020-03913-0). I strongly urge the authors to consider the differences and similarities that contribute to the improved performance of the models used in this study or explore alternative models that could further enhance the analysis in the context of the Andes region.
Section 3.4 requires a complete rewrite. The thresholds used to define the different levels of susceptibility are currently unclear. It is essential for the authors to justify these values using a reproducible approach.
The content of Section 4 appears to contain sentences and information that should be part of the results section. It is unclear how the authors can discuss the findings without prior presentation of the results. I suggest assessing the relevance of each model and discussing the spatial variability, taking into account additional parameters such as geology and slope. The section currently lacks strength and does not contribute new scientific insights.
Minor comments:
- Line 224: Please, consider revising "contest" using a similar word.
Citation: https://doi.org/10.5194/nhess-2023-72-RC2
AC2: 'Reply on RC2', Francisco Parra, 24 Jul 2023
Thank you very much for your detailed reply.
Here is the answer, broken down into parts:
I encourage the authors to include studies over the Andes region that support their study.
After an extensive and updated literature review, we found few publications linked to susceptibility assessment in the Andes. In this regard, we found that in (Ospina-Gutiérrez, 2021) susceptibility mapping was performed in an Andean area that differs from ours in terms of geomorphology and climate, but, as in our study, the most successful algorithm was Random Forest. In (Brenning, 2015), GAM models are used to calculate susceptibility in areas near roads, and the authors note the importance of curvature, which our study also identifies as an important factor in the calculation of susceptibility. In (Lizama, 2022), the relevance of curvature was also found. Finally, (Bueechi, 2019) found that useful and effective landslide susceptibility maps can be built using only the DEM of the zone, which supports the results obtained in this work, which also uses satellite imagery.
In the new version of our manuscript we will include these works on the Andes areas.
The manuscript is poorly written, exhibiting shortcomings and over-explanation throughout the document
The parts that were observed will be duly reviewed and corrected. In particular, the following will be done:
* The introduction will be fixed, eliminating redundant parts and clearly stating the research questions and gaps that this work seeks to cover.
* In the study area section, the geological information on the area will be supplemented.
* The inventory section will be arranged more concisely and sections 2.6 and 2.7 will be brought together, as previously suggested.
* Paragraphs that were erroneously placed in the discussion section will be moved to the results section.
* The discussion will include a paragraph discussing similar publications on landslide susceptibility in the Andes that support the findings of this paper.
All these improvements are included in the attached zipped file.
The document is oversized, showing irrelevant information about the data
As mentioned above, redundant parts will be removed.
I encourage the authors to review what new knowledge their results could contribute to the Central Andes
- The approach for the evaluation of the models is a repeated cross-validation. This is different from the vast majority of the works presented in the literature, which split the dataset at an arbitrary percentage and compute the metrics only once, instead of creating a statistical distribution as is done in the experiments section of our article. Our approach provides a more comprehensive verification of the results obtained than previous work.
- Unlike previous works, the data sources used in the construction of the model proposed in our article come exclusively from satellite images and digital elevation models, whereas other studies rely on more data-intensive sources of information. This makes it possible for disaster risk analysts to apply these data sources in practice. Our approach has the advantage that it can enable systems that generate susceptibility maps from periodically updated satellite images, which can contribute to a susceptibility monitoring system that technical agencies could implement as part of early warning systems.
- In this work, two features derived from satellite images are used to build the model: EVI (enhanced vegetation index) and NDGI (normalized difference glacier index). The novelty is that they have not been used before in the literature, thus enabling the application of this source of useful data in the construction of future models derived from our approach. In addition, our paper shows that a factor such as the VD (valley depth index) is highly relevant for areas similar to the one we studied, where narrow and steep valleys exist.
The introduction lacks clarity regarding the research question and fails to adequately address the research gap.
In the zipped file you will find the new introduction, with the problems presented here solved.
It is unclear whether the authors focus on Rainfall-Induced Landslides or Earthquake-Induced Landslides
The landslides studied correspond to those induced by rainfall. It will be clarified in the text.
I strongly urge the authors to investigate strategies for implementing data augmentation in the study zone to evaluate changes in susceptibility
In the suggested publications, it was not possible to find a suitable methodology for this case. However, (Dou, 2019) suggests that using samples from the landslide scarp polygon can increase the accuracy of the model. In our work, the center of the landslide body had been used to characterize the phenomenon. Therefore, to enlarge the dataset, 10 samples (pixels) were taken within the scarp polygon of each landslide; for the non-landslide points, a polygon surrounding the initial point was created and 10 samples were taken within it. This multiplies the studied dataset by 10 and provides the statistical robustness needed in this work. Preliminary results are attached, which show that the ROC AUC of Random Forest increased from 0.90 to 0.95. XGBoost also improves its performance, while SVM and LR do not improve.
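A minimal sketch of this sampling strategy in R with the sf and terra packages (the object names `scarp_polys`, `nonlandslide_polys` and `factors` are assumptions for the example, not the exact code used):

```r
library(sf)
library(terra)

# assumed: `scarp_polys` and `nonlandslide_polys` are sf polygon layers and
# `factors` is a SpatRaster stack of the conditioning factors
sample_polygons <- function(polys, label, n = 10) {
  rows <- lapply(seq_len(nrow(polys)), function(i) {
    pts  <- st_sample(polys[i, ], size = n, type = "random")  # n pixels per polygon
    vals <- terra::extract(factors, vect(pts), ID = FALSE)    # conditioning-factor values
    cbind(vals, landslide = label)
  })
  do.call(rbind, rows)
}

augmented <- rbind(sample_polygons(scarp_polys, "yes"),
                   sample_polygons(nonlandslide_polys, "no"))  # roughly 10x the original dataset
```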
The geology of the study area is insufficiently covered
An updated geological section is attached in the zipped file.
The database section lacks cohesion and is scattered across different subsections. I suggest merging sections 2.6 and 2.7 into a single subsection that covers each machine learning model in detail.
The new layout of section 2 of the paper will be as follows:
2.1: Characteristics of the study area and data set.
2.2: Conditioning factors and selection of factors
2.2.1: IGR technique
2.2.2: Correlation calculation
2.3: Modeling using Machine Learning
2.3.1: SVM
2.3.2: LR
2.3.3: RF
2.3.4: Xgboost
2.3.5: Hyperparameter Optimization
2.3.6: Cross Validation
2.3.7: Model validation
In the zipped file the updated section is attached.
I strongly recommend re-evaluating the VD variable and its relationship with the actual location of landslide emplacement
The variable VD (Valley Depth) in the study refers to the vertical distance to a base level of the hydrographic network. This index is calculated using an algorithm that involves interpolating the elevation of the base level of the hydrographic network and then subtracting this base level from the original elevations.
In the context of the study, the valley depth (VD) index was found to provide a lot of information for the model. A high valley depth index may be related to a high susceptibility to landslides due to the steep topography and abrupt relief present in the study area, which favors the occurrence of gravitational processes and increases the rate of erosion on the slopes.
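Written compactly, and following the description above (our notation, not the paper's): VD(x) = z(x) − z_base(x), where z(x) is the original DEM elevation at cell x and z_base(x) is the base-level elevation of the hydrographic network interpolated at that cell.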
The authors should consider removing "slope" from the initial model and assess the impact on the new results
The slope variable is not part of the model. It was only part of the initial set, and was discarded because of the correlation/IGR criterion. The final model is composed of 7 variables (Valley depth, convergence index, curvature, TWI, TPI, EVI, NDGI), which mix variables derived from the digital elevation model and others extracted from satellite images. Therefore, the slope does not affect the analysis in this work.
I strongly urge the authors to consider the differences and similarities that contribute to the improved performance of the models used in this study
In (Fustos, 2020) they build a model that allows predicting months of landslide occurrence, using hydrometeorological parameters. Unfortunately, being a completely different approach, it is not possible to make a comparison, since in our case we try to calculate geographic susceptibility, as an inherent characteristic of the terrain, while the other model tries to make temporal predictions using logit and probit models. On the other hand, in (Fustos, 2022), spatio-temporal models are built to calculate the probability of landslide occurrence using factors associated with rainfall and topography (slope). As in the previous case, the methodologies used make it impossible to make a comparison and establish a parallel with our work.
Section 3.4 requires a complete rewrite. The thresholds used to define the different levels of susceptibility are currently unclear
For the calculation of thresholds, the Jenks Breaks method will be used, which is widely used in the literature, as it is based on an optimisation algorithm that minimizes the within-class variance and maximizes the between-class variance. Attached is an example map representing the Random Forest model.
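A minimal sketch of how such thresholds could be computed in R with the classInt package (the vector `susc` of predicted susceptibility values and the five class labels are assumptions for the example):

```r
library(classInt)

# assumed: `susc` is a numeric vector of predicted susceptibility values (e.g. raster cell values);
# for a full raster, the breaks are often computed on a random sample of cells for speed
brks <- classIntervals(susc, n = 5, style = "jenks")$brks   # Jenks natural breaks

classes <- cut(susc, breaks = brks, include.lowest = TRUE,
               labels = c("very low", "low", "moderate", "high", "very high"))
table(classes)   # cell count per susceptibility class
```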
The content of Section 4 appears to contain sentences and information that should be part of the results section. It is unclear how the authors can discuss the findings without prior presentation of the results. I suggest assessing the relevance of each model and discussing the spatial variability, taking into account additional parameters such as geology and slope. The section currently lacks strength and does not contribute new scientific insights
To improve the discussion section, several paragraphs were moved to the results section.
Logistic regression does not require strict assumptions about the distribution of the independent variables and models the relationship between a binary dependent variable and one or more independent variables, which may be non-linear. Although not as accurate as other models, it has the advantage of being simpler and easier to interpret than many machine learning models. However, it can have difficulties with large or complex datasets. As the amount of data increased, logistic regression may not have been able to effectively model the relationship between variables, resulting in a decrease in accuracy. Therefore, it performed better on the initial set, without the increase in data.
Support Vector Machines often outperform other statistical models, such as logistic regression, in predicting landslide susceptibility. However, they can be more difficult to interpret than statistical models. While SVM can handle high-dimensional data, it can have difficulties with large datasets due to its computational complexity. In addition, if the data is noisy or overlapping, the performance of SVM may decrease. For that reason, its performance also decreases with respect to the initial dataset.
Random Forests can capture complex and non-linear interactions between variables. In addition, they are robust to multicollinearity, i.e. they can handle situations where predictor variables are highly correlated. This algorithm is known for its ability to handle large data sets and its resistance to overfitting. By increasing the amount of data, RF may have been able to build more accurate and robust decision trees, resulting in increased accuracy.
XGBoost is an ensemble algorithm that uses the boosting technique. In the case of landslides, this can allow for capturing complex patterns in the data that other models may miss. It also provides a number of options for regularization and model fitting, which can help avoid over-fitting and improve model performance. It is also robust to multicollinearity, i.e. it can handle situations where predictor variables are highly correlated with each other. Similar to RF, XGBoost can also handle large data sets and is robust to overfitting. By having more data, XGBoost may have had more opportunities to learn and improve its performance.
Regarding the spatial variability, an inspection of the generated susceptibility map in comparison with the geological units of the area shows that the most susceptible areas are located in the Paleozoic - Mesozoic rocks.
**Minor comments:**
**Line 224: Please, consider revising "contest" using a similar word.**
The chosen word in order to replace “contest” was "competition"
References
* Brenning, A., Schwinn, M., Ruiz-Páez, A. P., & Muenchow, J. (2015). Landslide susceptibility near highways is increased by 1 order of magnitude in the Andes of southern Ecuador, Loja province. _Natural Hazards and Earth System Sciences_, _15_(1), 45-57.
* Bueechi, E., Klimeš, J., Frey, H., Huggel, C., Strozzi, T., & Cochachin, A. (2019). Regional-scale landslide susceptibility modelling in the Cordillera Blanca, Peru—a comparison of different approaches. _Landslides_, _16_, 395-407.
* Dou, J., Yunus, A. P., Merghadi, A., Shirzadi, A., Nguyen, H., Hussain, Y., ... & Yamagishi, H. (2020). Different sampling strategies for predicting landslide susceptibilities are deemed less consequential with deep learning. _Science of the total environment_, _720_, 137320.
* Fustos, I., Abarca-del-Rio, R., Moreno-Yaeger, P., & Somos-Valenzuela, M. (2020). Rainfall-Induced Landslides forecast using local precipitation and global climate indexes. _Natural Hazards_, _102_, 115-131.
* Fustos-Toribio, I., Manque-Roa, N., Vásquez Antipan, D., Hermosilla Sotomayor, M., & Letelier Gonzalez, V. (2022). Rainfall-induced landslide early warning system based on corrected mesoscale numerical models: an application for the southern Andes. _Natural Hazards and Earth System Sciences_, _22_(6), 2169-2183.
* Lizama, E., Morales, B., Somos-Valenzuela, M., Chen, N., & Liu, M. (2022). Understanding landslide susceptibility in Northern Chilean Patagonia: A basin-scale study using machine learning and field data. _Remote Sensing_, _14_(4), 907.
* Ospina-Gutiérrez, J. P., & Aristizábal, E. (2021). Aplicación de inteligencia artificial y técnicas de aprendizaje automático para la evaluación de la susceptibilidad por movimientos en masa. _Revista Mexicana de Ciencias Geológicas_, _38_, 43-54.
We look forward to hearing from you.
Have a nice day
-
RC3: 'Comment on nhess-2023-72', Anonymous Referee #1, 14 Jul 2023
Thank you for your answers.
First statement
I acknowledge the precedents.
I acknowledge and agree that repeated cross-validation helps address the issue of data scarcity to some extent. However, susceptibility modelling assumes that the available data used for model development and validation are representative of the study area. The assumption is that the data adequately capture the spatial and temporal variations in the susceptibility factors and are sufficient to train and evaluate the model effectively. With such a limited number of landslides, the only way this assumption holds is if the area is quite small. However, in that case, the findings of the research are valid within that very small area (or within your inventory) and are therefore extremely site specific. Overall, in the case of a non-representative landslide inventory, it does not really matter which training and validation strategy you adopt; your result will be biased anyway.
Second statement
I acknowledge and agree with your points 2 and 3. Please emphasise it more in the manuscript.
I wouldn’t consider the use of repeated cv as a novelty… surely a very good strategy though.
Overall my advice is to enlarge the study area and increase the number of landslides. Please also add some more details about the landslides (occurrence, trigger…. If possible).
Another solution would be to repeat the very same approach in other similar areas (to get more transferable findings).
Citation: https://doi.org/10.5194/nhess-2023-72-RC3
AC3: 'Reply on RC3', Francisco Parra, 24 Jul 2023
Good Day!
Regarding your concern about the lack of data for the analysis, we followed this strategy:
In the suggested publications, it was not possible to find a suitable methodology for this case. However, (Dou, 2019) suggests that using samples from the landslide scarp polygon can increase the accuracy of the model. In our work, the center of the landslide body had been used to characterize the phenomenon. Therefore, to enlarge the dataset, 10 samples (pixels) were taken within the scarp polygon of each landslide; for the non-landslide points, a polygon surrounding the initial point was created and 10 samples were taken within it. This multiplies the studied dataset by 10 and provides the statistical robustness needed in this work. Preliminary results are attached, which show that the ROC AUC of Random Forest increased from 0.90 to 0.95. XGBoost also improves its performance, while SVM and LR do not improve.
We include our preliminary results in the attached file.
Have a nice day!
RC4: 'Comment on nhess-2023-72', Anonymous Referee #3, 28 Jul 2023
The paper compares the results of susceptibility maps in a small basin in Chile using four machine-learning methods.
The quality of the paper is fair. However, the impact is quite limited. They used a small location in Chile to train the ML models with a small sample of landslides, which has been done on multiple occasions and does not provide sufficient new knowledge for an NHESS publication. A section of the paper mentions that these methods are pretty new in geosciences, which I think is incorrect, since extensive references and books describe the multiple applications of ML in geosciences (lines 86-87) (e.g. https://www.sciencedirect.com/science/article/pii/S0065268720300054)
Major comments:
They used a general definition of landslides, including rockfalls, flows, and falls. Then they discussed mass movements where landslides are part of them (Line 26-30). So my first question is, what kind of events have been used for the models? I think they used all of the mass movement events reported in the available database with no distinction of the nature of the events; if that is the case, do you think all the "mass movements" have the same trigger factors (passive or active)? I used the term "mass movements" here to use the same wording as the authors, but I would like to see a more precise definition of the events analyzed.
Then, you mentioned that you verified the database using "Laboratory analysis" (line 140). What kind of analysis did you run? How did you verify the location of the events reported by Sernageomin?
Then in line 147, you mentioned, "In any way, having a small amount of data, it is necessary to resort to a cross-validation process, and thus obtain an average AUC that can be compared between the models used." Can cross-validation not be done in larger datasets?
They used geology as a background of the study area, but in the discussions, I did not see anything regarding that topic. For example, according to the models, are there geological units that are more susceptible than others? Which model could be more recommended according to the area's geomorphological, climatological, and geological characteristics?
Why is geology not included in the analysis since there is variation in the units in the area? This is not explained anywhere.
The paper claims to have two novelties related to ML applications. I do not think that they are accurate. The authors should support with references this contribution to ML methods.
Then, they mentioned as a novelty the use of a series of R packages which I do not see as a new idea.
I do not understand the purpose of including Colca Valley, Indo Valley, and Colorado River Valley. This model is unlikely to work in those locations since your models have no training data from those valleys.
The literature review does not include similar work in South America, which may provide insight into the state of the art in the Andes.
Finally, assuming the locations of the events are correct. What happens with the temporality? To be consistent, the model should be trained with the characteristics of the landscape just before the events. In this work, all the events (floods, landslides, rockfalls, etc.) are treated the same, so something should describe and differentiate them in the model.
Minor comments:
Figure 2 is not cited in the document.
All the ML review on page 4 does not contribute significantly to the paper.
English grammar and style need further review.
All the acronyms should be explained (GNDVI, EVI, NDMI, etc.)
Citation: https://doi.org/10.5194/nhess-2023-72-RC4
AC4: 'Reply on RC4', Francisco Parra, 30 Jul 2023
Thanks for your revision.
Here are our answers to your statements about this study.
The quality of the paper is fair. However, the impact is quite limited. They used a small location in Chile to train the ML models with a small sample of landslides, which has been done on multiple occasions and does not provide sufficient new knowledge for an NHESS publication. A section of the paper mentions that these methods are pretty new in geosciences, which I think is incorrect since extensive references and books describe the multiple applications of ML in geosciences.
The area used in the study is 7400 km², which is larger than that of most studies using this methodology. For example:
- In (Abedini, 2019), the study area is 516.44 km².
- In (Can, 2021), it is 2718.7 km².
- In (Vasu, 2016) it is 5104 km².
- In (Al-Najjar, 2021) it is 1879.5 km².
- In (Dang, 2020) it is 101.3 km².
Regarding the small sample of landslides, this was an issue that was also raised by the other reviewers. To address this difficulty, we chose to use the data augmentation strategy described in (Dou, 2019), which suggests that using samples from the landslide scarp polygon can increase the accuracy of the model. In our work, the center of the landslide body had been used to characterize the phenomenon. Therefore, to enlarge the dataset, 10 samples (pixels) were taken within the scarp polygon of each landslide; for the non-landslide points, a polygon surrounding the initial point was created and 10 samples were taken within it. This multiplies the studied dataset by 10 and provides the statistical robustness needed in this work. Preliminary results are attached, which show that the ROC AUC of Random Forest increased from 0.90 to 0.95. XGBoost also improves its performance, while SVM and LR do not improve.
Regarding this study, we never intended to claim that its novelty lies in the use of ML in geosciences, understanding that there are numerous studies in the literature that address the subject. That said, in our opinion, the following points are what make this study a significant contribution to the area:
- Unlike previous works, the data sources used in the construction of the model proposed in our article come exclusively from satellite images and digital elevation models, whereas other studies rely on more data-intensive sources of information. This makes it possible for disaster risk analysts to apply these data sources in practice. Our approach has the advantage that it can enable systems that generate susceptibility maps from periodically updated satellite images, which can contribute to a susceptibility monitoring system that technical agencies could implement as part of early warning systems.
- In this work, two features derived from satellite images are used to build the model: EVI (enhanced vegetation index) and NDGI (normalized difference glacier index). The novelty is that they have not been used before in the literature, thus enabling the application of this source of useful data in the construction of future models derived from our approach. In addition, our paper shows that a factor such as the VD (valley depth index) is highly relevant for areas similar to the one we studied, where narrow and steep valleys exist.
In any case, the part of line 86-87 that causes doubts will be removed from the final version, considering that the use of ML in geosciences can no longer be considered as something new.
They used a general definition of landslides, including rockfalls, flows, and falls. Then they discussed mass movements where landslides are part of them (Line 26-30). So my first question is, what kind of events have been used for the models? I think they used all of the mass movement events reported in the available database with no distinction of the nature of the events; if that is the case, do you think all the "mass movements" have the same trigger factors (passive or active)? I used the term "mass movements" here to use the same wording as the authors, but I would like to see a more precise definition of the events analyzed.
The landslides studied in this work correspond to debris flows and alluvial flows. All of them are triggered by (active) rainfall processes. We understand that this was not fully clarified in the draft version, but we assure you that it will be addressed in the final version.
Then, you mentioned that you verified the database using "Laboratory analysis" (line 140). What kind of analysis did you run? How did you verify the location of the events reported by Sernageomin?
The event database corresponds to a historical dataset compiled by the National Geology and Mining Service of Chile (Sernageomin) from events that have occurred in the country. The validation corresponds to field visits by professionals from the agency and to the visual inspection carried out by the researchers of this study, based on the analysis of aerial images obtained from Google Earth and satellite images from the Landsat-9 campaign.
Then in line 147, you mentioned, "In any way, having a small amount of data, it is necessary to resort to a cross-validation process, and thus obtain an average AUC that can be compared between the models used." Can cross-validation not be done in larger datasets?
While cross-validation reinforces the possibility of performing data analysis on small datasets, it does not preclude its use on larger datasets. In fact, with the change made through the data augmentation strategy, the use of this technique was maintained, obtaining excellent results in the analysis of the models.
They used geology as a background of the study area, but in the discussions, I did not see anything regarding that topic. For example, according to the models, are there geological units that are more susceptible than others? Which model could be more recommended according to the area's geomorphological, climatological, and geological characteristics?
The most susceptible geological formations in the study area correspond to those dated within the Palaeozoic and Mesozoic. No significant variations are observed between the corresponding models and the aforementioned features.
Why is geology not included in the analysis since there is variation in the units in the area? This is not explained anywhere.
The exclusion of geology responds to practical criteria and to the need to demonstrate in the study that it is only necessary to use topographic and satellite-image parameters. In addition, these characteristics are directly linked to the geological formations of the area, and considering the performance of the models from a statistical point of view, adding these characteristics could lead to overestimation and not contribute to the development of the model.
The paper claims to have two novelties related to ML applications. I do not think that they are accurate. The authors should support with references this contribution to ML methods.
With respect to repeated cross-validation, although its use is not usual in this type of study, there are precedents in the literature. The second statement, on the other hand, no longer holds, since the Jenks Breaks method will be used in the construction of the maps. Therefore, the indicated paragraph will be removed from the final version, and this issue will be resolved.
Then, they mentioned as a novelty the use of a series of R packages which I do not see as a new idea.
So far, there are no studies in the area that use the mlr3 package, with its object-oriented design, to perform the corresponding analyses. However, we understand that this is part of the methodology and should not be considered a novelty, so it will also be modified.
I do not understand the purpose of including Colca Valley, Indo Valley, and Colorado River Valley. This model is unlikely to work in those locations since your models have no training data from those valleys.
This is part of an observation from a superficial analysis of these areas, which share similar geomorphological characteristics. We want to make it clear that this is only a recommendation, and therefore, to validate this we would need to have training data in each valley and perform the same statistical analysis proposed in this study.
The literature review does not include similar work in South America, which may provide insight into the state of the art in the Andes.
The final version of the paper includes a paragraph comparing this work with others carried out in the Andes region:
After an extensive and updated literature review, we found few publications linked to susceptibility assessment in the Andes. In this regard, we found that in (Ospina-Gutiérrez, 2021) susceptibility mapping was performed in an Andean area that differs from ours in terms of geomorphology and climate, but, as in our study, the most successful algorithm was Random Forest. In (Brenning, 2015), GAM models are used to calculate susceptibility in areas near roads, and the authors note the importance of curvature, which our study also identifies as an important factor in the calculation of susceptibility. In (Lizama, 2022), the relevance of curvature was also found. Finally, (Bueechi, 2019) found that useful and effective landslide susceptibility maps can be built using only the DEM of the zone, which supports the results obtained in this work, which also uses satellite imagery.
Finally, assuming the locations of the events are correct. What happens with the temporality? To be consistent, the model should be trained with the characteristics of the landscape just before the events. In this work, all the events (floods, landslides, rockfalls, etc.) are treated the same, so something should describe and differentiate them in the model.
The types of events analysed in this manuscript correspond to slides and debris flows caused by extreme precipitation events. On the other hand, one of the principles used in works of this kind is that "the past is the key to understanding future processes" (Guzzetti, 2012). That said, all authors construct their maps a priori from past events and measure susceptibility without considering this detail, which gives empirical validity to this idea.
Minor comments:
Figure 2 is not cited in the document.
Figure 2 is cited on line 269.
All the ML review on page 4 does not contribute significantly to the paper.
Following the recommendations obtained by all the referees, this part was completely transformed and reduced so as not to include information not relevant to the work.
English grammar and style need further review.
This will be taken into account for the final version of the document.
All the acronyms should be explained (GNDVI, EVI, NDMI, etc.)
To correct this, we will include the explanation of all acronyms in their first appearance in the paper.
------------------------------------------------------------------------------------------------------------------------------------------------------
We include here some corrected parts of the paper and the new results that have followed your recommendations and those of the other referees.
Have a nice day, and we look forward to hearing from you.
CC1: 'Comment on nhess-2023-72', Albert Cabré, 30 Jul 2023
I have read other comments from colleagues, and I agree with most of them. I now want to give insights to the authors to encourage the resubmission of a future paper that addresses the weaknesses detected in their attempt to provide remote-sensing-based susceptibility maps and in their future application at regional scales in the region.
The research paper by Parra and collaborators, in discussion in the NHESS journal of the EGU, assesses and compares landslide susceptibility in an arid watershed of northern Chile in the Atacama Desert. This arid region still lacks regional maps of landslide susceptibility, although I immediately want to draw the authors' attention to the fact that at least one recent research paper (to my knowledge) has addressed this in the Atacama Desert, https://doi.org/10.5194/nhess-20-1247-2020, and therefore should, at least, be discussed/included.
The authors propose to apply machine learning algorithms to produce landslide susceptibility maps in an understudied region of northern Chile. This is what motivated me to read the paper at first, and it is something that I also think is necessary. Nowadays, private companies are the ones that usually work on this kind of risk assessment problem in Chile, but the problem is that little literature from their investigations is available to the community or to researchers. So, initiatives such as the one the authors attempt here are valuable. However, although their results sound innovative, they completely lack a thorough knowledge of the hazardous geomorphologic processes that impact this region during rare rainstorm events, and this will be the main point of my comments.
The manuscript navigates through technical and methodological steps that are more or less well described. However, the main problem I find is that it starts from the wrong point (e.g., the selection of conditioning factors) from the very beginning, which renders this research useless as it is and shows that the design of the analysis is not correct. I need to put the focus on this immediately because the authors state in (lines 15-17): ''The findings of this investigation have the potential to assist in land use planning, landslide risk reduction, and informed decision making in the surrounding zones.'' This is dangerous, and no land use, risk reduction strategy or informed decision should be made using these results in the region without a further redesign of the study. Since they have stated that this is a major outcome of their research, I want to be crystal clear on that, because this cannot continue down this path. Maybe the authors will resubmit the paper elsewhere as it is (I hope they don't), but it is not rigorous because it lacks consideration of basic and fundamental findings from previous research works in this region of the Atacama Desert and therefore trains a machine learning algorithm with the wrong, and thus erroneous, conditioning parameters.
Their main graphical outcomes are landslide susceptibility maps that might look good to readers not working in arid zones but that completely lack any link with the reality that two recent rain events have proved, as reviewed in https://doi.org/10.1002/2016GL069751 for the March 2015 rainstorm event and https://doi.org/10.1007/s11069-022-05707-y for the March 2022 rainstorm event, both of which impacted this evaluated watershed. On a rapid qualitative assessment, none of the impacted areas, such as the ones shown in https://doi.org/10.1007/s11069-022-05707-y, are identified as susceptible in their susceptibility maps (Figs. 10-13).
Surprisingly, they use only 86 sites where landslides have been reported, although research papers have recently shown landslides, flash floods and other runoff-related hazards in the exact same location (see https://doi.org/10.1002/2016GL069751 and https://doi.org/10.1007/s11069-022-05707-y). Future research needs to consider these datasets, probably as validation sites, and/or apply similar methodological approaches.
Although the paper has some merit in implementing machine learning with robust statistics, the results misunderstand the landslide-related geohazards that impact this region during extreme rainfall events. First, the selection of conditioning factors does not take into consideration previous research done in the southern Atacama Desert in a 'less' arid watershed situated south of El Salado (see Aguilar et al., 2020), and therefore this paper would need a thorough literature review to better choose the conditioning factors. For example, Aguilar suggests including information on the colluvial and alluvial cover, which is extensive in arid landscapes due to the lack of effective sediment removal mechanisms (only intense rains every 30 years?) capable of producing enough runoff to entrain sediment and therefore produce, e.g., debris flows.
Then, the authors use one Landsat 9 image from February 2022, about which they decide it is fine not to give further detail. A recent paper addresses understanding geomorphic change during rainfall events (including landslides, rills, etc.), https://doi.org/10.1016/j.rsase.2023.100927, doing a thorough review of the use of optical imagery (with particular focus on the Landsat family), and gives enough insights for the authors to perhaps use Olivares' approach. However, they decided to derive optical spectral indexes (NDVI, GNDVI, EVI, NDMI, BSI, NDWI, NDGI) without giving further justification or evidence for using them in an arid region. This point is especially discouraging. First, no present-day glacial evidence has been reported for this watershed (see https://doi.org/10.1016/j.quaint.2017.04.033); García only shows evidence for the Altiplano-Puna region (endorheic basins to the east of the studied catchment). Second, the use of vegetation indexes in this arid watershed is useless and, honestly, is one more piece of evidence that the authors of this paper do not even understand where they are applying their set of patterns to automatically extract information.
The manuscript continues with some sentences that are at least confusing to me:
(Lines 389-391): ‘’NDGI uses spectral bands corresponding to green and red, so this would imply that landslide and non-landslide areas create contrast between these wavelengths. Therefore, it is suggested to use these indices in areas similar to the studied in this work.’’ This is unclear. Why would the spectral signature of an area of 86 landslides contrast with a whole catchment of 7400 km2? There is a problem in the landslide point selection that might be overcome by looking, again, at the results from Wilcox and Cabré, because they show large areas (not points) of geomorphic change after recent events, and I do not believe the spectral signature would change significantly on the alluvial, colluvial or flat surfaces characteristic of this catchment. Why? Because significant mineralogical changes would indicate processes that are not present in this region. If the authors were right, which they are not, differences between valley floors, colluvial covers and ‘old’ surfaces would be easily recognized. This is not the case if you check the recent paper https://doi.org/10.1016/j.geomorph.2022.108504, where the surface maps are based on surface roughness rather than spectral signatures. That research cannot be overlooked when describing the geological and geomorphological setting of the study area, because these flat surfaces cover large areas of the studied catchment.
(lines 431-433): ‘’the model can be expected to be suitable in areas worldwide that is a semi-arid zone, with a variable topography and a Mediterranean climate with a prolonged dry season, in addition to having narrow and deep valleys, where the maximum susceptibility is concentrated. Examples of these zones are the following:’’ Here I have to say that I was surprised by the shift to a global perspective on applying what the authors have done. To support this global idea the authors do not use very suitable examples. So as not to repeat previous comments, I want to highlight that all the mentioned rivers are permanent rivers with significant water discharge, because they drain areas where annual rainfall amounts are significantly higher than in this catchment of the Atacama.
(Lines 466-469): ‘’These findings provide valuable perspectives for informed decision-making and policy formulation in landslide-prone regions. Overall, our study highlights the potential of machine learning models, particularly SVM and RF, for accurate and reliable landslide susceptibility mapping, which can aid in identifying high-risk areas and implementing effective mitigation strategies, which is useful for stakeholders and land-planning authorities.’’ It is probably not the aim of a discussion in a peer-reviewed journal, but I would like to suggest to the authors that this study needs to be redesigned from its basics.
My recommendation is to reject the paper as it is.
Citation: https://doi.org/10.5194/nhess-2023-72-CC1 -
AC5: 'Reply on CC1', Francisco Parra, 01 Aug 2023
Good day!
We deeply appreciate your constructive criticism and the opportunity to clarify certain aspects of our work. We would like to address your concerns as follows:
The manuscript navigates through technical and methodologically more or less well described steps. However, the main problem I find is that it starts from the wrong point (e.g., the selection of conditioning factors) at the very beginning, which renders this research useless as it stands and shows that the design of the analysis is not correct. I need to put the focus on this immediately, because the authors state (lines 15-17): ‘’_The findings of this investigation have the potential to assist in land use planning, landslide risk reduction, and informed decision making in the surrounding zones._’’ This is dangerous, and no land use plan, risk reduction strategy or informed decision should be made in the region using these results without a further redesign of the study. Since they have stated that this is a major outcome of their research, I want to be crystal clear on this point, because the work cannot continue down this path. The authors may resubmit the paper elsewhere as it is (I hope they do not), but it is not rigorous, because it currently lacks consideration of basic and fundamental findings from previous research in this region of the Atacama Desert and therefore trains machine learning algorithms with the wrong, and thus erroneous, conditioning parameters.
Reference to previous work: We appreciate your mention of previous research in the Atacama Desert. Our focus is on the application of machine learning algorithms for the generation of landslide susceptibility maps, which may differ from traditional studies. However, we recognise the importance of previous research and commit to including and discussing the work you mention in the revision of the manuscript.
Understanding hazardous geomorphological processes: While we recognise the importance of understanding the hazardous geomorphological processes that impact the region during rare storm events, we believe that our data-driven approach can provide valuable and complementary insight to more traditional geology- and geomorphology-based approaches. Our approach allows us to identify patterns and relationships that may not be evident through direct observation and geological interpretation.
Selection of conditioning factors: The selection of conditioning factors was based on the Information Gain Ratio (IGR) technique, which quantifies the predictive power of candidate factors. We selected the most relevant of the 22 candidate factors on the basis of this methodology. While we understand your concern about the starting point of our research, we defend our selection of conditioning factors based on the validity of the IGR methodology, which is widely used in data science and has proven effective in selecting the most relevant factors in a variety of contexts. In addition, we use Pearson's correlation to eliminate factors that are highly correlated with each other, which is standard practice in data science to reduce multicollinearity and improve the accuracy of models. Furthermore, it is important to note that many machine learning models have interpretability issues (the "black box" problem), a limitation that is generally accepted in the field.
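To make this screening step concrete, below is a minimal sketch of an IGR ranking followed by a Pearson correlation filter. It assumes the conditioning factors have already been sampled at the landslide and non-landslide sites into a pandas DataFrame; the equal-width binning, the number of retained factors and the 0.8 correlation threshold are illustrative placeholders rather than the exact settings of our workflow.

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels, bins=10):
    """Information gain ratio of one conditioning factor, discretized into equal-width bins."""
    feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
    keep = np.isfinite(feature)
    binned = pd.cut(feature[keep], bins=bins, labels=False, duplicates="drop")
    labels = labels[keep]
    h_y, cond, intrinsic, n = entropy(labels), 0.0, 0.0, len(labels)
    for value in np.unique(binned):
        mask = binned == value
        w = mask.sum() / n
        cond += w * entropy(labels[mask])       # conditional entropy of the label given the bin
        intrinsic -= w * np.log2(w)             # intrinsic value of the split
    return (h_y - cond) / intrinsic if intrinsic > 0 else 0.0

def select_factors(X, y, top_k=12, corr_threshold=0.8):
    """Rank factors by gain ratio, keep the top_k, then drop the lower-ranked
    member of any pair whose absolute Pearson correlation exceeds the threshold."""
    ranked = sorted(X.columns, key=lambda c: gain_ratio(X[c], y), reverse=True)[:top_k]
    kept = []
    for col in ranked:
        if all(abs(X[col].corr(X[other])) < corr_threshold for other in kept):
            kept.append(col)
    return kept

# Hypothetical usage: factors_df holds one column per candidate factor,
# labels is a 0/1 array (non-landslide / landslide).
# selected = select_factors(factors_df, labels)
```

Ranking first and filtering second means that, of any highly correlated pair, the factor with the higher gain ratio is the one retained.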
Analysis design: We understand your concern about the design of our analysis. Our intention was to apply machine learning algorithms to produce landslide susceptibility maps, and we used four different models (Random Forest, Support Vector Machine, XGBoost and Logistic Regression) for this purpose. We compared their performance and found that the RF model obtained the highest AUC values. While we acknowledge that there may be room for improvement in our methodology, we defend our approach based on best practices in data science: the four models we use are widely recognised for their effectiveness in a variety of machine learning tasks, and cross-validation is standard practice for assessing model accuracy.
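For completeness, here is a compact sketch of how such a comparison can be run with scikit-learn and xgboost. `X` and `y` stand for the factor matrix and the 0/1 landslide labels (for example the output of a selection step like the one sketched above), and the 5-fold, 10-repeat scheme and hyperparameters are illustrative rather than the exact configuration reported in the manuscript.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

models = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "XGBoost": XGBClassifier(n_estimators=300, random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Repeated stratified cross-validation: every model is scored on the same folds.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because every model is scored on the same repeated folds, the resulting AUC distributions can be compared directly rather than relying on a single train/test split.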
Use of results for decision-making: Our results are intended to contribute to the body of knowledge and provide a starting point for future research. We do not suggest that our results be used for decision-making without further redesign and validation. However, we believe that our results may be useful to inform future research and modelling efforts in this area.
Their main graphical outcomes are landslide susceptibility maps that might look good to readers not working in arid zones but that completely lack any link with the reality demonstrated by two recent rain events, reviewed in https://doi.org/10.1002/2016GL069751 for the March 2015 rainstorm and in https://doi.org/10.1007/s11069-022-05707-y for the March 2022 rainstorm, both of which impacted the evaluated watershed. On a rapid qualitative assessment, none of the impacted areas, such as the ones shown in https://doi.org/10.1007/s11069-022-05707-y, are identified as susceptible in their susceptibility maps (Figs. 10-13).
We understand the importance of these maps accurately reflecting the areas that are actually susceptible to landslides, especially in the context of recent rainfall events that have impacted the watershed we are studying. We appreciate your comments and the opportunity to clarify this point.
In response to your comment, first of all, we would like to point out that we plan to update our susceptibility maps in the corrected version of our work, which will be submitted after 4 August. These updates will be based on some adjustments to the data we use, as well as a data augmentation strategy, which we hope will improve the accuracy of our maps.
Our susceptibility maps are based on a combination of factors, including historical landslide data and a number of environmental variables that have been identified as important predictors of susceptibility. These factors, which according to our research and analysis are relevant to the occurrence of this phenomenon in the Atacama Desert region, are combined using machine learning algorithms selected and applied with the aim of maximising the predictive capacity of our models. However, we understand that no model can perfectly capture reality, and there is always room for improvement.
Regarding your observation that none of the impacted areas shown in https://doi.org/10.1007/s11069-022-05707-y are identified as susceptible on our maps, we would like to point out that our susceptibility maps are not intended to be an accurate representation of the specific locations where landslides occurred in past events. Instead, they are intended to provide a general indication of the areas that are most susceptible to landslides, based on a combination of factors. We understand your concern that our maps may not fully reflect the reality of the areas affected by the rainfall events of 2015 and 2022 documented in the studies you mention. However, these maps are a probabilistic representation of landslide susceptibility based on the data and conditioning factors we have considered, not a record of every landslide that has occurred or could occur in the future.
In addition, we would like to clarify a point that we find curious in your comment. You mention that none of the impacted areas shown in that publication are identified as susceptible on our susceptibility maps. However, in reviewing the publication, we note that it does not present landslide susceptibility maps. Figure 1 of that publication shows coherence and precipitation parameters, but does not provide information on landslide susceptibility. We would therefore like to better understand the basis for your observation and would welcome any additional clarification you can provide.
Despite these limitations, we believe that our maps provide a valuable contribution to the understanding of landslide susceptibility in the Atacama Desert region. We will continue to refine our models and maps as more data become available and as more research is conducted in this region.
Surprisingly, they use only 86 sites where landslides have been reported, although research papers have recently documented landslides, flash floods and other runoff-related hazards in the exact same location (see https://doi.org/10.1002/2016GL069751 and https://doi.org/10.1007/s11069-022-05707-y). Future research needs to consider these datasets, probably as validation sites, and/or apply similar methodological approaches.
We would like to clarify that the 86 landslide sites used were selected based on data availability and the reliability of the information sources. It is important to note that although other studies have reported landslides, flash floods and other runoff-related hazards in the same location, these studies may have used different methodologies and criteria to identify and classify these events. Therefore, it is not always possible or appropriate to directly combine these datasets with our own.
That said, we agree that the inclusion of more landslide sites could improve the robustness and representativeness of our model. In response to your suggestion and those of other reviewers, we have implemented a data augmentation strategy to improve the representativeness of our dataset. As a result, we have updated our landslide susceptibility map, which now reflects in greater detail the distribution and frequency of landslides in the Atacama Desert region.
The updated map is attached here for your review. We welcome your comments and suggestions, and hope that this update addresses your concerns.
Although the paper has some merit in implementing machine learning with robust statistics, the results misunderstand the landslide-related geohazards that impact this region during extreme rainfall events. First, the selection of conditioning factors does not take into consideration previous research in the southern Atacama Desert carried out in a ‘less’ arid watershed situated south of El Salado (see Aguilar et al., 2020), and therefore this paper would need a thorough literature review to better choose its conditioning factors. For example, Aguilar suggests including information on the colluvial and alluvial cover, which is extensive in arid landscapes owing to the lack of effective sediment removal mechanisms (only intense rains every 30 years?) capable of producing enough runoff to entrain material and thereby produce, for example, debris flows.
We understand that many factors affect the generation of landslides, as described in the literature cited in this review. However, the construction of the models presented here follows a different methodology from the one proposed in your comment, in the sense that machine learning seeks to make predictions from data.
The more than twenty candidate factors chosen for the study are those used in the literature on ML susceptibility models, including in hyper-arid zones such as the one studied here. The seven parameters that were retained result from the IGR methodology and the elimination of factors using the correlation criterion. Furthermore, the use of evaluation criteria such as the ROC curve validates the effectiveness of the model.
We understand that the non-inclusion of certain factors mentioned in the critique may give the impression that our model is contradictory to the existing literature. However, we would like to emphasise that the absence of these factors does not necessarily imply that the processes they represent are being ignored. It is possible that these factors are correlated with the factors we have included in our model, and therefore their influence may be indirectly represented.
Then, the authors use a single Landsat 9 image from February 2022, about which they decide it is acceptable not to give further detail. A recent paper on understanding geomorphic change during rainfall events (including landslides, rills, etc.), https://doi.org/10.1016/j.rsase.2023.100927, provides a thorough review of the use of optical imagery (with particular focus on the Landsat family) and gives the authors enough insight to perhaps use Olivares' approach. However, they decided to derive optical spectral indices (NDVI, GNDVI, EVI, NDMI, BSI, NDWI, NDGI) without giving further justification or evidence for their use in an arid region. This point is especially discouraging. First, no evidence of present glaciation has been reported for this watershed (see https://doi.org/10.1016/j.quaint.2017.04.033); García only shows evidence for the Altiplano-Puna region (endorheic basins to the east of the studied catchment). Second, the use of vegetation indices in this arid watershed is useless and, honestly, is one more piece of evidence that the authors of this paper do not even understand where they are applying their set of patterns to automatically extract information.
On the first point, it is important to clarify that the choice of a single Landsat 9 image was based on the availability and quality of data at the time of the study. While we recognise the importance of using multiple images to capture temporal variability, we also believe that a single image can provide valuable information, especially when combined with other data sources and used in conjunction with machine learning algorithms. In addition, we would like to clarify that, although the studied catchment is arid and vegetation is sparse, vegetation indices can provide valuable information that contributes to our model.
Regarding the second point, we appreciate the suggestion to consider Olivares' approach. However, we would like to emphasise that the choice of optical spectral indices was based on the Information Gain Ratio (IGR) methodology, which has proven to be useful in similar studies. Through this method, we found that vegetation indices, despite the aridity of the region, provide significant information and contribute to the accuracy of our model. While we understand that the use of vegetation indices may seem inappropriate in an arid region, these indices not only capture vegetation, but can also indicate soil characteristics and environmental conditions that may influence landslide susceptibility. In addition, the presence of vegetation, although sparse, can be an important indicator of soil and water conditions, which are key factors in landslide occurrence.
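As an illustration of how these optical indices are obtained, the short sketch below derives NDVI, NDGI and EVI from a Landsat 8/9 OLI surface-reflectance scene with rasterio. The file name and the assumption that the blue, green, red and NIR bands (B2-B5) are stacked in that order, with reflectance scaled to 0-1, are placeholders for illustration only.

```python
import numpy as np
import rasterio

# Hypothetical multiband GeoTIFF with Landsat 8/9 OLI surface reflectance scaled to 0-1,
# bands stacked as: 1 = blue (B2), 2 = green (B3), 3 = red (B4), 4 = NIR (B5).
with rasterio.open("landsat9_feb2022_stack.tif") as src:
    blue, green, red, nir = (src.read(i).astype("float32") for i in range(1, 5))

eps = 1e-6  # guard against division by zero over nodata or very dark pixels

ndvi = (nir - red) / (nir + red + eps)                                # vegetation density
ndgi = (green - red) / (green + red + eps)                            # normalized difference glacier index
evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0 + eps)  # enhanced vegetation index
```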
Finally, we would like to emphasise that our main objective was to explore the potential of machine learning algorithms for landslide prediction in an understudied region. While we acknowledge that our study has limitations and that there is room for improvement and refinement of our methodology, we believe that our findings provide a valuable basis for future research in this area.
(Lines 389-391): ‘’NDGI uses spectral bands corresponding to green and red, so this would imply that landslide and non-landslide areas create contrast between these wavelengths. Therefore, it is suggested to use these indices in areas similar to the studied in this work.’’ This is unclear. Why would the spectral signature of an area of 86 landslides contrast with a whole catchment of 7400 km2? There is a problem in the landslide point selection that might be overcome by looking, again, at the results from Wilcox and Cabré, because they show large areas (not points) of geomorphic change after recent events, and I do not believe the spectral signature would change significantly on the alluvial, colluvial or flat surfaces characteristic of this catchment. Why? Because significant mineralogical changes would indicate processes that are not present in this region. If the authors were right, which they are not, differences between valley floors, colluvial covers and ‘old’ surfaces would be easily recognized. This is not the case if you check the recent paper https://doi.org/10.1016/j.geomorph.2022.108504, where the surface maps are based on surface roughness rather than spectral signatures. That research cannot be overlooked when describing the geological and geomorphological setting of the study area, because these flat surfaces cover large areas of the studied catchment.
We would like to clarify that the usefulness of NDGI in our study is not based on the assumption that landslide and non-landslide areas will have contrasting spectral signatures throughout the basin. Instead, NDGI is used as one of several features in our machine learning model, which takes into account the interaction of multiple factors in making its predictions.
As for landslide point selection, we agree that point selection is a critical aspect of our study. Therefore, we chose a strategy that treats landslides as polygons, in order to extract a larger number of training samples and features.
Finally, regarding the claim that spectral signatures would not change significantly on alluvial, colluvial or flat surfaces, we would like to point out that while it is true that these types of surfaces may have similar spectral signatures, they may also exhibit subtle variations that can be captured by spectral indices and may be relevant for landslide prediction. We believe that our findings provide a valuable basis for future research in this area.
(lines 431-433): ‘’the model can be expected to be suitable in areas worldwide that is a semi-arid zone, with a variable topography and a Mediterranean climate with a prolonged dry season, in addition to having narrow and deep valleys, where the maximum susceptibility is concentrated. Examples of these zones are the following:’’ Here I have to say that I was surprised by the shift to a global perspective on applying what the authors have done. To support this global idea the authors do not use very suitable examples. So as not to repeat previous comments, I want to highlight that all the mentioned rivers are permanent rivers with significant water discharge, because they drain areas where annual rainfall amounts are significantly higher than in this catchment of the Atacama.
Our intention in suggesting that our model could be applicable in other semi-arid areas around the world was not to suggest that the specific results of our study would be directly transferable to these areas. Instead, our intention was to highlight that the general methodology we have used - that is, the use of machine learning algorithms to analyse a variety of conditioning factors and produce landslide susceptibility maps - could be useful in these areas.
We recognise that each region has its own unique characteristics and challenges, and that any application of our methodology to a new region would require careful consideration of these factors. In particular, we agree that differences in climatic and hydrological conditions, such as those mentioned in your comment, would be important factors to consider. Here, the comparison was based on topographic criteria.
As for the specific examples we mentioned, we appreciate the feedback and recognise that we could have made a better choice. Our intention was simply to provide some examples of the types of regions that could benefit from an approach similar to ours, but we understand that these examples may have been confusing or misleading. In the future, we will strive to provide clearer and more relevant examples.
(Lines 466-469): ‘’These findings provide valuable perspectives for informed decision-making and policy formulation in landslide-prone regions. Overall, our study highlights the potential of machine learning models, particularly SVM and RF, for accurate and reliable landslide susceptibility mapping, which can aid in identifying high-risk areas and implementing effective mitigation strategies, which is useful for stakeholders and land-planning authorities.’’ It is probably not the aim of a discussion in a peer-reviewed journal, but I would like to suggest to the authors that this study needs to be redesigned from its basics.
We would like to clarify that our aim in conducting this study was not to provide a definitive solution to landslide problems in the region, but rather to explore the potential of machine learning models to assist in the identification of high-risk areas.
Our study has limitations, and there is room for improvement and refinement of our methodology. However, we believe that our findings provide a valuable basis for future research in this area. In particular, our results highlight the importance of considering a variety of conditioning factors in assessing landslide susceptibility, and demonstrate the potential of machine learning models to analyse these complex interactions.
Regarding the suggestion that our study needs to be redesigned from the ground up, we would like to point out that we are open to feedback and suggestions for improving our work. However, we also believe it is important to recognise that science is an iterative process and that each study contributes to our collective understanding of a problem, even if that study has limitations or leaves unanswered questions.
Once again, we welcome constructive comments and will take them into account in the future.
-
CC3: 'Reply on AC5', Albert Cabré, 02 Aug 2023
Dear Francisco and coauthors,
You are welcome.
Your work will have to satisfy the requirements of the various specialists if your aim is to provide ‘’a valuable basis for future research in this area’’. This means that when you select the conditioning parameters that you will then "ask" your Information Gain Ratio (IGR) technique about, you need to choose the ones that are most relevant to landslides in arid regions. You have to be sure that you are feeding the IGR with the most suitable parameters, and this means relying on previous experience in arid areas. Luckily, your study area has such experience available. Therefore, I suggest you redefine your study based on, for example, my suggestions provided above (previous comment, 30-07-23). Then, you might be able to produce realistic landslide susceptibility maps using the proposed methods.
This may not be the expected outcome and it can be frustrating, but journals like NHESS give us a good opportunity to learn and to stimulate scientific debate that would be difficult to have in "classical" academic settings, where it is difficult to share specialist knowledge far from our own experience. I expect to see how machine learning can be integrated, but it needs a more robust starting point. Having said that, I would like to thank the authors for their fast response and also for their commitment to provide us with new figures. However, in order to help them in future resubmissions, I am responding to some of their comments in the attached PDF file.
-
AC6: 'Reply on CC3', Francisco Parra, 04 Aug 2023
Thanks again for your response and the positive feedback.
Here are the answers to the PDF you sent.
No previous work attempted in the region can be considered traditional, since these works explore for the first time (i) the conditioning factors to be used in this arid region (year 2020, NHESS) and (ii) are based on very novel remote sensing applications of SAR C-band satellites, which are open access and which I strongly recommend the authors consider in future resubmissions (2020, ESPL; 2020, Natural Hazards). It is a close call since I am the first author, but you can also rely on other works using similar approaches (Castellanezzi et al., 2023 in Australia; Botey i Bassols et al., 2023 in the Salar de Atacama; Olen and Bookhagen, 2020 in Argentina, among other references therein that you might find useful too). There is also the work of Olivares et al. (2023), as I discussed before, which might also be a good starting point for your goal of defining susceptibility maps in the Atacama.
You are right that these recent studies cannot be considered "traditional" in the area and we agree that they constitute an important starting point for our objective of defining susceptibility maps in the Atacama region. Therefore, in the final submission of this paper we undertake to review in detail these references you mention, to contrast them with our work, and to find the reason for the discrepancies you point out.
The authors now claim to identify patterns and relationships. I have navigated the paper throughout, and no analysis of physical processes or of landslide triggering is presented in the sense of giving clues about either the triggering or the possible feedbacks between the studied landscape features. If by ‘patterns’ the authors mean areas of high susceptibility, then again, previous work in the area proves them wrong. At the very least, the authors should discuss why their maps show differences with the available literature in the area.
Admittedly, we do not present a detailed analysis of physical processes that trigger landslides in the region. Our approach focuses on identifying patterns and relationships using machine learning techniques, but we understand that this does not replace the analysis of such physical processes, although the data-driven approach has certainly proven successful in multiple studies in the area.
When we refer to identifying "patterns", we mean patterns in the data that allow the model to predict areas of greater or lesser susceptibility to landslides. We recognise that our maps may differ from the available literature on susceptibility in the area.
The utility of the IGR or any other predictive technique is not under discussion. The problem, as I mentioned in my previous comment, is that this investigation starts from the wrong selection of conditioning factors by the authors. This selection was made by (I guess) a literature review conducted by the authors, and that is where all the problems of this research start. You need to understand better, probably by doing a more thorough literature review (I just gave you some references in my previous comment), that the study of arid regions is not a novel area and can therefore be considered a traditional topic in geomorphology (see the literature on the Sonoran Desert in the USA from the 1950s, etc.). I believe a deeper review can be done here if the aim of the authors is to provide the basis for future research on landslide susceptibility in arid regions, since they state ‘’we believe that our findings provide a valuable basis for future research in this area’’ many times in their response to my previous comment. This paper now has a lot of conditioning factors selected under unclear criteria (not well justified or cited), which have been proven wrong in recent and traditional literature in the region and in other arid regions of the world. This needs to be amended.
While we acknowledge that our initial literature review could have been more comprehensive, we respectfully disagree that the selection of conditioning factors was flawed. The selection was made on the basis of 22 factors used in the literature on modeling susceptibility to landslides, including in arid areas such as the one studied.
The 7 retained parameters result from the IGR methodology and the elimination of factors using the correlation criterion, both standard procedures in data science. While we understand your concern, we defend our selection of factors on the basis of the validity of these methodologies.
On the other hand, model effectiveness is validated by criteria such as the ROC curve, with AUC values above 0.9 for all models. We believe that this demonstrates the predictive ability of the selected factors in combination with the applied machine learning algorithms. Furthermore, we would point again to the "black box" effect in many machine learning models (Goetz, 2015), so it is normal to have factors whose role in the model is not easy to explain. In addition, factors discussed in the literature may be correlated with those we have included, so their influence can be indirectly represented ("replaced") in the construction of the model.
With regard to spectral indices, there are precedents in the literature regarding their use in arid zones. In (Kumar, 2023), NDVI is used as a conditioning factor in an arid region in Peru. In (Nhu, 2020) it is used for an arid region in Iran. In (Were, 2023) NDVI was used for an arid region in Kenya to calculate gully erosion susceptibility. In (Yaseen, 2022) NDVI is used to map flash floods in arid regions.
In your comments you state that spectral indices cannot be used in arid or hyper-arid areas. However, you do not provide any evidence for this claim, so we would be very grateful if you could do so.
You use a dataset that relies on observations made near the main roads, villages and mining districts. This is typical for this region and can be a limitation. However, many authors have made efforts to overcome this by providing regional observations (see Wilcox et al., 2016; Tapia et al., 2018; Cabré et al., 2022). Combination with other datasets is possible because, although they use hydraulic and remote sensing methods, the works of Wilcox and Cabré rely on a lot of field data (see Figure 2 in Cabré as an example).
Thank you very much. We are currently doing new work in the area related to landslides that incorporates the spatio-temporal nature of the problem, so the comparisons you make are sure to be very useful in that characterisation.
Thank you for the attached susceptibility map figure. I believe the authors may have arrived at a similar map simply by producing a slope-thresholded map. This makes me refer the authors to another work of mine (https://doi.org/10.1002/esp.4868), where we clearly showed (only 70 km south) that slope does not control rainfall-triggered hazards in this region. This might sound counter-intuitive to unaware readers, but gentle surfaces are the ones that are remarkably more impacted during rainstorm events in this region. This can be explained because gentle slopes allow a greater and better development of alluvial cover and thus make sediment available during any storm.
It should be noted that slope is not used in the model. Furthermore, our susceptibility model constructs a static view of the probability of occurrence of the phenomenon, which seeks to find the most likely "starting" point for a landslide event. All the literature cited as evidence characterises the flows as a whole and is mostly associated with the temporal evolution of a debris flow acquiring volume. This has nothing to do with the exercise we are carrying out, since in our study the labelled points correspond to the landslide scarp. Therefore, the pre-existing characterisations will never coincide exactly with the ones we have made.
On the other hand, regarding your assertion that the map produced with our methodology could be generated using a slope map with thresholds, we respectfully disagree. As proof of this, we attach as a PDF the slope map and a comparison map (classified using the same Jenks breaks method), in which clearly different characteristics can be appreciated. In particular, the susceptibility map marks many low-slope areas as susceptible.
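For readers who want to reproduce this kind of comparison, the sketch below reclassifies a continuous raster (the RF susceptibility probabilities or the slope grid) into five ordered classes. It approximates Jenks natural breaks with one-dimensional k-means, a common stand-in for the exact Jenks algorithm, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def natural_breaks_classes(raster, n_classes=5):
    """Reclassify a continuous raster into ordered classes (1 = low ... n = high)
    using 1-D k-means, a common approximation of Jenks natural breaks."""
    valid = np.isfinite(raster)
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=0)
    labels = km.fit_predict(raster[valid].reshape(-1, 1))
    order = np.argsort(km.cluster_centers_.ravel())   # cluster centres from low to high
    remap = np.empty(n_classes, dtype=int)
    remap[order] = np.arange(1, n_classes + 1)
    classified = np.zeros(raster.shape, dtype=int)    # 0 = nodata
    classified[valid] = remap[labels]
    return classified

# e.g. susceptibility_classes = natural_breaks_classes(rf_probability_map)
#      slope_classes          = natural_breaks_classes(slope_map)
```

Classifying both rasters with the same routine keeps the comparison between the two maps consistent.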
Many thanks again.
References
- Goetz, J. N., Brenning, A., Petschko, H., & Leopold, P. (2015). Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. _Computers & geosciences_, _81_, 1-11.
- Kumar, C., Walton, G., Santi, P., & Luza, C. (2023). An Ensemble Approach of Feature Selection and Machine Learning Models for Regional Landslide Susceptibility Mapping in the Arid Mountainous Terrain of Southern Peru. _Remote Sensing_, _15_(5), 1376.
- Nhu, V. H., Shirzadi, A., Shahabi, H., Chen, W., Clague, J. J., Geertsema, M., ... & Lee, S. (2020). Shallow landslide susceptibility mapping by random forest base classifier and its ensembles in a semi-arid region of Iran. _Forests_, _11_(4), 421.
- Yaseen, A., Lu, J., & Chen, X. (2022). Flood susceptibility mapping in an arid region of Pakistan through ensemble machine learning model. _Stochastic Environmental Research and Risk Assessment_, _36_(10), 3041-3061.
Status: closed
-
RC1: 'Comment on nhess-2023-72', Anonymous Referee #1, 12 Jul 2023
This study evaluates the effectiveness of machine learning algorithms in predicting landslide susceptibility in the Chañaral province of Chile. The researchers used SVM, RF, XGBoost, and logistic regression algorithms and compared their results. They identified 86 landslide sites and randomly selected another 86 non-landslide sites for analysis. Overall, the research highlights the potential of machine learning in landslide susceptibility modeling.
I believe the work must be rejected for the following reasons:
- This study faces limitations due to the relatively small dataset consisting of only 86 landslide sites and 86 non-landslide sites. This sample size is not sufficient for effective training, tuning, and evaluation of the machine learning models, especially considering the complexity introduced by the 25 variables.
- The research lacks clarity regarding its novelty. The study does not explicitly highlight any novel approaches, methodologies, or findings that distinguish it from existing research.
Citation: https://doi.org/10.5194/nhess-2023-72-RC1 -
AC1: 'Reply on RC1', Francisco Parra, 14 Jul 2023
Thank you for your feedback. Here are our answers to the previous two statements.
First Statement
In terms of disaster risk management, for a predictive model to be useful in practice to a given territory in the absence of wide sensors networks, the ability to produce reasonable results from small training datasets is of paramount importance.
Most of the existing works in the literature use datasets of a similar size to the one presented in our paper, since for this type of study the cost and difficulty of obtaining the data are very high. Therefore, the analysis performed relies on data science techniques for small datasets, such as repeated cross-validation, which helps reduce the models' overfitting bias.
Furthermore, the models built in this work used only 7 variables (the initial 25 variables were proposed as candidates, and most were then discarded using techniques derived from information theory and correlation analysis). Therefore, we believe that this dataset allows effective training, tuning and evaluation of the machine learning models. Examples of comparable work are as follows:
- In (Achu, 2023), 82 locations and 11 conditioning factors were used.
- In (Al-Najjar, 2021) 156 locations and 16 conditioning factors were used.
- In (Al-Najjar, 2021b), 35 locations and 11 conditioning factors were used.
- In (Alqadhi, 2021), 50 locations and 12 conditioning factors were used.
- In (Arabameri, 2020), 249 locations and 16 conditioning factors were used.
- In (Arabameri, 2022), 240 locations and 18 conditioning factors were used.
- In (Abedini, 2018), 60 locations and 8 conditioning factors were used
- In (Chen, 2017), 288 locations and 18 conditioning factors were used
- In (Chen, 2020) 209 locations and 16 conditioning factors were used
- In (Deng, 2022) 155 locations and 12 conditioning factors were used
- In (Hong, 2018), 237 locations and 15 conditioning factors were used.
- In (Hong, 2020), 79 locations and 14 conditioning factors were used.
- In (Hu, 2021), 114 locations and 10 conditioning factors were used.
- In (Hussain, 2022) 94 locations and 9 conditioning factors were used.
- In (Mehrabi, 2021) 92 locations and 13 conditioning factors were used.
- In (Nhu, 2020), 152 locations and 17 conditioning factors were used.
- In (Nirbhav, 2023) 54 locations and 9 conditioning factors were used.
- In (Nurwatik, 2022) 176 locations and 12 conditioning factors were used.
- In (Pham, 2020), 167 locations and 12 conditioning factors were used.
- In (Sahin, 2020) 105 landslide locations and 15 conditioning factors were used.
- In (Saha, 2021) 91 locations and 21 conditioning factors were used.
- In (Shahabi, 2022) 64 locations and 14 conditioning factors were used.
- In (Tsangaratos, 2017), 112 locations and 11 conditioning factors were used
- In (Vasu, 2016), 163 locations and 13 conditioning factors were used
- In (Wu, 2020) 171 locations and 11 conditioning factors were used.
- In (Youssef, 2022), 243 locations and 12 conditioning factors were used.
In our case, only 86 locations and only 7 conditioning factors were used (which reduces the complexity of the model).
Second statement
From our point of view, this study presents three novelties that could distinguish it from the rest of the literature.
- The models in this work are evaluated through repeated cross-validation, unlike the vast majority of works in the literature, which split the dataset at an arbitrary percentage and compute the metrics only once. Our approach instead builds a statistical distribution of the metrics, as is done in the experiments section of our article, thus providing a more complete verification of the results obtained.
- The factors used in the construction of the model proposed in our article come exclusively from satellite images and digital elevation models, unlike other studies, which rely on information sources that demand a greater amount of data and are therefore more difficult for disaster risk management analysts to apply in practice. This approach has the advantage that it can allow the generation of systems that create susceptibility maps based on periodically updated satellite images, which can contribute to a susceptibility monitoring system that technical agencies in the disaster-management area could implement.
- In this work, two factors derived from satellite images are used to build the model: EVI (Enhanced vegetation index) and NDGI (normalized difference glacier index). The novelty is that they have not been used before in the literature, thus providing new factors that can be considered for the construction of future models. In addition, it is shown that a factor such as the VD (valley depth index) is highly relevant for areas similar to the one studied, with narrow and steep valleys.
References
- Achu, A. L., Thomas, J., Aju, C. D., Remani, P. K., & Gopinath, G. (2023). Performance evaluation of machine learning and statistical techniques for modelling landslide susceptibility with limited field data. _Earth Science Informatics_, _16_(1), 1025-1039.
- Al-Najjar, H. A., & Pradhan, B. (2021). Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks. _Geoscience Frontiers_, _12_(2), 625-637.
- Al-Najjar, H. A., Pradhan, B., Kalantar, B., Sameen, M. I., Santosh, M., & Alamri, A. (2021). Landslide susceptibility modeling: An integrated novel method based on machine learning feature transformation. _Remote Sensing_, _13_(16), 3281.
- Alqadhi, S., Mallick, J., Talukdar, S., Bindajam, A. A., Saha, T. K., Ahmed, M., & Khan, R. A. (2022). Combining logistic regression-based hybrid optimized machine learning algorithms with sensitivity analysis to achieve robust landslide susceptibility mapping. _Geocarto International_, _37_(25), 9518-9543.
- Arabameri, A., Saha, S., Roy, J., Chen, W., Blaschke, T., & Tien Bui, D. (2020). Landslide susceptibility evaluation and management using different machine learning methods in the Gallicash River Watershed, Iran. _Remote Sensing_, _12_(3), 475.
- Arabameri, A., Chandra Pal, S., Rezaie, F., Chakrabortty, R., Saha, A., Blaschke, T., ... & Thi Ngo, P. T. (2022). Decision tree based ensemble machine learning approaches for landslide susceptibility mapping. _Geocarto International_, _37_(16), 4594-4627.
- Abedini, M., Ghasemian, B., Shirzadi, A., & Bui, D. T. (2019). A comparative study of support vector machine and logistic model tree classifiers for shallow landslide susceptibility modeling. _Environmental Earth Sciences_, _78_, 1-15.
- Chen, W., Shirzadi, A., Shahabi, H., Ahmad, B. B., Zhang, S., Hong, H., & Zhang, N. (2017). A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. _Geomatics, Natural Hazards and Risk_, _8_(2), 1955-1977.
- Chen, W., & Li, Y. (2020). GIS-based evaluation of landslide susceptibility using hybrid computational intelligence models. _Catena_, _195_, 104777.
- Deng, N., Li, Y., Ma, J., Shahabi, H., Hashim, M., de Oliveira, G., & Chaeikar, S. S. (2022). A comparative study for landslide susceptibility assessment using machine learning algorithms based on grid unit and slope unit. _Frontiers in Environmental Science_, _10_, 1009433.
- Hong, H., Liu, J., Bui, D. T., Pradhan, B., Acharya, T. D., Pham, B. T., ... & Ahmad, B. B. (2018). Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). _Catena_, _163_, 399-413.
- Hong, H., Liu, J., & Zhu, A. X. (2020). Modeling landslide susceptibility using LogitBoost alternating decision trees and forest by penalizing attributes with the bagging ensemble. _Science of the total environment_, _718_, 137231.
- Hu, H., Wang, C., Liang, Z., Gao, R., & Li, B. (2021). Exploring Complementary Models Consisting of Machine Learning Algorithms for Landslide Susceptibility Mapping. _ISPRS International Journal of Geo-Information_, _10_(10), 639.
- Hussain, M. A., Chen, Z., Wang, R., Shah, S. U., Shoaib, M., Ali, N., ... & Ma, C. (2022). Landslide susceptibility mapping using machine learning algorithm. _Civ. Eng. J_, _8_, 209-224.
- Mehrabi, M. (2021). Landslide susceptibility zonation using statistical and machine learning approaches in Northern Lecco, Italy. _Natural Hazards_, 1-37.
- Nhu, V. H., Mohammadi, A., Shahabi, H., Ahmad, B. B., Al-Ansari, N., Shirzadi, A., ... & Nguyen, H. (2020). Landslide susceptibility mapping using machine learning algorithms and remote sensing data in a tropical environment. _International journal of environmental research and public health_, _17_(14), 4933.
- Nirbhav, Malik, A., Maheshwar, Prasad, M., Saini, A., & Long, N. T. (2023). A comparative study of different machine learning models for landslide susceptibility prediction: a case study of Kullu-to-Rohtang pass transport corridor, India. _Environmental Earth Sciences_, _82_(7), 167.
- Nurwatik, N., Ummah, M. H., Cahyono, A. B., Darminto, M. R., & Hong, J. H. (2022). A Comparison Study of Landslide Susceptibility Spatial Modeling Using Machine Learning. _ISPRS International Journal of Geo-Information_, _11_(12), 602.
- Pham, B. T., Nguyen-Thoi, T., Qi, C., Van Phong, T., Dou, J., Ho, L. S., ... & Prakash, I. (2020). Coupling RBF neural network with ensemble learning techniques for landslide susceptibility mapping. _Catena_, _195_, 104805.
- Saha, S., Roy, J., Hembram, T. K., Pradhan, B., Dikshit, A., Abdul Maulud, K. N., & Alamri, A. M. (2021). Comparison between deep learning and tree-based machine learning approaches for landslide susceptibility mapping. _Water_, _13_(19), 2664.
- Sahin, E. K. (2020). Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. _SN Applied Sciences_, _2_(7), 1308.
- Shahabi, H., Ahmadi, R., Alizadeh, M., Hashim, M., Al-Ansari, N., Shirzadi, A., ... & Ariffin, E. H. (2023). Landslide Susceptibility Mapping in a Mountainous Area Using Machine Learning Algorithms. _Remote Sensing_, _15_(12), 3112.
- Tsangaratos, P., Ilia, I., Hong, H., Chen, W., & Xu, C. (2017). Applying Information Theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. _Landslides_, _14_, 1091-1111.
- Vasu, N. N., & Lee, S. R. (2016). A hybrid feature selection algorithm integrating an extreme learning machine for landslide susceptibility modeling of Mt. Woomyeon, South Korea. _Geomorphology_, _263_, 50-70.
- Wu, Y., Ke, Y., Chen, Z., Liang, S., Zhao, H., & Hong, H. (2020). Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping. _Catena_, _187_, 104396.
- Youssef, A. M., & Pourghasemi, H. R. (2021). Landslide susceptibility mapping using machine learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi Arabia. _Geoscience Frontiers_, _12_(2), 639-655.
Citation: https://doi.org/10.5194/nhess-2023-72-AC1
-
RC2: 'Comment on nhess-2023-72', Anonymous Referee #2, 13 Jul 2023
The manuscript aims to assess landslide susceptibility using a limited database and machine learning approach. Previous studies have identified a gap in landslide research throughout South America (https://doi.org/10.5194/nhess-15-1821-2015), and this manuscript seeks to contribute to filling that gap by providing alternatives to landslide susceptibility. I encourage the authors to include studies over the Andes region that support their study.
The contribution can potentially become an important resource for the landslide community. However, I believe the manuscript is poorly written, exhibiting shortcomings and over-explanation throughout the document. Additionally, the authors claim novelty without effectively addressing any knowledge gaps. I have identified several major comments the manuscript should address to improve its effectiveness and readiness for future review. I have the following concerns that the authors must correct/modify immediately.
The document is oversized, showing irrelevant information and data. Moreover, no new science is found. Preliminary studies in the Andes have tried to evaluate susceptibility changes under different precipitation scenarios (https://doi.org/10.3390/w15142514 and https://doi.org/10.1016/j.jsames.2022.103824). I encourage the authors to consider what new knowledge their results could contribute to the Central Andes.
Mainly, the introduction lacks clarity regarding the research question and fails to adequately address the research gap. It is unclear whether the authors focus on Rainfall-Induced Landslides or Earthquake-Induced Landslides. This distinction becomes crucial in supporting the use of different data in the machine learning models.
The zone in question is characterized by a limited landslide catalog (please state that you consider earthquake or rainfall induced landslides, not both), which is a common challenge in the Central and Southern Andes region. Previous studies have successfully employed data augmentation techniques to overcome these limitations (https://doi.org/10.1007/s10346-022-01981-w and https://doi.org/10.3390/w15142514). By utilizing data augmentation (DA), it is possible to significantly enhance the limited landslide dataset and provide valuable support for machine learning models. Furthermore, DA effectively addresses class imbalance issues by generating additional samples from underrepresented classes, leading to improved performance and accuracy in classification tasks. I strongly urge the authors to investigate strategies for implementing data augmentation in the study zone to evaluate changes in susceptibility.
Section 2 appears to be poorly organized. The geology of the study area is insufficiently covered, so I recommend including additional information to provide a comprehensive overview. Furthermore, the database section lacks cohesion and is scattered across different subsections. I suggest merging sections 2.6 and 2.7 into a single subsection that covers each machine learning model in detail.
The results of the study are concerning, particularly regarding the weight assigned to "slope". It is widely understood that slope plays a crucial role in landslide generation. I strongly recommend re-evaluating the VD variable and their relationship with the actual location of landslide emplacement. It is suspected that many landslides are occurring in hillslope areas within the basin, which could introduce a bias in the results. To address this, the authors should consider removing "slope" from the initial model and assess the impact on the new results.
After conducting a quick review of the Andes region, it is evident that the results of the study suggest a superior performance compared to similar studies that utilized logistic regression (e.g., https://doi.org/10.5194/nhess-22-2169-2022 and https://doi.org/10.1007/s11069-020-03913-0). I strongly urge the authors to consider the differences and similarities that contribute to the improved performance of the models used in this study or explore alternative models that could further enhance the analysis in the context of the Andes region.
Section 3.4 requires a complete rewrite. The thresholds used to define the different levels of susceptibility are currently unclear. It is essential for the authors to justify these values using a reproducible approach. The content of Section 4 appears to contain sentences and information that should be part of the results section. It is unclear how the authors can discuss the findings without prior presentation of the results. I suggest assessing the relevance of each model and discussing the spatial variability, taking into account additional parameters such as geology and slope. The section currently lacks strength and does not contribute new scientific insights.
Minors comments:
- Line 224: Please, consider revising "contest" using a similar word.
Citation: https://doi.org/10.5194/nhess-2023-72-RC2 -
AC2: 'Reply on RC2', Francisco Parra, 24 Jul 2023
Thank you very much for your detailed reply.
Here is the answer, broken down into parts:
I encourage the authors to include studies over the Andes region that support their study.
After an extensive and updated literature review, we found few publications linked to susceptibility assessment in the Andes. In this regard, (Ospina-Gutiérrez, 2021) performed susceptibility mapping in an Andean area that differs from ours in geomorphology and climate, but, as in our study, the most successful algorithm was Random Forest. (Brenning, 2015) used GAM models to calculate susceptibility in areas near roads and, like us, noted curvature as an important factor. (Lizama, 2022) also found curvature to be relevant. Finally, (Buecchi, 2019) found that useful and effective landslide susceptibility maps can be built using only the DEM of the zone, which supports the results obtained in this work, which also uses satellite imagery.
In the new version of our manuscript we will include these works on the Andes areas.
The manuscript is poorly written, exhibiting shortcomings and over-explanation throughout the document
The parts that were observed will be duly reviewed and corrected. In particular, the following will be done:
* The introduction will be fixed, eliminating redundant parts and clearly stating the research questions and gaps that this work seeks to cover.
* In the study area section, the geological information on the area will be supplemented.
* The inventory section will be arranged more concisely and sections 2.6 and 2.7 will be brought together, as previously suggested.
* Paragraphs from the discussion section that were erroneously added in the discussion section will be included in the results section.
* The discussion will include a paragraph on similar publications on landslide susceptibility in the Andes that support the findings of this paper.
All these improvements are included in the attached zipped file.
The documents has an extension oversized, showing irrelevant information of data
As mentioned above, redundant parts will be removed.
I encourage to the authors to review new knowledge that their results could contribute to Central Andes
- The approach for the evaluation of the models is through repeated cross-validation. This is different from the vast majority of works in the literature, which split the dataset at an arbitrary percentage and compute the metrics only once instead of creating a statistical distribution, as is done in the experiments section of our article. Our approach provides a more comprehensive verification of the results obtained than previous work.
- Unlike previous works, the data sources used in the construction of the model proposed in our article come exclusively from satellite images and digital elevation models, rather than from information sources that demand a greater amount of data. This enables disaster risk analysts to use these data sources in practice. Our approach has the advantage that it can allow the generation of systems that create susceptibility maps based on periodic updates of the satellite images, which can contribute to a susceptibility monitoring system that technical agencies could implement as part of early warning systems.
- In this work, two features derived from satellite images are used to build the model: EVI (enhanced vegetation index) and NDGI (normalized difference glacier index). The novelty is that they have not been used before in the literature, thus enabling the application of this source of useful data in the construction of future models derived from our approach. In addition, our paper shows that a factor such as VD (valley depth index) is highly relevant for areas similar to the one we studied, where narrow and steep valleys exist.
The introduction lacks clarity regarding the research question and fails to adequately address the research gap.
In the zipped file you will find the new introduction, with the problems presented here solved.
It is unclear whether the authors focus on Rainfall-Induced Landslides or Earthquake-Induced Landslides
The landslides studied correspond to those induced by rainfall. It will be clarified in the text.
I strongly urge the authors to investigate strategies for implementing data augmentation in the study zone to evaluate changes in susceptibility
In the suggested publications, it was not possible to find a suitable methodology for this case. However, (Dou, 2020) suggests that using samples from the landslide scarp polygon can increase the accuracy of the model. In this work, the center of the landslide body had been used to characterize the phenomenon. Therefore, to enlarge the dataset, we decided to take 10 samples (pixels) within the scarp polygon of each landslide; for the non-landslide points, a polygon surrounding the initial point was created, and 10 samples were taken within it. This multiplies the studied dataset tenfold and provides the statistical robustness needed in this work. Preliminary results are attached, where it can be seen that the AUC of Random Forest increased from 0.90 to 0.95. XGBoost also improves its performance, while SVM and LR do not.
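To make the augmentation step concrete, here is a minimal R sketch with the sf package; the file name, the `site_id`/`landslide` columns and the sample size of 10 are placeholders that mirror the description above, not the exact script used for the preliminary results.

```r
library(sf)

# Hypothetical layer: one scarp polygon (or buffer around a non-landslide point) per site
polys <- st_read("inventory_polygons.gpkg")

# Draw ~10 random points inside each polygon, keeping the site label
augmented <- do.call(rbind, lapply(seq_len(nrow(polys)), function(i) {
  pts <- st_sample(polys[i, ], size = 10, type = "random")
  st_sf(site_id   = polys$site_id[i],
        landslide = polys$landslide[i],
        geometry  = pts)
}))

# Conditioning factors are then extracted at these points (e.g. with terra::extract)
# before the models are retrained on the enlarged dataset.
```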
The geology of the study area is insufficiently covered
An updated geological section is attached in the zipped file.
The database section lacks cohesion and is scattered across different subsections. I suggest merging sections 2.6 and 2.7 into a single subsection that covers each machine learning model in detail.
The new layout of section 2 of the paper will be as follows:
2.1: Characteristics of the study area and data set.
2.2: Conditioning factors and selection of factors
2.2.1: IGR technique
2.2.2: Correlation calculation
2.3: Modeling using Machine Learning
2.3.1: SVM
2.3.2: LR
2.3.3: RF
2.3.4: Xgboost
2.3.5: Hyperparameter Optimization
2.3.6: Cross Validation
2.3.7: Model validation
In the zipped file the updated section is attached.
I strongly recommend re-evaluating the VD variable and their relationship with the actual location of landslide emplacement
The variable VD (Valley Depth) in the study refers to the vertical distance to a base level of the hydrographic network. This index is calculated using an algorithm that involves interpolating the elevation of the base level of the hydrographic network and then subtracting this base level from the original elevations.
In the context of the study, the valley depth (VD) index was found to provide a lot of information for the model. A high valley depth index may be related to a high susceptibility to landslides due to the steep topography and abrupt relief present in the study area, which favors the occurrence of gravitational processes and increases the rate of erosion on the slopes.
The authors should consider removing "slope" from the initial model and assess the impact on the new results
The slope variable is not part of the model. It was only part of the initial set and was discarded by the correlation/IGR criterion. The final model is composed of 7 variables (valley depth, convergence index, curvature, TWI, TPI, EVI, NDGI), which combine variables derived from the digital elevation model with others extracted from satellite images. Therefore, slope does not affect the analysis in this work.
I strongly urge the authors to consider the differences and similarities that contribute to the improved performance of the models used in this study
In (Fustos, 2020), a model is built to predict the months in which landslides occur, using hydrometeorological parameters. Unfortunately, being a completely different approach, a comparison is not possible: in our case we calculate geographic susceptibility as an inherent characteristic of the terrain, whereas that model makes temporal predictions using logit and probit models. On the other hand, in (Fustos-Toribio, 2022), spatio-temporal models are built to calculate the probability of landslide occurrence using factors associated with rainfall and topography (slope). As in the previous case, the methodologies used make it impossible to establish a parallel with our work.
Section 3.4 requires a complete rewrite. The thresholds used to define the different levels of susceptibility are currently unclear
For the calculation of thresholds, the Jenks Breaks method will be used, which is widely used in the literature, as it is based on an optimisation algorithm that minimizes the within-class variance and maximizes the between-class variance. Attached is an example map representing the Random Forest model.
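As a hedged illustration of the Jenks natural breaks classification mentioned here, the snippet below uses the classInt R package; the five-class scheme, the class labels and the `susceptibility` vector of predicted probabilities are assumptions for the example.

```r
library(classInt)

# 'susceptibility' is assumed to hold predicted probabilities (0-1) for the map pixels
ci <- classIntervals(susceptibility, n = 5, style = "jenks")

classes <- cut(susceptibility,
               breaks = ci$brks, include.lowest = TRUE,
               labels = c("very low", "low", "moderate", "high", "very high"))
table(classes)  # pixel count per susceptibility class
```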
The content of Section 4 appears to contain sentences and information that should be part of the results section. It is unclear how the authors can discuss the findings without prior presentation of the results. I suggest assessing the relevance of each model and discussing the spatial variability, taking into account additional parameters such as geology and slope. The section currently lacks strength and does not contribute new scientific insights
To improve the discussion section, several paragraphs were moved to the results section.
Logistic regression does not require strict assumptions about the distribution of the independent variables and models the relationship between a binary dependent variable and one or more independent variables, which may be non-linear. Although not as accurate as other models, it has the advantage of being simpler and easier to interpret than many machine learning models. However, it can have difficulties with large or complex datasets. As the amount of data increased, logistic regression may not have been able to model the relationship between variables effectively, resulting in a decrease in accuracy. Therefore, it performed better on the initial set, without the data augmentation.
Support vector machines often outperform other statistical models, such as logistic regression, in predicting landslide susceptibility. However, they can be more difficult to interpret. While SVM can handle high-dimensional data, it can struggle with large datasets because of its computational complexity, and its performance may drop when the data are noisy or overlapping. For these reasons, its performance also decreased with respect to the initial dataset.
Random forests can capture complex and non-linear interactions between variables. In addition, they are robust to multicollinearity, i.e. they can handle situations where predictor variables are highly correlated. The algorithm is known for its ability to handle large datasets and its resistance to overfitting. With more data, RF may have been able to build more accurate and robust decision trees, resulting in increased accuracy.
XGBoost is an ensemble algorithm that uses boosting. For landslides, this can capture complex patterns in the data that other models may miss. It also provides a number of options for regularization and model fitting, which can help avoid overfitting and improve performance, and it is likewise robust to multicollinearity. Similar to RF, XGBoost can handle large datasets and is robust to overfitting, so having more data may have given it more opportunities to learn and improve its performance.
Regarding the spatial variability, an inspection of the generated susceptibility map against the geological units of the area shows that the most susceptible areas are located in Paleozoic and Mesozoic rocks.
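As a simplified illustration of how the four algorithms discussed above can be compared under the same repeated cross-validation scheme, the mlr3 sketch below benchmarks them on one task; the data frame `df`, the default (untuned) hyperparameters and the `nrounds` value are assumptions, whereas the study itself tunes the hyperparameters.

```r
library(mlr3)
library(mlr3learners)

# Hypothetical task: factor target 'landslide' plus numeric conditioning factors
task <- as_task_classif(df, target = "landslide", positive = "yes")

learners <- list(
  lrn("classif.log_reg", predict_type = "prob"),                 # logistic regression
  lrn("classif.svm",     predict_type = "prob"),                 # support vector machine
  lrn("classif.ranger",  predict_type = "prob"),                 # random forest
  lrn("classif.xgboost", predict_type = "prob", nrounds = 100)   # XGBoost
)

design <- benchmark_grid(task, learners,
                         rsmp("repeated_cv", folds = 10, repeats = 10))
bmr <- benchmark(design)

# Mean AUC per learner over all resampling iterations
bmr$aggregate(msr("classif.auc"))
```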
**Minor comments:**
**Line 224: Please, consider revising "contest" using a similar word.**
The word chosen to replace "contest" was "competition".
References
* Brenning, A., Schwinn, M., Ruiz-Páez, A. P., & Muenchow, J. (2015). Landslide susceptibility near highways is increased by 1 order of magnitude in the Andes of southern Ecuador, Loja province. _Natural Hazards and Earth System Sciences_, _15_(1), 45-57.
* Bueechi, E., Klimeš, J., Frey, H., Huggel, C., Strozzi, T., & Cochachin, A. (2019). Regional-scale landslide susceptibility modelling in the Cordillera Blanca, Peru—a comparison of different approaches. _Landslides_, _16_, 395-407.
* Dou, J., Yunus, A. P., Merghadi, A., Shirzadi, A., Nguyen, H., Hussain, Y., ... & Yamagishi, H. (2020). Different sampling strategies for predicting landslide susceptibilities are deemed less consequential with deep learning. _Science of the total environment_, _720_, 137320.
* Fustos, I., Abarca-del-Rio, R., Moreno-Yaeger, P., & Somos-Valenzuela, M. (2020). Rainfall-Induced Landslides forecast using local precipitation and global climate indexes. _Natural Hazards_, _102_, 115-131.
* Fustos-Toribio, I., Manque-Roa, N., Vásquez Antipan, D., Hermosilla Sotomayor, M., & Letelier Gonzalez, V. (2022). Rainfall-induced landslide early warning system based on corrected mesoscale numerical models: an application for the southern Andes. _Natural Hazards and Earth System Sciences_, _22_(6), 2169-2183.
* Lizama, E., Morales, B., Somos-Valenzuela, M., Chen, N., & Liu, M. (2022). Understanding landslide susceptibility in Northern Chilean Patagonia: A basin-scale study using machine learning and field data. _Remote Sensing_, _14_(4), 907.
* Ospina-Gutiérrez, J. P., & Aristizábal, E. (2021). Aplicación de inteligencia artificial y técnicas de aprendizaje automático para la evaluación de la susceptibilidad por movimientos en masa [Application of artificial intelligence and machine learning techniques for the assessment of mass-movement susceptibility]. _Revista Mexicana de Ciencias Geológicas_, _38_, 43-54.
We look forward to hearing from you.
Have a nice day
-
RC3: 'Comment on nhess-2023-72', Anonymous Referee #1, 14 Jul 2023
Thank you for your answers.
First statement
I acknowledge the precedents.
I acknowledge and agree that repeated cross-validation helps address the issue of data scarcity to some extent. However, susceptibility modelling assumes that the available data used for model development and validation are representative of the study area, i.e. that they adequately capture the spatial and temporal variations in the susceptibility factors and are sufficient to train and evaluate the model effectively. With such a limited number of landslides, the only way this assumption holds is if the area is quite small. In that case, however, the findings of the research are valid only within that very small area (or within your inventory) and are therefore extremely site-specific. Overall, with a non-representative landslide inventory, it does not really matter which training and validation strategy you adopt; your result will be biased anyway.
Second statement
I acknowledge and agree with your points 2 and 3. Please emphasise it more in the manuscript.
I wouldn’t consider the use of repeated cv as a novelty… surely a very good strategy though.
Overall my advice is to enlarge the study area and increase the number of landslides. Please also add some more details about the landslides (occurrence, trigger…. If possible).
Another solution would be to repeat the very same approach in other similar areas (to get more transferable findings).
Citation: https://doi.org/10.5194/nhess-2023-72-RC3 -
AC3: 'Reply on RC3', Francisco Parra, 24 Jul 2023
Good Day!
Regarding your concern about the lack of data for the analysis, we followed this strategy:
In the suggested publications, it was not possible to find a suitable methodology for this case. However, (Dou, 2020) suggests that using samples from the landslide scarp polygon can increase the accuracy of the model. In this work, the center of the landslide body had been used to characterize the phenomenon. Therefore, to enlarge the dataset, we decided to take 10 samples (pixels) within the scarp polygon of each landslide; for the non-landslide points, a polygon surrounding the initial point was created, and 10 samples were taken within it. This multiplies the studied dataset tenfold and provides the statistical robustness needed in this work. Preliminary results are attached, where it can be seen that the AUC of Random Forest increased from 0.90 to 0.95. XGBoost also improves its performance, while SVM and LR do not.
We include our preliminary results in the attached file.
Have a nice day!
-
RC4: 'Comment on nhess-2023-72', Anonymous Referee #3, 28 Jul 2023
The paper compares the results of susceptibility maps in a small basin in Chile using four machine-learning methods.
The quality of the paper is fair. However, the impact is quite limited. They used a small location in Chile to train the ML models with a small sample of landslides, which has been done on multiple occasions and does not provide sufficient new knowledge for an NHESS publication. A section of the paper mentions that these methods are pretty new in geosciences, which I think is incorrect since extensive references and books describe the multiple applications of ML in geosciences (Lines 86-87) (i.e. https://www.sciencedirect.com/science/article/pii/S0065268720300054)
Major comments:
They used a general definition of landslides, including rockfalls, flows, and falls. Then they discussed mass movements where landslides are part of them (Line 26-30). So my first question is, what kind of events have been used for the models? I think they used all of the mass movement events reported in the available database with no distinction of the nature of the events; if that is the case, do you think all the "mass movements" have the same trigger factors (passive or active)? I used the term "mass movements" here to use the same wording as the authors, but I would like to see a more precise definition of the events analyzed.
Then, you mentioned that you verified the database using "Laboratory analysis" (line 140). What kind of analysis did you run? How did you verify the location of the events reported by Sernageomin?
Then in line 147, you mentioned, "In any way, having a small amount of data, it is necessary to resort to a cross-validation process, and thus obtain an average AUC that can be compared between the models used." Can cross-validation not be done in larger datasets?
They used geology as a background of the study area, but in the discussions, I did not see anything regarding that topic. For example, according to the models, are there geological units that are more susceptible than others? Which model could be more recommended according to the area's geomorphological, climatological, and geological characteristics?
Why is geology not included in the analysis since there is variation in the units in the area? This is not explained anywhere.
The paper claims to have two novelties related to ML applications. I do not think that they are accurate. The authors should support with references this contribution to ML methods.
Then, they mentioned as a novelty the use of a series of R packages which I do not see as a new idea.
I do not understand the purpose of including Colca Valley, Indo Valley, and Colorado River Valley. This model is unlikely to work in those locations since your models have no training data from those valleys.
The literature review does not include similar work in South America, which may provide insight into the state of the art in the Andes.
Finally, assuming the locations of the events are correct. What happens with the temporality? To be consistent, the model should be trained with the characteristics of the landscape just before the events. In this work, all the events (floods, landslides, rockfalls, etc.) are treated the same, so something should describe and differentiate them in the model.
Minor comments:
Figure 2 is not cited in the document.
All the ML review on page 4 does not contribute significantly to the paper.
English grammar and style need further review.
All the acronyms should be explained (GNDVI, EVI, NDMI, etc.)
Citation: https://doi.org/10.5194/nhess-2023-72-RC4 -
AC4: 'Reply on RC4', Francisco Parra, 30 Jul 2023
Thank you for your review.
Here are our answers to your statements about this study.
The quality of the paper is fair. However, the impact is quite limited. They used a small location in Chile to train the ML models with a small sample of landslides which has been done in multiple opportunities and does not provide sufficient new knowledge for a NHESS publication. A section of the papers mentioned that these methods are pretty new in Geosciences, which I think is incorrect since extensive references and books describe the multiple applications of ML in geosciences.
The area used in the study is 7400 km², which is larger than the areas considered in most studies using this methodology. For example:
- In (Abedini, 2019), the study area is 516.44 km².
- In (Can, 2021), it is 2718.7 km².
- In (Vasu, 2016) it is 5104 km².
- In (Al-Najjar, 2021) it is 1879.5 km².
- In (Dang, 2020) it is 101.3 km².
Regarding the small sample of landslides, this was an issue also raised by the other reviewers. To resolve this difficulty, we chose the data augmentation strategy described in (Dou, 2020), which suggests that using samples from the landslide scarp polygon can increase the accuracy of the model. In this work, the center of the landslide body had been used to characterize the phenomenon. Therefore, to enlarge the dataset, we decided to take 10 samples (pixels) within the scarp polygon of each landslide; for the non-landslide points, a polygon surrounding the initial point was created, and 10 samples were taken within it. This multiplies the studied dataset tenfold and provides the statistical robustness needed in this work. Preliminary results are attached, where it can be seen that the AUC of Random Forest increased from 0.90 to 0.95. XGBoost also improves its performance, while SVM and LR do not.
At no point does this study claim that its novelty lies in the use of ML in the geosciences; we understand that numerous studies already address the subject in the literature. That said, what we believe can make a significant contribution to the field are the following points:
- Unlike previous works, the data sources used to build the model proposed in our article come exclusively from satellite images and digital elevation models, which are sources of information with a much larger amount of data. This enables disaster risk analysts to consider these sources in practice. Our approach can support systems that regenerate susceptibility maps whenever the satellite imagery is updated, contributing to a susceptibility monitoring capability that technical agencies could use for early warning.
- In this work, two features derived from satellite images are used to build the model: EVI (enhanced vegetation index) and NDGI (normalized difference glacier index). The novelty is that they have not been used before in the literature, which opens up this useful data source for future models derived from our approach. In addition, our paper shows that a factor such as VD (valley depth index) is highly relevant for areas similar to the one we studied, where narrow and steep valleys exist.
In any case, the passage at lines 86-87 that raised doubts will be removed from the final version, considering that the use of ML in the geosciences can no longer be regarded as new.
They used a general definition of landslides, including rockfalls, flows, and falls. Then they discussed mass movements where landslides are part of them (Line 26-30). So my first question is, what kind of events have been used for the models? I think they used all of the mass movement events reported in the available database with no distinction of the nature of the events; if that is the case, do you think all the "mass movements" have the same trigger factors (passive or active)? I used the term "mass movements" here to use the same wording as the authors, but I would like to see a more precise definition of the events analyzed.
The landslides studied in this work correspond to debris flows and alluvial flows, all triggered by (active) rainfall processes. We understand that this was not fully clarified in the draft version, but we will make sure it is addressed in the final version.
Then, you mentioned that you verified the database using "Laboratory analysis" (line 140). What kind of analysis did you run? How did you verify the location of the events reported by Sernageomin?
The event database corresponds to a historical dataset of events that have occurred in the country, compiled by the National Geology and Mining Service of Chile (Sernageomin). Validation corresponds to field visits by professionals from the agency and to visual inspection carried out by the researchers of this study, based on the analysis of aerial images obtained from Google Earth and satellite images from the Landsat-9 mission.
Then in line 147, you mentioned, "In any way, having a small amount of data, it is necessary to resort to a cross-validation process, and thus obtain an average AUC that can be compared between the models used." Can cross-validation not be done in larger datasets?
While cross-validation makes data analysis feasible on small datasets, it does not preclude its use on larger ones. In fact, after the change introduced through the data augmentation strategy, the technique was retained and yielded excellent results in the analysis of the models.
They used geology as a background of the study area, but in the discussions, I did not see anything regarding that topic. For example, according to the models, are there geological units that are more susceptible than others? Which model could be more recommended according to the area's geomorphological, climatological, and geological characteristics?
The most susceptible geological formations in the study area are those dated to the Paleozoic and Mesozoic. No significant differences are observed between the models with respect to these characteristics.
Why is geology not included in the analysis since there is variation in the units in the area? This is not explained anywhere.
The exclusion of geology responds to practical criteria and to the need to demonstrate that only topographic and satellite-image parameters are required. In addition, these characteristics are directly linked to the geological formations of the area, and, considering the statistical performance of the models, adding them could lead to overestimation without contributing to the development of the model.
The paper claims to have two novelties related to ML applications. I do not think that they are accurate. The authors should support with references this contribution to ML methods.
With respect to repeated cross-validation, although its use is not common in this type of study, there are precedents in the literature. The second claimed novelty, on the other hand, no longer applies, since the Jenks breaks method will be used in the construction of the maps. Therefore, the indicated paragraph will be removed from the final version, and this issue will be resolved.
Then, they mentioned as a novelty the use of a series of R packages which I do not see as a new idea.
So far, no studies in the field use the MLR3 package, with its object-oriented design, to perform these analyses. However, we understand that this is part of the methodology and should not be presented as a novelty, so it will also be modified.
I do not understand the purpose of including Colca Valley, Indo Valley, and Colorado River Valley. This model is unlikely to work in those locations since your models have no training data from those valleys.
This is part of an observation from a superficial analysis of these areas, which share similar geomorphological characteristics. We want to make it clear that this is only a recommendation, and therefore, to validate this we would need to have training data in each valley and perform the same statistical analysis proposed in this study.
The literature review does not include similar work in South America, which may provide insight into the state of the art in the Andes.
The final version of the paper includes a paragraph comparing this work with others carried out in the Andes region:
After an extensive and updated literature review, we found few publications linked to susceptibility assessment in the Andes. In (Ospina-Gutiérrez, 2021), susceptibility mapping was performed in an Andean area that differs from ours in geomorphology and climate, but, as in our study, the most successful algorithm was Random Forest. In (Brenning, 2015), GAM models were used to calculate susceptibility in areas near roads, and curvature was highlighted, as in our study, as an important factor. In (Lizama, 2022), the relevance of curvature was also found. Finally, (Bueechi, 2019) showed that useful and effective landslide susceptibility maps can be built using only the DEM of the zone, which is consistent with the results obtained in this work, which additionally uses satellite imagery.
Finally, assuming the locations of the events are correct. What happens with the temporality? To be consistent, the model should be trained with the characteristics of the landscape just before the events. In this work, all the events (floods, landslides, rockfalls, etc.) are treated the same, so something should describe and differentiate them in the model.
The types of events analysed in this manuscript correspond to slides and debris flows caused by extreme precipitation events. Moreover, one of the principles used in studies of this kind is that "the past is the key to understanding future processes" (Guzzetti, 2012). Accordingly, authors routinely construct their maps from past events and assess susceptibility without considering this detail, which gives empirical validity to this idea.
Minor comments:
Figure 2 is not cited in the document.
Figure 2 is cited on line 269.
All the ML review on page 4 does not contribute significantly to the paper.
Following the recommendations of all the referees, this part was completely reworked and shortened so as not to include information that is not relevant to the work.
English grammar and style need further review.
This will be addressed in the final version of the document.
All the acronyms should be explained (GNDVI, EVI, NDMI, etc.)
To correct this, we will include the explanation of all acronyms in their first appearance in the paper.
------------------------------------------------------------------------------------------------------------------------------------------------------
We include here some corrected parts of the paper and the new results that have followed your recommendations and those of the other referees.
Have a nice day, and we look forward to hearing from you.
-
CC1: 'Comment on nhess-2023-72', Albert Cabré, 30 Jul 2023
I have read other comments from colleagues, and I agree with most of them. I now want to give insights to the authors to encourage a resubmission of a future paper that addresses the weaknesses detected in their attempt to provide remote-based susceptibility maps and to their future application at regional scales in the region.
The research paper by Parra and collaborators, in discussion in the NHESS journal of the EGU, compares landslide susceptibility models for an arid watershed of northern Chile in the Atacama Desert. This arid region still lacks regional maps of landslide susceptibility, although I immediately want to draw the authors' attention to the fact that at least (to my knowledge) one recent research paper has addressed this in the Atacama Desert, https://doi.org/10.5194/nhess-20-1247-2020, and should therefore, at least, be discussed/included.
The authors propose to apply machine learning algorithms to produce landslide susceptibility maps in an understudied region of northern Chile. This is what motivated me to read the paper at first, and it is something I also think is necessary. Nowadays, private companies are usually the ones working on this kind of risk assessment problem in Chile, but the problem is that little of their work is available to the community or to researchers. So, initiatives such as the one the authors attempt here are valuable. However, although their results sound innovative, they completely lack a thorough knowledge of the hazardous geomorphologic processes that impact this region during rare rainstorm events, and this will be the main point of my comments.
The manuscript navigates through technical and more or less well-described methodological steps. However, the main problem I find is that it starts from the wrong point (e.g., the selection of conditioning factors) from the very beginning, rendering this research useless as it is and showing that the design of the analysis is not correct. I need to put the focus on this immediately because the authors state (lines 15-17): "The findings of this investigation have the potential to assist in land use planning, landslide risk reduction, and informed decision making in the surrounding zones." This is dangerous, and no land use, risk reduction strategy or informed decision should be made using these results in the region without a further redesign of the study. Since they have stated that this is a major outcome of their research, I want to be crystal clear on that, because this cannot continue down this path. Maybe the authors will resubmit the paper elsewhere as it is (I hope they don't), but it is not rigorous, because it currently lacks consideration of basic and fundamental findings from previous research in this region of the Atacama Desert and therefore trains a machine learning algorithm with the wrong and thus erroneous conditioning parameters.
Their main graphical outcomes are landslide susceptibility maps that might look good to readers not working in arid zones but that completely lack any link with the reality demonstrated by two recent rain events, reviewed in https://doi.org/10.1002/2016GL069751 for the March 2015 rainstorm and https://doi.org/10.1007/s11069-022-05707-y for the March 2022 rainstorm, which both impacted the evaluated watershed. On a rapid qualitative assessment, none of the impacted areas such as the ones shown in https://doi.org/10.1007/s11069-022-05707-y are identified as susceptible in their susceptibility maps (Figs. 10-13).
Surprisingly, they use only 86 sites where landslides have been reported, although research papers have recently shown landslides, flash floods and other runoff-related hazards in the exact same location (see https://doi.org/10.1002/2016GL069751 and https://doi.org/10.1007/s11069-022-05707-y). Future research needs to consider these datasets, probably as validation sites, and/or apply similar methodological approaches.
Although the paper has some merit in implementing machine learning with robust statistics, the results misunderstand the landslide-related geohazards that impact this region during extreme rainfall events. First, the selection of conditioning factors does not take into consideration previous research carried out in the southern Atacama Desert in a 'less' arid watershed situated south of El Salado (see Aguilar et al., 2020); this paper would therefore need a thorough literature review to better choose the conditioning factors. For example, Aguilar suggests including information on the colluvial and alluvial cover, which is extensive in arid landscapes due to the lack of effective sediment removal mechanisms (only intense rains every 30 years?) capable of producing enough runoff to entrain material and therefore produce, e.g., debris flows.
Then, the authors use one Landsat 9 image from February 2022, for which they decide it is fine not to give further detail. A recent paper addresses understanding geomorphic change during rainfall events (including landslides, rills, etc.), https://doi.org/10.1016/j.rsase.2023.100927, doing a thorough review of the use of optical imagery (with particular focus on the Landsat family), and it gives the authors enough insight to perhaps use Olivares' approach. However, they decided to derive optical spectral indexes (NDVI, GNDVI, EVI, NDMI, BSI, NDWI, NDGI) without giving further justification or evidence for using them in an arid region. This point is especially discouraging. First, no evidence of present glaciers has been reported for this watershed (see https://doi.org/10.1016/j.quaint.2017.04.033); García only shows evidence for the Altiplano-Puna region (endorheic basins to the east of the studied catchment). Second, the use of vegetation indexes in this arid watershed is useless and, honestly, is one more piece of evidence that the authors of this paper do not even understand where they are applying their set of patterns to automatically extract information.
The manuscript continues with some sentences that are at least confusing to me:
(Lines 389-391): "NDGI uses spectral bands corresponding to green and red, so this would imply that landslide and non-landslide areas create contrast between these wavelengths. Therefore, it is suggested to use these indices in areas similar to the studied in this work." This is unclear. Why would the spectral signature of an area of 86 landslides contrast with a whole catchment of 7400 km²? There is a problem in the selection of landslide points that might be overcome by looking, again, at the results from Wilcox and Cabré, because they show large areas (not points) of geomorphic change after recent events, and I do not believe the spectral signature would change significantly on the alluvial, colluvial or flat surfaces characteristic of this catchment. Why? Because significant mineralogical changes would indicate processes that are not present in this region. If the authors were right, which they are not, differences between valley floors, colluvial and 'old' surfaces would be easily recognized. This is not the case if you check the recent paper https://doi.org/10.1016/j.geomorph.2022.108504, where the surface maps are based on surface roughness rather than spectral signatures. Their research paper cannot be overlooked when describing the geological and geomorphological setting of the study area, because these flat surfaces cover large areas of the studied catchment.
(Lines 431-433): "the model can be expected to be suitable in areas worldwide that is a semi-arid zone, with a variable topography and a Mediterranean climate with a prolonged dry season, in addition to having narrow and deep valleys, where the maximum susceptibility is concentrated. Examples of these zones are the following:" Here I have to say that I was surprised by the shift to a global perspective of applying what the authors have done. To support this global idea, the authors do not use very suitable examples. So as not to repeat previous comments, I want to highlight that all the mentioned rivers are permanent rivers with significant water discharge, because they drain areas where annual rainfall amounts are significantly more important than what happens in this catchment of the Atacama.
(Lines 466-469): "These findings provide valuable perspectives for informed decision-making and policy formulation in landslide-prone regions. Overall, our study highlights the potential of machine learning models, particularly SVM and RF, for accurate and reliable landslide susceptibility mapping, which can aid in identifying high-risk areas and implementing effective mitigation strategies, which is useful for stakeholders and land-planning authorities." It is probably not the aim of a discussion in a peer-reviewed journal, but I would like to suggest to the authors that this study needs to be redesigned from its basics.
My recommendation is to reject the paper as it is.
Citation: https://doi.org/10.5194/nhess-2023-72-CC1 -
AC5: 'Reply on CC1', Francisco Parra, 01 Aug 2023
Good day!
We deeply appreciate your constructive criticism and the opportunity to clarify certain aspects of our work. We would like to address your concerns as follows:
The manuscript navigates through technical and more or less well-described methodological steps. However, the main problem I find is that it starts from the wrong point (e.g., the selection of conditioning factors) from the very beginning, rendering this research useless as it is and showing that the design of the analysis is not correct. I need to put the focus on this immediately because the authors state (lines 15-17): "_The findings of this investigation have the potential to assist in land use planning, landslide risk reduction, and informed decision making in the surrounding zones._" This is dangerous, and no land use, risk reduction strategy or informed decision should be made using these results in the region without a further redesign of the study. Since they have stated that this is a major outcome of their research, I want to be crystal clear on that, because this cannot continue down this path. Maybe the authors will resubmit the paper elsewhere as it is (I hope they don't), but it is not rigorous, because it currently lacks consideration of basic and fundamental findings from previous research in this region of the Atacama Desert and therefore trains a machine learning algorithm with the wrong and thus erroneous conditioning parameters.
Reference to previous work: We appreciate your mention of previous research in the Atacama Desert. Our focus is on the application of machine learning algorithms for the generation of landslide susceptibility maps, which may differ from traditional studies. However, we recognise the importance of previous research and commit to including and discussing the paper you mentioned in the revised manuscript.
Understanding hazardous geomorphological processes: While we recognise the importance of understanding the hazardous geomorphological processes that impact the region during rare storm events, we believe that our data-driven approach can provide valuable and complementary insight to more traditional geology- and geomorphology-based approaches. Our approach allows us to identify patterns and relationships that may not be evident through direct observation and geological interpretation.
Selection of conditioning factors: The selection of conditioning factors was based on the Information Gain Ratio (IGR) technique, which quantifies the predictive power of the conditioning factors. We selected the most relevant of the 22 candidate factors based on this methodology. While we understand your concern about the starting point of our research, we defend our selection of conditioning factors based on the validity of the IGR methodology, which is widely used in data science and has proven effective for selecting the most relevant factors in a variety of contexts. In addition, we use Pearson's correlation to eliminate factors that are highly correlated with each other, which is standard practice in data science to reduce multicollinearity and improve the accuracy of models. Furthermore, it is important to note that many machine learning models have interpretability issues ("the black box problem"), a condition that is generally accepted in the field.
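A minimal sketch of this two-step selection (information gain ratio followed by a Pearson correlation filter) is given below; it uses the FSelector R package as one possible implementation, and the cut-off of 12 factors and the 0.7 correlation threshold are illustrative values, not necessarily those of the study.

```r
library(FSelector)

# Step 1: rank the conditioning factors by information gain ratio
igr <- gain.ratio(landslide ~ ., data = df)
top <- cutoff.k(igr, 12)                 # keep, e.g., the 12 best-ranked factors

# Step 2: Pearson correlation among the retained factors
cm <- cor(df[, top], method = "pearson")

# Flag highly correlated pairs (|r| > 0.7 here) so that one factor of each pair
# can be discarded before model training
pairs <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)
data.frame(factor_1 = rownames(cm)[pairs[, 1]],
           factor_2 = colnames(cm)[pairs[, 2]],
           r        = cm[pairs])
```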
Analysis design: We understand your concern about the design of our analysis. Our intention was to apply machine learning algorithms to produce landslide susceptibility maps, and we used four different models (Random Forest, Support Vector Machine, XGBoost and Logistic Regression) for this purpose. We compared their performance and found that the RF model obtained the highest AUC indices. While we acknowledge that there may be room for improvement in our methodology, we defend our approach based on best practices in data science. The four models we use are widely recognised for their effectiveness in a variety of machine learning tasks, and cross-validation is standard practice for assessing model accuracy.
Use of results for decision-making: Our results are intended to contribute to the body of knowledge and provide a starting point for future research. We do not suggest that our results be used for decision-making without further redesign and validation. However, we believe that our results may be useful to inform future research and modelling efforts in this area.
Their main graphical outcomes are landslide susceptibility maps that might look good to readers not working in arid zones but that completely lack any link with the reality demonstrated by two recent rain events, reviewed in https://doi.org/10.1002/2016GL069751 for the March 2015 rainstorm and https://doi.org/10.1007/s11069-022-05707-y for the March 2022 rainstorm, which both impacted the evaluated watershed. On a rapid qualitative assessment, none of the impacted areas such as the ones shown in https://doi.org/10.1007/s11069-022-05707-y are identified as susceptible in their susceptibility maps (Figs. 10-13).
We understand the importance of these maps accurately reflecting the areas that are actually susceptible to landslides, especially in the context of recent rainfall events that have impacted the watershed we are studying. We appreciate your comments and the opportunity to clarify this point.
In response to your comment, first of all, we would like to point out that we plan to update our susceptibility maps in the corrected version of our work, which will be submitted after 4 August. These updates will be based on some adjustments to the data we use, as well as a data augmentation strategy, which we hope will improve the accuracy of our maps.
Our susceptibility maps are based on a combination of factors, including historical landslide data and a number of environmental variables that have been identified as important predictors of susceptibility. These factors are combined using machine learning algorithms to generate the susceptibility maps, which were selected and applied with the aim of maximising the predictive capacity of our models, which, according to our research and analysis, are relevant to the occurrence of this phenomenon in the Atacama Desert region. However, we understand that no model can perfectly capture reality, and there is always room for improvement.
Regarding your observation that none of the impacted areas shown on https://doi.org/10.1007/s11069-022-05707-y are identified as susceptible on our susceptibility maps, we would like to point out that our susceptibility maps are not intended to be an accurate representation of the specific locations where landslides occurred in past events. Instead, they are intended to provide a general indication of the areas that are most susceptible to landslides, based on a combination of factors. We understand your concern that our maps may not fully reflect the reality of the areas affected by recent rainfall events in 2015 and 2022, as documented in the studies you mentioned. However, we would like to point out that these maps are a probabilistic representation of landslide susceptibility based on the data and conditioning factors we have considered. They are not intended to be an accurate representation of every landslide that has occurred or could occur in the future.
In addition, we would like to clarify a point that we find curious in your comment. You mention that none of the impacted areas shown in that publication are identified as susceptible on our susceptibility maps. However, in reviewing the publication, we note that it does not present landslide susceptibility maps. Figure 1 of that publication shows coherence and precipitation parameters, but does not provide information on landslide susceptibility. We would therefore like to better understand the basis for your observation and would welcome any additional clarification you can provide.
Despite these limitations, we believe that our maps provide a valuable contribution to the understanding of landslide susceptibility in the Atacama Desert region. We will continue to refine our models and maps as more data become available and as more research is conducted in this region.
Surprisingly, they use only 86 sites where landslides have been reported although research papers have recently shown landslides, flash floods and other runoff related hazards in the exact same location (see https://doi.org/10.1002/2016GL069751, see https://doi.org/10.1007/s11069-022-05707-y). Future research needs to consider datasets probably as validation sites and/or conduct/apply similar methodological approaches.
We would like to clarify that the 86 landslide sites used were selected based on data availability and the reliability of the information sources. It is important to note that although there are other studies that have reported landslides, flash floods and other runoff-related hazards in the same location, these studies may have used different methodologies and criteria to identify and classify these events. Therefore, it is not always possible or appropriate to directly combine these datasets with our own.
That said, we agree that the inclusion of more landslide sites could improve the robustness and representativeness of our model. In response to your suggestion and those of other reviewers, we have implemented a data augmentation strategy to improve the representativeness of our dataset. As a result, we have updated our landslide susceptibility map, which now reflects in greater detail the distribution and frequency of landslides in the Atacama Desert region.
The updated map is attached here for your review. We welcome your comments and suggestions, and hope that this update addresses your concerns.
Although the paper has some merit in implementing machine learning with robust statistics, the results misunderstand landslide-related geohazards that impact this region during extreme rainfall events. First, the selection of conditioning factors does not take into consideration previous research done in the southern Atacama Desert done in a ‘less’ arid watershed situated south from El Salado (see Aguilar et al., 2020) and therefore this paper would need a throughout literature review to better chose what conditioning factors. For example, Aguilar suggests including information of the colluvial and alluvial cover which is large in arid landscapes due to the lack of effective sediment removal mechanisms (only intense rains every 30 years?) capable of producing enough runoff to entrain and therefore produce i.e., debris flows.
We understand that many factors affect the generation of landslides, as described in the literature cited in this review. However, the construction of the models presented here follows a different methodology, in the sense that machine learning seeks to make predictions.
The factors chosen for the study are the more than twenty used in the literature on ML susceptibility models, including in hyper-arid zones such as the one studied. The seven parameters that were retained result from the IGR methodology and from the elimination of factors by the correlation criterion. Furthermore, the use of criteria such as the ROC curve validates the effectiveness of the model.
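The ROC/AUC check referred to here can be reproduced, for instance, with the pROC R package; `obs` (observed classes on held-out data) and `prob` (predicted landslide probabilities) are placeholder names for this example.

```r
library(pROC)

# 'obs' = observed classes on held-out data, 'prob' = predicted landslide probability
roc_obj <- roc(response = obs, predictor = prob)
auc(roc_obj)   # area under the ROC curve
plot(roc_obj)  # ROC curve for visual inspection
```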
We understand that the non-inclusion of certain factors mentioned in the critique may give the impression that our model is contradictory to the existing literature. However, we would like to emphasise that the absence of these factors does not necessarily imply that the processes they represent are being ignored. It is possible that these factors are correlated with the factors we have included in our model, and therefore their influence may be indirectly represented.
Then, the authors use one Landsat 9 image from February 2022, for which they decide it is fine not to give further detail. A recent paper addresses understanding geomorphic change during rainfall events (including landslides, rills, etc.), https://doi.org/10.1016/j.rsase.2023.100927, doing a thorough review of the use of optical imagery (with particular focus on the Landsat family), and it gives the authors enough insight to perhaps use Olivares' approach. However, they decided to derive optical spectral indexes (NDVI, GNDVI, EVI, NDMI, BSI, NDWI, NDGI) without giving further justification or evidence for using them in an arid region. This point is especially discouraging. First, no evidence of present glaciers has been reported for this watershed (see https://doi.org/10.1016/j.quaint.2017.04.033); García only shows evidence for the Altiplano-Puna region (endorheic basins to the east of the studied catchment). Second, the use of vegetation indexes in this arid watershed is useless and, honestly, is one more piece of evidence that the authors of this paper do not even understand where they are applying their set of patterns to automatically extract information.
On the first point, it is important to clarify that the choice of a single Landsat 9 image was based on the availability and quality of data at the time of the study. While we recognise the importance of using multiple images to capture temporal variability, we also believe that a single image can provide valuable information, especially when combined with other data sources and used in conjunction with machine learning algorithms. In addition, we would like to clarify that, although the studied catchment is arid and vegetation is sparse, vegetation indices can provide valuable information that contributes to our model.
Regarding the second point, we appreciate the suggestion to consider Olivares' approach. However, we would like to emphasise that the choice of optical spectral indices was based on the Information Gain Ratio (IGR) methodology, which has proven to be useful in similar studies. Through this method, we found that vegetation indices, despite the aridity of the region, provide significant information and contribute to the accuracy of our model. While we understand that the use of vegetation indices may seem inappropriate in an arid region, these indices not only capture vegetation, but can also indicate soil characteristics and environmental conditions that may influence landslide susceptibility. In addition, the presence of vegetation, although sparse, can be an important indicator of soil and water conditions, which are key factors in landslide occurrence.
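To show what deriving these indices from a single scene involves, here is a minimal terra sketch; the file name and the band naming (B2 blue, B3 green, B4 red, B5 NIR, following the usual Landsat 8/9 OLI convention) are assumptions for the example, and the standard EVI coefficients are used.

```r
library(terra)

# Hypothetical surface-reflectance stack from one Landsat 9 scene
bands <- rast("landsat9_sr_stack.tif")
blue  <- bands[["B2"]]
green <- bands[["B3"]]
red   <- bands[["B4"]]
nir   <- bands[["B5"]]

# Enhanced Vegetation Index (standard coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1)
evi <- 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)

# Normalized Difference Glacier Index (green vs. red)
ndgi <- (green - red) / (green + red)

writeRaster(c(evi, ndgi), "evi_ndgi.tif", overwrite = TRUE)
```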
Finally, we would like to emphasise that our main objective was to explore the potential of machine learning algorithms for landslide prediction in an understudied region. While we acknowledge that our study has limitations and that there is room for improvement and refinement of our methodology, we believe that our findings provide a valuable basis for future research in this area.
(Lines 389-391): ‘_’NDGI uses spectral bands corresponding to green and red, so this would imply that landslide and non-landslide areas create contrast between these wavelengths. Therefore, it is suggested to use these indices in areas similar to the studied in this work._’’ This is unclear. Why would the spectral signature of an area of 86 landslides be contrasting to a whole catchment of 7400 km2? There is a problem in landslide points selection that might be overcome when looking, again, to the results from Wilcox and Cabré because they show large areas (not points) of geomorphic change after recent events and I do not believe the spectral signature would change significantly in alluvial, colluvial or flat surfaces characteristic of this studied catchment. Why? Because significant mineralogical changes would indicate processes that are not accounted in this region. If the authors were right, which they are not, differences in valleys floors, colluvial and ‘old’ surfaces would be easily recognized. Is not the case if you check the recent paper https://doi.org/10.1016/j.geomorph.2022.108504 were their surface maps rather than using spectral signatures are based on surface roughness. Their research paper cannot be overlooked when describing the geological and geomorphological setting of the study area because these flat surfaces cover large areas of the studied catchment.
We would like to clarify that the usefulness of NDGI in our study is not based on the assumption that landslide and non-landslide areas will have contrasting spectral signatures throughout the basin. Instead, NDGI is used as one of several features in our machine learning model, which takes into account the interaction of multiple factors in making its predictions.
As for landslide point selection, we agree that point selection is a critical aspect of our study. Therefore, we chose to use a strategy that considers landslides as polygons, in order to capture a larger number of samples.
Finally, regarding the claim that spectral signatures would not change significantly on alluvial, colluvial or flat surfaces, we would like to point out that while it is true that these types of surfaces may have similar spectral signatures, they may also exhibit subtle variations that can be captured by spectral indices and may be relevant for landslide prediction. We believe that our findings provide a valuable basis for future research in this area.
(lines 431-433): ‘’_the model can be expected to be suitable in areas worldwide that is a semi-arid zone, with a variable topography and a Mediterranean climate with a prolonged dry season, in addition to having narrow and deep valleys, where the maximum susceptibility is concentrated. Examples of these zones are the following:’’. _Here I have to say that I was surprised by the shift to a global perspective of applying what the authors have done. To support this global idea the authors do not use very suitable examples. So as not to repeat previous comments, I wanto to highlight that all the mentioned rivers are permanent rivers with significant water discharge because they drain areas where annual rainfall amounts are significantly more important that what happens in this catchment of the Atacama.
Our intention in suggesting that our model could be applicable in other semi-arid areas around the world was not to suggest that the specific results of our study would be directly transferable to these areas. Instead, our intention was to highlight that the general methodology we have used - that is, the use of machine learning algorithms to analyse a variety of conditioning factors and produce landslide susceptibility maps - could be useful in these areas.
We recognise that each region has its own unique characteristics and challenges, and that any application of our methodology to a new region would require careful consideration of these factors. In particular, we agree that differences in climatic and hydrological conditions, such as those mentioned in the comments, would be important factors to consider. Here, the factor taken into account for the comparison was the topographical criteria.
As for the specific examples we mentioned, we appreciate the feedback and recognise that we could have made a better choice. Our intention was simply to provide some examples of the types of regions that could benefit from an approach similar to ours, but we understand that these examples may have been confusing or misleading. In the future, we will strive to provide clearer and more relevant examples.
(Lines 466-469): "_These findings provide valuable perspectives for informed decision-making and policy formulation in landslide-prone regions. Overall, our study highlights the potential of machine learning models, particularly SVM and RF, for accurate and reliable landslide susceptibility mapping, which can aid in identifying high-risk areas and implementing effective mitigation strategies, which is useful for stakeholders and land-planning authorities._" It is probably not the aim of a discussion in a peer-reviewed journal, but I would like to suggest to the authors that this study needs to be redesigned from its basics.
We would like to clarify that our aim in conducting this study was not to provide a definitive solution to landslide problems in the region, but rather to explore the potential of machine learning models to assist in the identification of high-risk areas.
Our study has limitations, and there is room for improvement and refinement of our methodology. However, we believe that our findings provide a valuable basis for future research in this area. In particular, our results highlight the importance of considering a variety of conditioning factors in assessing landslide susceptibility, and they demonstrate the potential of machine learning models to analyse these complex interactions.
Regarding the suggestion that our study needs to be redesigned from the ground up, we would like to point out that we are open to feedback and suggestions for improving our work. However, we also believe it is important to recognise that science is an iterative process and that each study contributes to our collective understanding of a problem, even if that study has limitations or leaves unanswered questions.
Once again, we welcome constructive comments and will take them into account in the future.
-
CC3: 'Reply on AC5', Albert Cabré, 02 Aug 2023
Dear Francisco and coauthors,
You are welcome.
Your work will have to satisfy all the requirements of the various specialists if it is your aim to ''provide a valuable basis for future research in this area''. This means that when you select the conditioning parameters that you will then "ask" your Information Gain Ratio (IGR) technique about, you need to choose the ones that are most relevant to landslides in arid regions. You will have to be sure that you are providing IGR with the most suitable ones, and this means that you have to rely on previous experience in arid areas. Luckily, your study area has them. Therefore, I suggest you redefine your study based on, for example, my suggestions provided above (previous comment, 30-07-23). Then, you might be able to produce realistic landslide susceptibility maps using the proposed methods.
This may not be the expected outcome, and this can be frustrating, but journals like NHESS give us a good opportunity to learn and to stimulate scientific debate that would be difficult to have in "classical" academic settings, where it is difficult to share specialist knowledge far from our own experience. I expect to see how machine learning can be integrated, but it needs to have a more robust starting point. Having said that, I would like to thank the authors for their fast response and also for their commitment to providing us with new figures. However, in order to help them in future resubmissions, I am commenting on some of their comments in the attached PDF file.
-
AC6: 'Reply on CC3', Francisco Parra, 04 Aug 2023
Thanks again for your response and the positive feedback.
Here are the answers to the PDF you sent.
No previous work attempted in the region can be considered traditional, since these works explore for the first time (i) the conditioning factors to be used in this arid region (year 2020, NHESS) and (ii) are based on very novel remote sensing applications of SAR C-band satellites, which are open access and which I strongly recommend the authors consider in future resubmissions (2020, ESPL; 2020, Natural Hazards). It is a close call since I am the first author, but you can also rely on other works using similar approaches (Castellanezzi et al., 2023 in Australia; Botey i Bassols et al., 2023 in the Salar de Atacama; Olen and Bookhagen, 2020 in Argentina, among other references therein that you might find useful too). There is also the work of Olivares et al. (2023), as I discussed before, which might also be a good starting point for your goal of defining susceptibility maps in the Atacama.
You are right that these recent studies cannot be considered "traditional" in the area and we agree that they constitute an important starting point for our objective of defining susceptibility maps in the Atacama region. Therefore, in the final submission of this paper we undertake to review in detail these references you mention, to contrast them with our work, and to find the reason for the discrepancies you point out.
The authors now claim to identify patterns and relationships. I have gone through the paper, and no analysis of physical processes or of landslide triggering is presented in the sense of giving clues about either the triggers or the possible feedbacks between the studied landscape features. If, when the authors claim 'patterns', they mean areas of high susceptibility, then, again, previous work in the area proves them wrong. At the very least, the authors should discuss why their maps show differences with the available literature in the area.
Admittedly, we do not present a detailed analysis of physical processes that trigger landslides in the region. Our approach focuses on identifying patterns and relationships using machine learning techniques, but we understand that this does not replace the analysis of such physical processes, although the data-driven approach has certainly proven successful in multiple studies in the area.
When we refer to identifying "patterns", we mean patterns in the data that allow the model to predict areas of greater or lesser susceptibility to landslides. We recognise that our maps may differ from the available literature on susceptibility in the area.
The utility of IGR, or of any other predictive technique, is not under discussion. The problem, as I mentioned in my previous comment, is that this investigation starts from the wrong selection of conditioning factors by the authors. This selection was made by (I guess) a literature review conducted by the authors, and that is where all the problems of this research start. You need to better understand, probably by doing a more thorough literature review (I just gave you some references in my previous comment), that the study of arid regions is not a novel area and can therefore be considered a traditional topic in geomorphology (see the literature on the Sonoran Desert in the USA from the 1950s, etc.). I believe a deeper review can be done here if the aim of the authors is to provide the basis for future research on susceptibility to landslides in arid regions, since they mention ''we believe that our findings provide a valuable basis for future research in this area'' many times in their response to my previous comment. This paper now has a lot of conditioning factors selected under unclear criteria (not well justified or cited) and which have been proven wrong in recent and traditional literature in the region and in other arid regions of the world. This needs to be amended.
While we acknowledge that our initial literature review could have been more comprehensive, we respectfully disagree that the selection of conditioning factors was flawed. The selection was made on the basis of 22 factors used in the literature on modeling susceptibility to landslides, including in arid areas such as the one studied.
The 7 retained parameters result from the IGR methodology and from the elimination of factors using the correlation criterion, both standard procedures in data science. While we understand your concern, we defend our selection of factors on the basis of the validity of these methodologies.
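For readers unfamiliar with this step, the following minimal sketch shows the general shape of a gain-ratio ranking followed by a correlation filter (synthetic data only; the number of bins and the 0.7 correlation threshold are illustrative assumptions, not necessarily the values used in the paper):

```python
# Minimal sketch (not the study's code): rank features by an information gain
# ratio computed on discretised factors, then drop one of each highly
# correlated pair. Data, bin count and threshold are synthetic placeholders.
import numpy as np
import pandas as pd

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y, bins=10):
    x_binned = pd.cut(x, bins=bins, labels=False, duplicates="drop")
    h_y = entropy(y)
    # conditional entropy H(y | binned factor)
    h_y_given_x = sum(
        (np.sum(x_binned == v) / len(y)) * entropy(y[x_binned == v])
        for v in np.unique(x_binned)
    )
    split_info = entropy(x_binned)
    return (h_y - h_y_given_x) / split_info if split_info > 0 else 0.0

rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(172, 12)),
                 columns=[f"factor_{i}" for i in range(12)])
y = rng.integers(0, 2, size=172)             # 1 = landslide, 0 = non-landslide

# 1) rank factors by gain ratio
scores = X.apply(lambda col: gain_ratio(col.to_numpy(), y)).sort_values(ascending=False)

# 2) keep a factor only if it is not strongly correlated with an already kept one
corr = X.corr().abs()
keep = []
for f in scores.index:
    if all(corr.loc[f, k] <= 0.7 for k in keep):
        keep.append(f)
print(keep)
```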
On the other hand, model effectiveness is validated by criteria such as the ROC curve, with AUC values above 0.9 for all models. We believe that this demonstrates the predictive ability of the selected factors in combination with the applied machine learning algorithms. Furthermore, we would point to the "black box" effect present in many machine learning models (Goetz, 2015), which means it is normal to have factors whose role in the model is not easy to explain. However, there are factors that may be correlated with others discussed in the literature and can thus be "replaced" when building the model.
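As an illustration of the kind of validation we describe, repeated stratified cross-validation with ROC-AUC scoring can be set up along these lines (a sketch only, with synthetic data and hypothetical hyperparameters, not our exact configuration):

```python
# Minimal sketch (not the study's code): ROC-AUC for RF and SVM estimated
# with repeated stratified cross-validation, as is common for small
# inventories. Data and hyperparameters are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(172, 7))        # 86 landslide + 86 non-landslide sites, 7 factors
y = np.repeat([1, 0], 86)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}

for name, model in models.items():
    auc = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
    print(f"{name}: mean AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```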
With regard to spectral indices, there are precedents in the literature for their use in arid zones. In Kumar et al. (2023), NDVI is used as a conditioning factor in an arid region of Peru. In Nhu et al. (2020), it is used for a semi-arid region of Iran. In Were et al. (2023), NDVI was used to model gully erosion susceptibility in an arid region of Kenya. In Yaseen et al. (2022), NDVI is used to map flash floods in arid regions.
In your comments you state that spectral indices cannot be used in arid or hyper-arid areas. However, you do not provide any evidence for this claim, so we would be very grateful if you could do so.
You give a dataset which relies on observations made near the main roads, villages and mining districts. This is typical for this region and can be a limitation. However, many authors have made efforts to overcome this by providing regional observations (see Wilcox et al., 2016; Tapia et al., 2018; Cabré et al., 2022). The combination with other datasets is possible because, although they use hydraulic and remote sensing methods, the works of Wilcox and Cabré rely on a lot of field data (see Figure 2 in Cabré as an example).
Thank you very much. We are currently doing new work in the area related to landslides that incorporates the spatio-temporal nature of the problem, so the comparisons you make are sure to be very useful in that characterisation.
Thank you for the attached figure of the susceptibility map. I believe the authors may have arrived at a similar map simply by producing a slope-thresholded map. This makes me refer the authors to another work of mine (https://doi.org/10.1002/esp.4868) where we clearly showed (only 70 km south) that slope does not control rainfall-triggered hazards in this region. This might sound counterintuitive to unaware readers, but gentle surfaces are the ones that are remarkably more impacted during rainstorm events in this region. This can be explained because gentle slopes allow a greater and better development of alluvial cover and thus make sediment available during any storm.
It should be noted that slope is not used in the model. Furthermore, our susceptibility model constructs a static view of the probability of occurrence of the phenomenon, seeking the most likely "starting" point of a landslide event. All the literature cited as evidence characterises the flows as a whole and is mostly concerned with the temporal evolution of a debris flow as it acquires volume. This has nothing to do with the exercise we are carrying out, since in our study the labelled points correspond to the landslide scarp. Therefore, the pre-existing characterisations will never coincide with the ones we have made.
On the other hand, regarding your assertion that the map produced with our methodology could be generated with a simple thresholded slope map, we would like to respectfully disagree. As evidence, we attach as a PDF the slope map and a comparison map (both classified with the same Jenks natural breaks method), in which clearly different characteristics can be observed. In particular, the susceptibility map marks many low-slope areas as susceptible.
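For context on the classification step mentioned above: Jenks natural breaks is closely related to 1-D k-means clustering, so a minimal sketch of the kind of class-break comparison we describe could look like this (synthetic values; scikit-learn's KMeans used as a stand-in for a dedicated Jenks implementation, not the exact routine we applied):

```python
# Minimal sketch (illustrative only): derive class breaks for two rasters
# (e.g. a susceptibility map and a slope map) with 1-D k-means, a close
# relative of the Jenks natural breaks method. Values are synthetic.
import numpy as np
from sklearn.cluster import KMeans

def natural_breaks(values, n_classes=5, seed=0):
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed)
    labels = km.fit_predict(values.reshape(-1, 1))
    # upper edge of each cluster, sorted, gives the class breaks
    return np.sort([values[labels == k].max() for k in range(n_classes)])

rng = np.random.default_rng(1)
susceptibility = rng.beta(2, 5, size=10_000)   # stand-in for model output in [0, 1]
slope = rng.gamma(2.0, 7.0, size=10_000)       # stand-in for slope values in degrees

print("susceptibility breaks:", natural_breaks(susceptibility))
print("slope breaks:", natural_breaks(slope))
```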
Many thanks again.
References
- Goetz, J. N., Brenning, A., Petschko, H., & Leopold, P. (2015). Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Computers & Geosciences, 81, 1-11.
- Kumar, C., Walton, G., Santi, P., & Luza, C. (2023). An Ensemble Approach of Feature Selection and Machine Learning Models for Regional Landslide Susceptibility Mapping in the Arid Mountainous Terrain of Southern Peru. Remote Sensing, 15(5), 1376.
- Nhu, V. H., Shirzadi, A., Shahabi, H., Chen, W., Clague, J. J., Geertsema, M., ... & Lee, S. (2020). Shallow landslide susceptibility mapping by random forest base classifier and its ensembles in a semi-arid region of Iran. Forests, 11(4), 421.
- Yaseen, A., Lu, J., & Chen, X. (2022). Flood susceptibility mapping in an arid region of Pakistan through ensemble machine learning model. Stochastic Environmental Research and Risk Assessment, 36(10), 3041-3061.