Rainfall thresholds estimation for shallow landslides in Peru from gridded daily data
Carlos Millán-Arancibia
Waldo Lavado-Casimiro
Abstract. The objective of this work was to generate and evaluate regional rainfall thresholds obtained from a combination of high-resolution gridded precipitation data (PISCOpd_Op), developed by the National Service of Meteorology and Hydrology of Peru (SENAMHI), and information from observed shallow landslide events. The landslide data were associated with rainfall data to identify triggering and non-triggering rainfall events and their rainfall properties, from which rainfall thresholds were determined. The performance of the thresholds was validated with events that occurred during 2020, focusing on the operability of these thresholds in landslide warning systems in Peru. Thresholds were determined for 11 rainfall regions using an empirical–statistical approach, and their predictive performance was evaluated with the true skill statistic (TSS) and the area under the curve (AUC). The best predictive performance was obtained by the mean daily intensity–duration (Imean–D) threshold curve, followed by accumulated rainfall (E). This work is the first attempt to estimate regional thresholds on a country scale in order to better understand landslides, and the results obtained reveal the potential of using thresholds in the monitoring and forecasting of shallow landslides caused by intense rainfall and in supporting disaster risk management actions.
Status: closed
RC1: 'Comment on nhess-2022-199', Anonymous Referee #1, 04 Aug 2022
GENERAL COMMENTS
Millan-Arancibia and Lavado-Casimiro describe how they determine regional rainfall thresholds for landslides in Peru at the national scale. They use recently developed methods to objectively assess these thresholds, which makes the study interesting more from a technical than from a scientific perspective for those who aim at implementing early-warning systems. The study is a bit hard to read and seems unorganized in some parts. As a consequence, parts of the methods, results and conclusions were not clear to me. I have a few general comments and more specific ones below, which should be addressed before publication in NHESS.
- There are quite many specifications and clarifications needed in order to make the methods they used unambiguous and reproducible. This also resulted in quite a long list of specific comments below.
- Some paragraphs seem unnecessarily wordy or read like a list of unrelated statements, which makes them difficult to follow. For example, in L. 177 “TSS is more objective than simple random estimate”, it could be explained what makes TSS objective (e.g. balancing TPR and FPR). Some of these arguments are in the text but unorganized and unclear. I think the authors will easily identify such paragraphs themselves when editing. See also comments below.
- I miss mainly two discussion points. One is the spatial variability of thresholds and the origin of this. Can it be explained with climatology/lithology or is it related to the quality of the data set? See also comments to Figure 7. The second point is related to how calibration/validation is performed, there is almost no discussion about that. I appreciate that this important step is taken and I understand that the dataset is new and short. However, I think it should be stated more clearly that a validation set of one year is quite short and there is a risk of overinterpreting. I suggest at least to discuss other possible validation techniques than splitting years, and flag that as a topic for future research.
- There are some results and conclusions that are unclear or surprising to me, and these should be checked. For example, I would expect Imean-D and E-D thresholds to result in the same performance, but this is not the case here. See comments below.
SPECIFIC COMMENTS
L. 24: Citation needed for the original cause and the different processes leading to saturation
L. 27: (e.g. Prenner…)
L. 31: rainfall thresholds
L. 35: time
L. 31: The literature you cite only considers statistical methods. Berti et al. (2020) and Tang et al. (2019) are examples of thresholds based on physically-based modelling. Please also change “physical bases” to “physically-based models”
L. 37: in the way it’s written it makes one think that the difference between the global and national rainfall thresholds is that one is based on antecedent precip and the other on empirical-statistical approaches. Please rephrase. Also, if you use “antecedent”, does it have the same meaning as in L. 29? Antecedent conditions can refer to the conditions prior to the triggering rainfall or prior to the exact time of landslide occurrence. Please specify and use consistently.
L. 45: I think this section is to justify the methods used. Given the uncertainties in the rainfall product that you mention later in the ms one could ask why you’re not using physically-based modelling, which considers the actual mechanisms causing landslides, to back-calculate rainfall thresholds. Hence, I would also mention the challenges accompanied with such models: mainly the many high-quality input data such as soil information that is needed, which is associated with high uncertainties, too.
L. 56: maximum at what scale? Daily, annual?
L. 60: gridded
L. 80: Just out of curiosity. It’s funny enough that the precipitation dataset is named after Peru’s national liquor. Is PISCOpd_Op actually the abbreviation of something?
L. 84: Can you give some information on the number of rain gauges or the average distance? Maybe even add them to the map in Figure 2 if you have such a map.
L. 85: What do you mean by “multipliers that are based on monthly climatology”?
Table 1: I’m not sure this table is so important. To me, only the spatial resolution and the time period are of relevance. But why compare these two datasets if you only use one of them?
L. 92-93: these two sentences can be simplified, now it is confusing. So SLIP covers the period 2018-2020 but do you have greater certainty for 2019 and 2020?
L. 101: Figure 3
L. 88-101: I don’t understand how the two landslide databases were combined. The time periods do not overlap and Figure 3 only starts in 2019. If one event was excluded it should be 382 events in total. So which was your study period?
Figure 3: What is this rainfall? One grid cell? Which location are we looking at? And is each colour one rainfall event? Please specify in the caption and add labels a) and b) to the subplots.
L. 103: Since you describe the sequence of your methods here, Figure 1 would fit here. And describe the steps in the text and refer to the figure.
L. 116: How can the PISCO report Pr>0 and the station Pr=0 if Pisco is interpolated from the stations?
L. 118: How were rainfall events defined? Are two events independent if they are separated by at least one non-rainy day?
L. 131: events
L. 134: I think that E-D and Imean-D should result in the same thresholds, only that b(E-D) = b(Imean-D)+1. That’s what I get when substituting Imean with E/D. So there is no point in comparing both thresholds. This said I’m surprised by the numbers in table 3. Either I’m misunderstanding something or something went wrong here. Please clarify.
L. 135: a and b are scale and shape parameters, but in the log-log space they become the intersection and slope of the linear threshold
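For reference, the algebra behind these two comments can be written out as follows (a brief sketch in the standard power-law threshold notation; the subscripted b's are only used here to distinguish the two fits):

```latex
% Power-law threshold in log-log space: intercept log(a), slope b
\[
  E = a\,D^{b}
  \;\Longleftrightarrow\;
  \log E = \log a + b \log D
\]
% Substituting Imean = E / D gives the equivalent mean-intensity form,
% with the exponent shifted by one:
\[
  I_{\mathrm{mean}} = \frac{E}{D} = a\,D^{\,b-1}
  \quad\Rightarrow\quad
  b_{E\text{-}D} = b_{I_{\mathrm{mean}}\text{-}D} + 1
\]
```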
Figure 4: These box plots are nice but it’s not clear from the text why you show them. Is it to show that the two can be separated well? Considering the methods you use, it would be nice to see some AUC curves instead which would also help you in explaining the methods
L. 146: Max precip at what time scale? And what is the motivation for using this for regionalization and one of the other indices?
L. 158: Please consider rewriting or reorganizing this section. The information to certain steps are spread across the entire section, for example, how the dataset was split into calibration and validation data sets.
L. 179-182: This will be confusing for many readers. You have two definitions for TSS, and two for sensitivity. Please be consistent and avoid introducing alternative definitions if they actually mean the same. Also, the TSS itself doesn’t seek to maximize TPR and 1-FPR, but you do so by choosing a threshold that maximizes TSS.
L. 185: Please be more specific. It’s not clear what you did using ROC, TPR, FPR. Which is the “most widely used technique”? Did you choose some variables with large AUC and drop the others? If so, what was the threshold AUC? Or did you define thresholds by maximizing TSS? There are many possible approaches.
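For illustration, a minimal sketch of how a cutoff could be chosen by maximizing TSS (not the authors' actual procedure or code; all names below are invented):

```python
import numpy as np

def best_threshold_by_tss(values, triggered, candidates):
    """Return the cutoff (and its TSS) that maximizes TSS = TPR - FPR.

    values     : rainfall property of each event (e.g. event rainfall E), 1-D array
    triggered  : boolean array, True where the event triggered a landslide
    candidates : candidate cutoff values to test
    """
    values = np.asarray(values, dtype=float)
    triggered = np.asarray(triggered, dtype=bool)
    best_cut, best_tss = None, -np.inf
    for cut in candidates:
        alarm = values >= cut                            # alarm issued if cutoff exceeded
        tp = np.sum(alarm & triggered)                   # hits
        fn = np.sum(~alarm & triggered)                  # misses
        fp = np.sum(alarm & ~triggered)                  # false alarms
        tn = np.sum(~alarm & ~triggered)                 # correct rejections
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0   # sensitivity
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0   # 1 - specificity
        tss = tpr - fpr
        if tss > best_tss:
            best_cut, best_tss = cut, tss
    return best_cut, best_tss
```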
L. 192: It’s not clear to me how exactly the validation was performed. Was the performance of the validation data set calculated for the thresholds determined with the calibration data set or was a new threshold determined for the validation data set to see if the performance is similar?
L. 196: The values of 0.4 and 0.7 seem somewhat random. Could you elaborate a bit on the meaning of these values? Are these values commonly used or why is this classification needed?
L. 205-214: I’m surprised that Imean-D and E-D don’t have the same performances. See comment L. 134.
Table 2: “D (days)”. Is this the full data set, calibration or validation? How many events per region? The same for Table 3.
L. 270: do you mean first in Peru? Please specify.
L. 277: Table 3
L. 280: Yes, landslide detection is sacrificed but false alarms are reduced. There are various scores one could choose depending on whether you want to give more weight to detection or to false alarms. But you chose TSS because it’s a good balance between the two.
L. 283: What is a high-impact stream?
L. 284: what do you mean by constant landslide occurrence?
L. 284: Imax-D-D?
L. 285: do you mean entire event?
L. 286: is the background condition scenario the antecedent condition scenario?
L. 286-290: I can’t follow. If your validation results are better than the calibration results, then maybe your validation set is too small. I don’t see how you can conclude the importance of antecedent conditions from this. Also, the sentence “in the validation stage…showed growth in calibration performance” is confusing.
L. 296: The absence of extreme events does not imply poorer threshold performance. An option would be to do calibration/validation on more data splits.
L. 298: “the number of landslides was lower than in other years” but the only reliable year you can compare with is 2019, right?
L. 313: Again, you mean first in Peru, right? Please specify.
L. 315: Well, you cannot compute empirical-statistical thresholds without landslide observations so this is not really an advantage. An advantage is that you have used datasets available at the national scale to objectively determine and compare rainfall thresholds.
L. 318: it is still not entirely clear to me what process we are talking about. Here you say shallow landslide and earlier you mention streams and debris flow. Is it a mix of processes? Please add some information on this in the dataset description and clearly define what collection of processes you refer to when using “landslide” throughout the ms.
L. 324: More interesting would be why the performances can be so different. Can you say something about that?
L. 329: high sensitivity to what?
Figure 7: Is there a reason for showing sensitivity/specificity? Wouldn’t it be easier to interpret if you would just colour according to TSS?
This figure is very interesting and shows high spatial variability in the thresholds. Can you say something about this variability? E.g. is the threshold higher in wet regions? See e.g. Leonarduzzi et al. (2017) Figure 7 or Marc et al. (2019).
Marc, O., Gosset, M., Saito, H., Uchida, T., Malet, J.P., 2019. Spatial Patterns of Storm-Induced Landslides and Their Relation to Rainfall Anomaly Maps. Geophys. Res. Lett. 167–177. https://doi.org/10.1029/2019GL083173
AC1: 'Reply on RC1', Carlos Millan, 14 Oct 2022
RESPONSE TO GENERAL COMMENTS
Millan-Arancibia and Lavado-Casimiro describe how they determine regional rainfall thresholds for landslides in Peru at the national scale. They use recently developed methods to objectively assess these thresholds, which makes the study interesting more from a technical than from a scientific perspective for those who aim at implementing early-warning systems. The study is a bit hard to read and seems unorganized in some parts. As a consequence, parts of the methods, results and conclusions were not clear to me. I have a few general comments and more specific ones below, which should be addressed before publication in NHESS.
Comment response: Thank you very much for your review. In the new version of the manuscript we have tried to make it easier to read and better organized, taking all your comments into account. In addition, this work is important for the landslide research community in Peru, since this type of study has not previously been carried out in the country, which also faces limited data availability compared with other countries. Lastly, other investigations have faced similar difficulties (e.g., Kirschbaum et al., 2015; Abraham et al., 2019).
- There are quite many specifications and clarifications needed in order to make the methods they used unambiguous and reproducible. This also resulted in quite a long list of specific comments below.
Comment response: Thanks for the comment. All your comments and the list of specific observations have been taken into account and incorporated in the new version of the manuscript.
- Some paragraphs seem unnecessarily wordy or read like a list of unrelated statements, which makes them difficult to follow. For example, in L. 177 “TSS is more objective than simple random estimate”, it could be explained what makes TSS objective (e.g. balancing TPR and FPR). Some of these arguments are in the text but unorganized and unclear. I think the authors will easily identify such paragraphs themselves when editing. See also comments below.
Comment response: Thanks for the observation. All your comments and the list of specific observations have been taken into account and incorporated in the new version of the manuscript. We have made a thorough revision of the manuscript, identified such paragraphs, and reorganized them for greater clarity.
- I miss mainly two discussion points. One is the spatial variability of thresholds and the origin of this. Can it be explained with climatology/lithology or is it related to the quality of the data set? See also comments to Figure 7. The second point is related to how calibration/validation is performed, there is almost no discussion about that. I appreciate that this important step is taken and I understand that the dataset is new and short. However, I think it should be stated more clearly that a validation set of one year is quite short and there is a risk of overinterpreting. I suggest at least to discuss other possible validation techniques than splitting years, and flag that as a topic for future research.
Comment response: Thanks for the observation. We have taken your observations and recommendations into account and included them in the discussion section of the new version of the manuscript. Regarding the first point of discussion:
“Regarding the variability of the thresholds, we attribute it mainly to the rainfall climatology of Peru. The threshold magnitudes are related to the spatial distribution of rainfall in Peru: low thresholds are associated with the lower rainfall of the arid zones in the western part of Peru (Pacific region), intermediate thresholds with the increase in rainfall magnitude in the middle, mountainous region (Andes region), and the highest thresholds with the wet regions (Amazon region). However, the Andes 1, Andes 3 and Andes 6 regions do not follow this relationship, so this discussion is not conclusive and is considered to be related to limited data; it is therefore suggested that this variability be examined in future research including more shallow landslide event data.”
As a side note, the lithological information available for Peru is still very general, and we hope in the future to carry out analyses with lithological data (e.g., soil tests) that we are developing at the small-basin level.
Regarding the second point, on calibration/validation, we have added your observation and discussed it, as you can see below:
“The calibration/validation methodology, based on holding out one year of observations as the validation set, which has been used in other research works (e.g., Dikshit et al., 2019; Kirschbaum et al., 2015), relies on a rather short validation period, and there is a risk of overinterpretation. It is therefore highly recommended for future research to expand the dataset and to explore other calibration/validation methods, for example a random split of the dataset into calibration and validation subsets (e.g., 70 % for calibration and 30 % for validation) (Brunetti et al., 2021; Gariano et al., 2020).”
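A minimal sketch of the random 70/30 split mentioned in the quoted text (illustrative only; the event count below is a placeholder, not the actual dataset size):

```python
import numpy as np

# Illustrative 70/30 random split of the event dataset into calibration
# and validation subsets.
rng = np.random.default_rng(0)
n_events = 100                        # placeholder count, not the real one
shuffled = rng.permutation(n_events)
n_cal = int(0.7 * n_events)           # 70 % of events for calibration
calibration_idx = shuffled[:n_cal]
validation_idx = shuffled[n_cal:]     # remaining 30 % for validation
```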
In addition, in our future research we hope to address these limitations in Peru; for example, we plan to expand the database, and we are working with INDECI (the entity responsible for assisting the population when landslides occur) on future studies covering a longer data record.
- There are some results and conclusions that are unclear or surprising to me, and these should be checked. For example, I would expect Imean-D and E-D thresholds to result in the same performance, but this is not the case here. See comments below.
Comment response: Thanks for the observation. We have taken your comment into account. To clarify: given the way we have defined the variables for a dataset, Imean, which depends on D, does not have the same distribution as E. For example, two events with the same E (e.g., E = 10) can have different D (e.g., D equal to 2 and 4 days); therefore the resulting Imean values of the two events are different (Imean equal to 5 and 2.5, respectively), so one threshold cannot be obtained simply by dividing the other. A more specific example for a sample dataset is shown in the specific comments below.
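The numerical example in this response, restated as a toy calculation (illustrative only):

```python
# Toy illustration: two events with the same accumulated rainfall E but
# different durations D have different mean intensities Imean = E / D.
events = [
    {"E": 10.0, "D": 2},   # Imean = 5.0 mm/day
    {"E": 10.0, "D": 4},   # Imean = 2.5 mm/day
]
for ev in events:
    ev["Imean"] = ev["E"] / ev["D"]
    print(ev)
```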
RESPONSE TO SPECIFIC COMMENTS
L. 24: Citation needed for the original cause and the different processes leading to saturation
Comment response: Thanks for the observation. The citation is: Lynn Highland, 2006. Landslide Types and Processes. USGS Fact Sheet 2004–3072. However, the statement was removed for better readability, following the general comments.
L. 27: (e.g. Prenner…)
Comment response: Thanks for the observation. It was edited in the new version of the manuscript.
L. 31: rainfall thresholds
Comment response: Thanks for the observation. It was edited.
L. 35: time
Comment response: Thanks for the observation. It was edited.
L. 31: The literature you cite only considers statistical methods. Berti et al. (2020) and Tang et al. (2019) are examples of thresholds based on physically-based modelling. Please also change “physical bases” to “physically-based models”
Comment response: Thanks for the observation. The citation examples were added and “physical bases” was changed to “physically-based models”. Additionally, we have recently instrumented some basins to collect more accurate data for future research, where we could explore physically-based models.
L. 37: in the way it’s written it makes one think that the difference between the global and national rainfall thresholds is that one is based on antecedent precip and the other on empirical-statistical approaches. Please rephrase. Also, if you use “antecedent”, does it have the same meaning as in L. 29? Antecedent conditions can refer to the conditions prior to the triggering rainfall or prior to the exact time of landslide occurrence. Please specify and use consistently.
Comment response: Thanks for the observation. The text has been rephrased in order to clarify the main idea, as you can see below:
“For example, empirical–statistical approaches have been developed for the estimation of global thresholds (Caine, 1980; Guzzetti et al., 2008; Kirschbaum and Stanley, 2018) and national thresholds (Leonarduzzi et al., 2017; Peruccacci et al., 2017a; Uwihirwe et al., 2020).”
L. 45: I think this section is to justify the methods used. Given the uncertainties in the rainfall product that you mention later in the ms one could ask why you’re not using physically-based modelling, which considers the actual mechanisms causing landslides, to back-calculate rainfall thresholds. Hence, I would also mention the challenges accompanied with such models: mainly the many high-quality input data such as soil information that is needed, which is associated with high uncertainties, too.
Comment response: Thanks for the observation. It was edited, as you can see below:
“This empirical approach is widely applied because its analysis and implementation do not require constant monitoring of the other physical variables on which more robust types of models (e.g., physically-based models) are based; this drawback of the more robust models is the main advantage of empirical approaches and of their applicability over large areas (Rosi et al., 2012). A further advantage is that the empirical approach is not subject to the challenges that accompany other models, mainly the need for many high-quality input data, such as soil information, which are themselves associated with high uncertainties.”
As a comment, we are currently developing studies at the local scale, with lower uncertainties, that we will use to define rainfall thresholds at that scale (Asencios Astorayme, 2020a, b). https://repositorio.senamhi.gob.pe/handle/20.500.12542/478 https://repositorio.senamhi.gob.pe/handle/20.500.12542/476
L. 56: maximum at what scale? Daily, annual?
Comment response: Thanks for the observation. It is the daily scale. It was edited.
L. 60: gridded
Comment response: Thanks for the observation. It was edited.
L. 80: Just out of curiosity. It’s funny enough that the precipitation dataset is named after Peru’s national liquor. Is PISCOpd_Op actually the abbreviation of something?
Comment response: Thanks for the observation. Yes, the name has helped us a lot as a hydrometeorological service to disseminate the information in an engaging way. PISCO stands for Peruvian Interpolated data of the SENAMHI’s Climatological and Hydrological Observations. PISCO is the base name of different SENAMHI products; i.e., PISCOpd_Op is derived from PISCO Precipitation-Daily-Operative gridded data. It was edited for better understanding, as you can see below.
L. 84: Can you give some information on the number of rain gauges or the average distance? Maybe even add them to the map in Figure 2 if you have such a map.
Comment response: Thanks for the observation. For PISCOpd_Op we use 416 rain gauges, and they have been added to Fig. 1 (previously Fig. 2).
L. 85: What do you mean by “multipliers that are based on monthly climatology”?
Comment response: Thanks for the comment. These multipliers are the ratio between the value of the monthly background grid at location x (extracted from the PISCOp monthly climatology) and the value of the monthly background grid at the gauge location, computed for every gauge, in order to create a set of multipliers from the gauges to the given grid cell. More information on the genRE interpolation method is given in: van Osnabrugge, B., Weerts, A. H., & Uijlenhoet, R. (2017). genRE: A method to extend gridded precipitation climatology data sets in near real-time for hydrological forecasting purposes. Water Resources Research, 53, 9284–9303. https://doi.org/10.1002/2017WR021201.
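As a schematic of the multiplier described in this response, paraphrasing the genRE idea of van Osnabrugge et al. (2017); the symbols below are illustrative and not taken from the manuscript:

```latex
% C_m : monthly climatological background grid for month m
% x   : target grid cell,  x_i : location of gauge i
\[
  m_{i,m}(x) \;=\; \frac{C_m(x)}{C_m(x_i)}
\]
% The daily observation of gauge i is scaled by m_{i,m}(x) before being
% interpolated to grid cell x.
```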
Table 1: I’m not sure this table is so important. To me, only the spatial resolution and the time period are of relevance. But why compare these two datasets if you only use one of them?
Comment response: In consideration of the observation, we decided to remove the table and show only the relevant information (i.e., the spatial and temporal resolution).
L. 92-93: these two sentences can be simplified, now it is confusing. So SLIP covers the period 2018-2020 but do you have greater certainty for 2019 and 2020?
Comment response: Thanks for the observation. SLIP covers the period 2014–2020; this has been corrected. We have greater certainty for 2019–2020 simply because more data and more events were recorded in those last years. It was edited, as you can see below.
SLIP was implemented in January 2019 and has 330 records from the 2014–2020 period. Therefore, there is a greater degree of certainty regarding the number of events recorded in recent years.
L. 101: Figure 3
Comment response: Thanks for the observation. It was edited.
L. 88-101: I don’t understand how the two landslide databases were combined. The time periods do not overlap and Figure 3 only starts in 2019. If one event was excluded it should be 382 events in total. So which was your study period?
Comment response: Thanks for the observation. As noted in the previous comment, the study period was 2007–2020. The number of events was corrected. The figure shows only an extracted period in order to illustrate how we define a rainfall event.
Figure 3: What is this rainfall? One grid cell? Which location are we looking at? And is each colour one rainfall event? Please specify in the caption and add labels a) and b) to the subplots.
Comment response: Thanks for the observation; we have taken your comment into account and the figure has been modified. It shows daily rainfall data for one basin (from the GEOGloWS discretization, Fig. 1) in which a landslide event occurred. The purpose of the figure is to show how rainfall events are defined (each colour is one rainfall event); only an extracted period is shown for this illustration. It was edited, as you can see below.
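A minimal sketch of how daily rainfall could be segmented into events separated by dry days, as the figure illustrates (the wet-day threshold and the exact convention below are assumptions for illustration, not necessarily those used in the manuscript):

```python
def split_into_events(daily_rain, wet_threshold=0.1):
    """Split a daily rainfall series (mm/day) into rainfall events.

    Illustrative convention only: an event is a run of consecutive days with
    rainfall >= wet_threshold, and events are separated by at least one dry day.
    """
    events, current = [], []
    for day, value in enumerate(daily_rain):
        if value >= wet_threshold:
            current.append((day, value))
        elif current:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events

# For each event, the rainfall properties follow directly:
#   D = number of days, E = sum of daily rainfall, Imean = E / D.
series = [0, 3.2, 7.5, 0, 0, 12.1, 4.0, 1.5, 0]
for ev in split_into_events(series):
    D = len(ev)
    E = sum(v for _, v in ev)
    print(f"D = {D} days, E = {E:.1f} mm, Imean = {E / D:.1f} mm/day")
```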