I have reviewed the manuscript entitled “Brief communication: Threshold and probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves” as an additional reviewer for this round. This paper examines the conceptual differences between the ID thresholds and IDF curves in relation to debris-flow and landslide triggering. In Sections 1–4, the authors clearly and concisely describe the essential issues regarding rainfall data processing for ID thresholds and IDF curves. In Section 5, based on a well-constrained debris-flow dataset, the authors convincingly demonstrate how these processing differences can affect our understanding of rainfall conditions that trigger debris flows. The results are robust and highly meaningful, and will likely stimulate further analyses of rainfall-induced landslide and debris-flow initiation processes. Although the presented framework requires independent IDF curves, it is widely applicable across diverse environmental settings and effectively removes the dependence on local conditions and subjective interpretation. Overall, this paper represents a necessary and timely contribution to our community and could serve as a benchmark study. It is a promising and well-written manuscript that I believe will reach a wide audience. I have a few comments and suggestions that may help improve the clarity and overall flow of the paper. I hope my feedback will assist the authors in further refining this excellent work.
Sincerely,
Haruka Tsunetaka
1. Differences between debris flows and landslides
My first concern relates to the differences between debris flows and landslides in the triggering mechanisms implicitly evaluated by ID thresholds. As described in L23-24, ID thresholds for landslide triggering commonly evaluate whether a slope is activated, in terms of the “trigger” and “cause” provided by rainfall input (Bogaard and Greco, 2018). However, some researchers argue that ID thresholds for debris-flow triggering reflect various processes, such as changes in sediment availability (e.g., Pastorello et al., 2018; Tsunetaka et al., 2021a), whether debris flows reach the monitoring station (e.g., Bel et al., 2017), and changes in sediment composition (e.g., Guo et al., 2016; Tsunetaka et al., 2021b). These differences, namely that ID pairs may capture different initiation mechanisms for landslides and for debris flows, should be considered carefully throughout the manuscript.
In my view, these differences relate only to the interpretation of the real-world example (i.e., Section 5). I therefore recommend either deleting the related sentence (L23-24) and paragraph (L78-87), moving them to Section 5, or providing a more precise explanation of the above differences. By doing so, the explanations provided in Sections 1–4 would more clearly highlight that they describe generalized issues concerning the ID–IDF relationship, which apply universally, regardless of whether the triggering process concerns debris flows or landslides.
2. Difference in the definition of W* between Sections 4 and 5
In the previous review round, both reviewers raised several concerns regarding the analysis presented in Section 5. In my view, these concerns mainly stemmed from the difference in the definition of W* between Sections 4 (L72: true, unknown triggering interval) and 5 (L105: time interval during which the most severe intensity was observed). In the revised manuscript, the authors have addressed this issue by clearly describing how W* is defined. However, I am still concerned that this difference may cause readers to misunderstand how the authors distinguish between theoretical facts and their interpretations throughout the paper. Indeed, it appears that the authors themselves may still be somewhat uncertain about this distinction (L139–140). It might be clearer to readers if, in Section 5, all results were consistently described in terms of W corresponding to max(Tw), with the subsequent discussion and interpretation developed under the assumption that such W is approximately equivalent to W*.
Some readers may also wonder why the authors did not apply the same framework to a broader dataset that includes landslides or other regions. However, I recognize that the authors have used a well-constrained, high-quality debris-flow dataset, and that extending the same analysis to other phenomena or regions would be extremely challenging due to the inherent difficulties in identifying a sufficient number of ID pairs and determining reliable W* values. Therefore, I consider this dataset and the associated analysis to be particularly valuable. That said, these practical and methodological challenges may not be readily apparent to some readers, so providing a brief clarification in the text could further help convey the value and uniqueness of this dataset.
For landslides, it is generally impossible to predict where they will occur in advance or to identify the exact time of initiation. The recurrence interval of landslides in a given region typically spans several decades to centuries, making it difficult to obtain a sufficient number of ID pairs at the regional scale. Consequently, most ID thresholds for landslides have been derived at the national scale. Preparing a landslide dataset suitable for an analysis such as that presented in Section 5 is therefore extremely difficult.
For debris flows, regional ID thresholds are often derived from ID pairs based on observed occurrences within the same or nearby catchments. However, in many cases, “occurrence” refers to the arrival of a debris flow at the observation point rather than the initiation of motion. As mentioned in Comment 1 and the references therein, this implies that the threshold inherently includes the processes of debris-flow development and runout, which depend not only on rainfall but also on sediment availability, distribution, and composition. Hence, the strict identification of W* is difficult in practice.
The paragraph in Section 4 (L78-87) already mentions, at least in part, that W* is practically indeterminable. As mentioned earlier, I suggest moving that paragraph to Section 5 and expanding on it there. My recommendation is to strengthen the explanation of the practical difficulties in determining W* and in obtaining numerous ID pairs from real-world data. The authors could clarify that the analysis in Section 5 essentially deals with another metric (such as W corresponding to max(Tw)) but that, in this study area and dataset, assuming W = W* is reasonably valid, citing adequate references to support this rationale. This approach would clearly separate Sections 1–4 as describing theoretical principles and Section 5 as demonstrating an empirical application and interpretation based on real data.
I also agree with the discussion in L123-134. However, as noted in Comment 1, it is important to emphasize that the key finding, that W* ranges from 30 minutes to 6 hours, was derived specifically for debris flows. Whether landslides exhibit a similar pattern remains unknown. If the debris flows analyzed here are sourced primarily from channel-bed sediment, this relatively broad time interval may reflect temporal variations in sediment availability within the catchment (e.g., Tsunetaka et al., 2021a).
3. Scaling limitation of IDF curves
The authors, for convenience, have estimated the return periods of very short-duration rainfall (less than 15 minutes) based on existing IDF curves. I am concerned that the validated lower limit of the existing IDF scaling (Borga et al., 2005) may be around 15-minute rainfall durations. The estimation of return periods for such short-duration rainfall involves high uncertainty. In fact, in Figure 1b, there appear to be at least two unrealistic data points plotted at less than 1 hour on the x-axis and around 200 mm h⁻¹ on the y-axis. Although I understand that there is currently no practical alternative approach, a brief mention of this limitation in the main text would further improve the clarity of the manuscript.
I also believe that this concern is independently addressed by the results presented in Figure 3, which show the decorrelation time of rainfall. I was quite impressed by how closely this figure conceptually aligns with Figure 2. If space permits, adding a more detailed explanation and discussion of Figure 3 would make the manuscript even more refined and insightful.
4. Comparison of slopes regarding ID, IW*, and IDF
The discussion comparing the slopes of ID, IW*, and IDF relationships may need to be moderated, as the current analysis does not provide sufficient evidence to draw a definitive conclusion. Because the D values of the ID pairs are relatively large, the data points in this dataset appear only within the range of approximately 1 to 48 hours on the x-axis in Figure 1b. The scaling for durations shorter than 1 hour is essentially extrapolated.
Considering this, when focusing on the range between 1 and 48 hours in Figure 1b, the slopes of the ID threshold, IW* threshold, and IDF scaling appear to be nearly equivalent. Therefore, the overall difference in slope might simply reflect the data limitation that there are few ID pairs with small D values. For landslides, triggering rainfall events with D < 1 hour are extremely rare. However, for debris flows, such short-duration triggering events have been reported in various catchments (e.g., Abancó et al., 2016; Bel et al., 2017; Tsunetaka et al., 2021a).
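To illustrate the point about the fitting range, the following is a minimal sketch, not the authors' procedure. It assumes an ordinary least-squares fit in log-log space (the manuscript's thresholds may be fitted differently), and the function name `fit_loglog_slope` and all data values are hypothetical and synthetic. It only shows how a slope estimated over all durations can differ from one restricted to 1–48 hours when the sub-hourly points follow a different scaling.

```python
import math

def fit_loglog_slope(durations_h, intensities, d_min=None, d_max=None):
    """Least-squares slope of log10(I) against log10(D), restricted to [d_min, d_max]."""
    pts = [
        (math.log10(d), math.log10(i))
        for d, i in zip(durations_h, intensities)
        if (d_min is None or d >= d_min) and (d_max is None or d <= d_max)
    ]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the duration range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic (hypothetical) pairs: exponent -0.3 below 1 h, -0.7 from 1 h upward.
durations = [0.25, 0.5, 1.0, 2.0, 6.0, 12.0, 24.0, 48.0]
intensities = [20.0 * d ** (-0.3 if d < 1.0 else -0.7) for d in durations]

slope_all = fit_loglog_slope(durations, intensities)
slope_1_48 = fit_loglog_slope(durations, intensities, d_min=1.0, d_max=48.0)
print(f"slope over all durations: {slope_all:.3f}")
print(f"slope over 1-48 h only:   {slope_1_48:.3f}")
```

Applying the same range restriction to the ID, IW*, and IDF relationships would indicate whether the reported slope difference persists where the data actually lie.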
Line by line comments
Title: Since the case study focuses specifically on debris flows, it might be helpful to include the term “debris flow” in the title to clearly indicate the study target.
L27-29: Readers who are less familiar with rainfall thresholds may wonder why the parameter E is sometimes used. A brief explanation of its meaning and rationale, supported by an appropriate reference, would help clarify this point.
L50: IDF are -> IDF curves are
L61: an user defined -> a user defined
L78-79: Please consider softening the tone of the explanation slightly to make it more balanced and accessible to a broader readership.
L89-90: It would be helpful to clarify whether these debris flows were initiated by landslides or if they mainly resulted from bulking and entrainment of unconsolidated channel-bed material. A brief explanation would improve the reader’s understanding.
L98: It is not entirely clear whether these represent the triggering locations or the observation locations. Please clarify how the triggering location was defined in this dataset.
Figure 1: To further aid interpretation, you might consider adding summary scatter plots for all 133 events alongside Figure 1a: specifically, W∗ vs. D and I∗ vs. I. Including such plots could make the relationships easier to grasp, particularly for readers who are less familiar with rainfall threshold analyses.
References
Abancó, C., Hürlimann, M., Moya, J., & Berenguer, M. (2016). Critical rainfall conditions for the initiation of torrential flows. Results from the Rebaixader catchment (Central Pyrenees). Journal of Hydrology, 541, 218-229.
Bel, C., Liébault, F., Navratil, O., Eckert, N., Bellot, H., Fontaine, F., & Laigle, D. (2017). Rainfall control of debris-flow triggering in the Réal Torrent, Southern French Prealps. Geomorphology, 291, 17-32.
Guo, X., Cui, P., Li, Y., Fan, J., Yan, Y., & Ge, Y. (2016). Temporal differentiation of rainfall thresholds for debris flows in Wenchuan earthquake-affected areas. Environmental Earth Sciences, 75(2), 109.
Pastorello, R., Hürlimann, M., & D’Agostino, V. (2018). Correlation between the rainfall, sediment recharge, and triggering of torrential flows in the Rebaixader catchment (Pyrenees, Spain). Landslides, 15(10), 1921-1934.
Tsunetaka, H., Hotta, N., Imaizumi, F., Hayakawa, Y. S., & Masui, T. (2021a). Variation in rainfall patterns triggering debris flow in the initiation zone of the Ichino-sawa torrent, Ohya landslide, Japan. Geomorphology, 375, 107529.
Tsunetaka, H., Shinohara, Y., Hotta, N., Gomez, C., & Sakai, Y. (2021b). Multi‐decadal changes in the relationships between rainfall characteristics and debris‐flow occurrences in response to gully evolution after the 1990–1995 Mount Unzen eruptions. Earth Surface Processes and Landforms, 46(11), 2141-2162.
This brief communication is very… brief! I mean, in the positive sense of the word. Indeed, it is a clear, concise manuscript that is perfectly written in fluent English - something very rare for a reviewer to find. I thank the authors for that! The paper gets straight to the point: landslide-triggering intensity-duration thresholds and precipitation intensity-duration-frequency curves cannot be confounded, compared, or plotted together. Neither one can be used to quantify the return time of the other.
Frankly, having worked on rainfall analysis and landslide prediction for years, the idea of mixing/comparing ID thresholds and IDF curves is something that never came to my mind. In the few cases I have seen in the extensive literature on these topics, it has always seemed very strange, not to say a downright methodological error. So, I can say that I certainly agree with the authors of this paper, although I do not think the article addresses a relevant scientific and/or technical question. I simply think that mixing ID thresholds and IDF curves is a misconception that does not even require discussion.
The authors list the differences between ID thresholds and IDF curves, focusing on the different durations (D and W) considered by the two tools, and then analysing the differences in terms of return time referring to these durations. In my opinion, they forgot the main and most important difference. That is: since their definition in pioneering works (Nel Caine and also previous pioneers), ID thresholds have been defined considering ID pairs that are somehow - arbitrarily or not, subjectively or not - linked to the initiation or re-activation of one or more landslides. On the other hand, IDF curves are defined considering IW (using the same terminology as the authors) pairs that are not linked to landslide/debris flow occurrence, referring only to rainfall itself. Indeed, the authors write “IDF are obtained by collecting the highest rainfall intensities observed any year over the time windows of interest” (lines 45-46). Therefore, the two tools summarise or describe different variables (the ID pairs by which the thresholds are defined are different by definition from the IW pairs with pre-fixed durations of the IDF curves, and consequently have different characteristics) and different processes (landslide or debris flow initiation and rainfall severity). This is, in my opinion, the main reason why the two tools must not be compared or mixed. I wouldn't have added anything else to this discussion.
However, the authors added more to the discussion, and these additions deserve attention. I list below some other comments on this paper.
First, I don’t understand the first part of the title “Threshold not probability”. Actually, thresholds can be probabilistic. As a matter of fact, the Bayesian thresholds mentioned by the authors are probabilistic. Moreover, the frequentist thresholds also mentioned by the authors allow defining probabilistic diagrams to be used for early warning purposes. Therefore, I would remove this part of the title, which works only for deterministic, binary thresholds.
In several parts of the text, the authors write that quantifying the return period of a given intensity used to define ID thresholds using probabilities estimated from the IW space is erroneous and causes an underestimation of the severity of the triggering rainfall. I agree with the authors, totally. However, I’d suggest mentioning some works in which this erroneous approach was adopted, also because these are referred to again in the last sentence of the paper (“Some results in the literature may thus be quantitatively inexact”). Moreover, I would add that the return period of a given ID threshold should not be calculated at all. Indeed, rather than adopting dichotomous approaches (above/below threshold), using statistical and probabilistic approaches, such as the two mentioned above, allows the probabilistic characterisation of the thresholds without (erroneously) introducing the concept of return time, which is also highly questionable for a variable that is not easily measurable, such as landslide or debris flow occurrence/triggering. In addition, as the authors certainly know, the concept of return time and how it changes in relation to non-stationarity is a topic of discussion in the scientific community.
Moving to sections 2 and 3, the differences between ID thresholds and IDF curves are listed, focusing in particular on the different ways to define the duration of the ID/IW pairs.
According to the authors’ view, the durations D are user- (or arbitrarily) defined while the durations W are not. But, actually, the durations W are also user- (or arbitrarily) defined using running windows of x minutes or hours: 5, 10, … 45 minutes or 1, 2, … 48 hours were also chosen by a user. Moreover, the authors didn’t mention that IDF curves can be defined using the partial duration series approach as well, which introduces another point of discussion.
In section 2 (lines 29-32) the authors write “rainfall records are often not available at hourly resolutions nor in close range of the landslide (Marra et al., 2016; Marra, 2019), which makes the events separation dependent also on these aspects.”. Actually, this issue affects the definitions of W too. Indeed, if only daily measurements are available in a given area, sub-daily values of W (e.g. the classical 1, 3, 6, 12, 24 hours) can’t be defined, and the IDF curves cannot be drawn for sub-daily durations.
In section 4 (lines 62-63), the authors write “In a univariate framework, the return period T* of a rainfall event can reasonably be defined as the maximum among the return periods TW associated with all possible temporal scales”. I think that some examples should be provided to support this statement.
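As one possible kind of example, the quoted definition can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the simple-scaling IDF form i(T, W) = a (1 + k ln T) W^(-n), its parameters `a`, `k`, `n`, the function names, and the hourly rainfall values are all hypothetical.

```python
import math

def max_window_intensity(rain_mm, step_h, window_steps):
    """Highest mean intensity (mm/h) over any run of `window_steps` consecutive steps."""
    best = 0.0
    for start in range(len(rain_mm) - window_steps + 1):
        depth = sum(rain_mm[start:start + window_steps])
        best = max(best, depth / (window_steps * step_h))
    return best

def return_period_years(intensity, window_h, a=20.0, k=0.5, n=0.6):
    """Invert a hypothetical IDF, i(T, W) = a * (1 + k*ln T) * W**(-n), for T."""
    return math.exp((intensity * window_h ** n / a - 1.0) / k)

def event_return_period(rain_mm, step_h):
    """Return (T*, W*): the largest T_W over all window lengths and the window reaching it."""
    best_T, best_W = 0.0, None
    for steps in range(1, len(rain_mm) + 1):
        window_h = steps * step_h
        i_w = max_window_intensity(rain_mm, step_h, steps)
        t_w = return_period_years(i_w, window_h)
        if t_w > best_T:
            best_T, best_W = t_w, window_h
    return best_T, best_W

# Hypothetical hourly rainfall depths (mm) for one event of duration D = 8 h.
rain = [1.0, 2.0, 30.0, 25.0, 3.0, 1.0, 0.5, 0.5]
T_star, W_star = event_return_period(rain, 1.0)

# Return period obtained when the whole-event (I, D) pair is read on the same IDF.
D = len(rain) * 1.0
T_D = return_period_years(sum(rain) / D, D)
print(f"W* = {W_star} h, T* = {T_star:.1f} yr, T_D = {T_D:.1f} yr")
```

In this synthetic event the window maximising the return period is much shorter than the event duration, and the return period read at D is lower than T*, which is the kind of contrast the statement implies.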
Moving to section 5, I have some comments regarding the dataset used. First, it should be noted (and somewhere acknowledged by the authors) that the dataset is quite dated, having been collected over ten years ago. Second, spatial and temporal information on the debris flow records is missing. In particular, the authors should specify whether the time of occurrence is known for the debris flows included in the dataset. This is extremely important information for a dataset to be used for the definition of rainfall thresholds. Moreover, it is relevant to another issue that I raise later in my comments. Third, it is not described how the triggering precipitation events used to draw the thresholds were defined. This is also very relevant, given the comparison with IDF curves made in the paper.
Further on in section 5, the authors describe the procedure used to calculate W* (lines 88-92). It should be acknowledged that the outcomes of this procedure are not necessarily related to debris flow triggering. Indeed, the fact that the IW* pairs have the highest return time among all IW pairs does not mean that they triggered the debris flows. It would be useful to know when these IW* pairs occurred within the whole event duration, in order to establish whether they are relevant to the triggering of debris flows or not. If the IW* pairs occurred many hours (or days) before the occurrence of the debris flows, it cannot be said that they were certainly relevant to the initiation; at least, not more important than the entire event. This is the reason why knowing the exact time of occurrence of the debris flows is essential to prove that “what is really important for triggering are the rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions”. In my opinion, selecting IW* pairs using the maximum return time as the only constraint is not sufficient to prove this hypothesis, and adds subjectivity to the process.
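The timing check requested here could be sketched as follows. This is a minimal illustration under stated assumptions, not a description of the authors' data or code: the function names, the hourly rainfall values, and the occurrence times are hypothetical, and the window length W* is taken as given.

```python
def most_intense_window(rain_mm, window_steps):
    """Start index of the run of `window_steps` consecutive steps with the largest depth."""
    best_start, best_depth = 0, float("-inf")
    for start in range(len(rain_mm) - window_steps + 1):
        depth = sum(rain_mm[start:start + window_steps])
        if depth > best_depth:
            best_start, best_depth = start, depth
    return best_start

def window_timing(rain_mm, step_h, window_steps, occurrence_h):
    """Return (start_h, end_h, lag_h) for the most intense window.

    Times are hours from the start of the rainfall event; lag_h is the time from
    the end of the window to the reported occurrence (negative or zero if the
    occurrence falls before the window has ended).
    """
    start = most_intense_window(rain_mm, window_steps)
    start_h = start * step_h
    end_h = (start + window_steps) * step_h
    return start_h, end_h, occurrence_h - end_h

# Hypothetical hourly depths (mm) and an assumed W* of 2 h.
rain = [1.0, 2.0, 30.0, 25.0, 3.0, 1.0, 0.5, 0.5]

# Case 1: debris flow reported 3.5 h into the event, i.e. inside the W* window.
print(window_timing(rain, 1.0, 2, occurrence_h=3.5))

# Case 2: debris flow reported 7.5 h into the event, well after the window ended.
print(window_timing(rain, 1.0, 2, occurrence_h=7.5))
```

Reporting the distribution of such lags across all events would show whether the IW* windows sit close enough to the recorded occurrences to be plausibly linked to triggering.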
Then (lines 98-99), the authors write that “IW* pairs are associated with temporal scales W* that are always smaller than the duration D of ID pairs. In addition, by design, the corresponding intensities are systematically higher”. This is tautological and leads to what is written in lines 109-111 (i.e., the underestimation of the return times of the whole events compared to the IW* pairs). Again, having a lower return time does not imply that an ID pair is less severe in terms of landslide/debris flow triggering. This is another point to be added to the conceptual differences between ID thresholds and IDF curves.
Moreover, the authors assumed that ID thresholds are always defined considering D as the whole duration of the rainfall events. This is not always true. There are several examples in the literature in which sub-events are distinguished (automatically or not) within the entire rainfall events and used to define rainfall thresholds. This can be considered a solution to the issues about durations being too long. I’d suggest mentioning it in the discussion.
Before moving to the conclusions, two comments on Figs. 2 and 3. In Fig. 2, the (a) and (b) labels are missing. Fig. 3 and its description are not very clear; a better description and more discussion are needed.
Going to the conclusions of the work, I totally agree that the calculation of return times of triggering conditions should be avoided, for several reasons including the ones described by the authors. However, the main motivation should be that it’s better to use statistical/probabilistic approaches to define rainfall thresholds rather than calculating return times of the triggering conditions. Moreover, the underestimation of the return periods should be better evaluated considering the time of occurrence of the IW pairs and landslides/debris flows.
Overall, I think that the main message of the work is clear and one I share. However, I believe that the conclusions need to be supported by results based on an accurate dataset and an improved methodology. In my opinion, more temporal details on the dataset are needed, in order to allow the most important methodological improvement needed in the work: that is, finding the time of occurrence of the IW* pairs and their temporal distance from the debris flow occurrences. Only in this way will the conclusions be adequately justified by the data and results.
So, my suggestion is that the work needs major revisions before being reconsidered for publication. The revised version of the paper should include an analysis of the timing of the IW* pairs, so as to say with certainty that they can be considered the cause of debris-flow triggering. This may be done using information from the proposed dataset (if any) or using other datasets. Moreover, I’d kindly suggest taking into consideration all my comments regarding theoretical and methodological aspects of the work.