the Creative Commons Attribution 4.0 License.
Introducing SlideforMAP: a probabilistic finite slope approach for modelling shallow-landslide probability in forested situations
Feiko Bernard van Zadelhoff
Adel Albaba
Denis Cohen
Chris Phillips
Bettina Schaefli
Luuk Dorren
Massimiliano Schwarz
Final revised paper (published on 15 Aug 2022)
Supplement to the final revised paper
Preprint (discussion started on 25 May 2021)
Supplement to the preprint
Interactive discussion
Status: closed

RC1: 'Comment on nhess-2021-140', Anonymous Referee #1, 29 Jun 2021
GENERAL COMMENT
Dear Editor, Dear Authors
I reviewed with interest this manuscript for possible publication in the NHESS journal. The work describes a comprehensive modeling tool to assess rainfall-initiated shallow landslides within a probabilistic framework. The manuscript is an interesting contribution to this field, although some aspects are strongly simplified while others are treated in detail. The scientific quality is good and the text reads easily, although the manuscript is overall a bit long and often dispersive. The literature review can be improved with additional references to closely related works. The description of the climate forcing that initiates (or does not initiate) the landslide events requires significant improvement.
In my opinion, the work can be published after some important clarifications and revisions.
SPECIFIC COMMENTS
Please read in the following my specific observations.
1. Literature review (introduction/discussion).
The discussion on the impacts and costs of natural hazards from the point of view of insurance institutes is interesting. In general, however, I found the introduction a bit dispersive and lacking in some aspects. The work of Dietrich and Montgomery, 1994 (SHALSTAB) is the pioneering work of this approach, and it has been followed by many other deterministic works that contributed in different ways to improving the hydrological modeling in support of shallow landslide prediction, such as the cited Iverson 2000 and, additionally, Rosso et al., 2006; Claessens et al., 2007; Arnone et al., 2011; Lepore et al., 2013; Simoni et al., 2008; Baum et al., 2002 (TRIGRS); and Montrasio et al., 2011 (SLIP), among others.
With regard to the effect of vegetation, its hydrological effects should at least be discussed, as they can sometimes be even more significant than the mechanical ones (Feng et al., 2020).
Interesting reviews are those by Chae et al., 2017, Gasser et al., 2019, and the recently published Masi et al., 2021 (see references list).
2. Definition of the stability problem.
I found the definition of the stability problem (section 2.2, Figure 2) somewhat misleading. The definition of the volume of soil to which forces are applied is not clear. In the limit equilibrium method, under the hypothesis that the landslide is wide enough for deformations to occur in the plane parallel to the soil thickness H_{soil} (i.e. perpendicular to the elliptic landslide in Figure 2), forces are assessed on a ‘slice’ of soil of unit width (in the direction parallel to the elliptic landslide plane). Figure 2 is confusing, and the planes on which the forces act are not well drawn. The limit equilibrium method (and the infinite slope model) rests on the hypothesis of an element that is large and elongated with respect to the soil thickness, so that an element of unit width can be considered. Also, P_{water} is not indicated in Figure 2. According to the definition in the manuscript, R_{lat} and F_{res} apply on different planes.
I suggest redrawing Figure 2 in a 3D perspective and specifying the hypotheses/assumptions.
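For reference, the textbook infinite-slope factor of safety for a unit-width slice, which is the setting the comment above describes, can be written as follows. This is the standard form, not the manuscript's finite-slope equation (which additionally includes R_{lat}). It assumes H_{soil} is measured normal to the slope, seepage is slope-parallel with saturated slope-normal thickness h_w, and basal root reinforcement acts as an added cohesion c_r; c', \phi', \gamma_s, \gamma_w and \beta (slope angle) are symbols introduced here. All terms are per unit area of the slip plane.

```latex
FS = \frac{c' + c_r + \left(\gamma_s H_{soil}\cos\beta - P_{water}\right)\tan\phi'}
          {\gamma_s H_{soil}\sin\beta},
\qquad
P_{water} = \gamma_w\, h_w \cos\beta
```

Writing it this way makes explicit that P_{water} reduces the normal stress on the basal plane, while lateral terms such as R_{lat} would act on the slide perimeter, a different plane.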
3. Hydrology and precipitation.
Here is my main comment. The proposed modeling framework addresses shallow landslides initiated by rainfall, which is the triggering factor. The approach used (based on TOPMODEL) is extremely simplified because it assumes steady-state conditions and therefore does not account for transient hydrological processes (Chae et al., 2017). The authors acknowledge this limitation in the discussion section, but it should be stated clearly early in the methodology. As correctly written by the authors, steady state is assumed to be reached within the hourly timestep. Clearly, this can hardly be verified in general.
That said, I raise two further critical issues that are not mentioned by the authors:
- Under unsaturated conditions, soils (especially fine and clayey soils) exert a strong water uptake effect due to suction, which leads to an apparent ‘hydrological’ cohesion. This represents a further limitation of the Montgomery and Dietrich approach that the authors should mention (see the works cited in Chae et al., 2017, e.g. Lepore et al., 2013).
- In the description of the model application (section 3.4.2) it is not clear how the rainfall events initiating landslides are selected. If I understood correctly, only events of 1-hour duration are selected, whose intensity is taken from the Depth-Duration-Frequency (DDF) curve at different return periods (i.e. from 10 to 100 years). Therefore, I guess 10 events of 1 hour are simulated. Is that correct? If so, the reason for analyzing events of only 1 hour, which cannot be ‘critical’ for landslide initiation, should be explained and justified. The authors should clarify this part of the manuscript thoroughly, explain the methodology used to define the events, and report the parameters of the DDF curves.
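The steady-state assumption discussed in the main comment above can be made concrete with the Montgomery and Dietrich (1994) relative-wetness form used by TOPMODEL-type approaches: wetness = min(1, q·a / (T·sin β)), with q the steady recharge, a the specific catchment area and T the transmissivity. The sketch below is illustrative only; function names, the unit choices and the example values are assumptions, not SlideforMAP code. Note that time appears only through the recharge rate, which is exactly why transients cannot be represented.

```python
import math


def relative_wetness(recharge, specific_catchment_area, transmissivity, slope_rad):
    """Steady-state relative wetness h_w / H_soil, capped at 1 (saturation).

    recharge:                steady rainfall excess [m/s]
    specific_catchment_area: upslope area per unit contour width [m]
    transmissivity:          saturated soil transmissivity [m^2/s]
    slope_rad:               local slope angle [rad]
    """
    if slope_rad <= 0.0:
        return 1.0  # no downslope gradient: water cannot drain laterally
    wetness = recharge * specific_catchment_area / (transmissivity * math.sin(slope_rad))
    return min(1.0, wetness)


def pore_pressure_kpa(wetness, soil_thickness_m, slope_rad, gamma_w=9.81):
    """Pore pressure at the slip surface for slope-parallel seepage [kPa]."""
    return gamma_w * wetness * soil_thickness_m * math.cos(slope_rad)


# Hypothetical cell: 45 mm/h converted to m/s, 35 degree slope, 1 m of soil.
q = 0.045 / 3600.0
beta = math.radians(35.0)
for area in (10.0, 50.0):
    w = relative_wetness(q, area, 4e-4, beta)
    print(f"a = {area:4.0f} m -> wetness {w:.2f}, u = {pore_pressure_kpa(w, 1.0, beta):.1f} kPa")
```

Any intensity high enough to saturate a cell gives the same pore pressure, so in this scheme rainfall intensity and transmissivity act together as a single control on wetness.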
4. Data inventory
The proposed methodology used to characterize the hypothetical landslides (extent) depends strictly on the data inventory (section 2.3), as the authors also state. However, it is important that the observed landslides used to characterize the model are of the same type, consistent with the hypotheses of the stability model used, and all triggered by rainfall. Is this the case? Please specify.
5. Calibration/sensitivity analysis
With regard to the best set of parameters, my question is: are the parameters found consistent and realistic? For example, I question the choice of including precipitation intensity as a calibration parameter. As discussed in the previous comment, rainfall is the triggering forcing and a dynamic variable. Ideally, we should know the precipitation intensity associated with each observed landslide. Otherwise, if it is used as a parameter, the model seems to be tuned ad hoc just to reproduce past events. If so, what would be its utility?
Additionally, it would be interesting to see the ROC curves (from which the AUC is computed) for the calibrated and the best model combinations. The shape of the curve also says something about model performance.
Furthermore, in my opinion, the sensitivity analysis should come before the model calibration. Normally, calibration is performed on the parameters that are most sensitive. I understand Figures 7 and 8, but I am not sure this is the most efficient way to verify the sensitivity of the parameters. I am curious to see, for example, how the landslide probability varies with changes in parameter values. This test could be shown for the least and the most sensitive parameters.
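The ROC curve requested above is obtained by ranking map cells by predicted landslide probability and sweeping a threshold, comparing against the observed inventory; the AUC is the area under that curve. The sketch below is a generic illustration with hypothetical function names and toy data, not the authors' evaluation code.

```python
def roc_curve(scores, observed):
    """ROC points (false-positive rate, true-positive rate) for map cells.

    scores:   predicted landslide probability per cell
    observed: 1 if the cell lies in an inventoried landslide, else 0
    """
    n_pos = sum(observed)
    n_neg = len(observed) - n_pos
    ranked = sorted(zip(scores, observed), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cells with tied scores are classified together: one point per threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example with six cells (hypothetical values).
pts = roc_curve([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0])
print(pts)
print(f"AUC = {auc(pts):.3f}")  # AUC = 0.889
```

Two models can share the same AUC while their curves differ in shape, for instance one performing better at low false-positive rates, which is why the curve itself is informative.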
6. Results
The result of high m_f and low m_c is quite obvious. As the authors clearly say in the discussion, and as found in past works, only a few parameters really affect the process in the end: the geometry of the slope (i.e. the soil thickness), the mechanical properties (i.e. the friction angle) and the characteristics of the trigger (i.e. precipitation), whose effects are controlled by the soil transmissivity.
With regard to vegetation, different vegetation scenarios are analyzed (and this is fine). Which is the real configuration? What is the ultimate target of the simulations?
7. General
I suggest stating clearly the ultimate main target of the model. Can it be used as a forecasting tool in an early warning system? If so, in which way? My impression is that it is too constrained by the calibration parameters, which in some cases may lose their physical meaning.
TECHNICAL CORRECTIONS
Please see in the following technical observations:
- Abstract: I strongly recommend shortening the abstract to make it more concise.
- L3: I do not completely agree with this sentence, given that a number of works take into account the effect of vegetation, although from different perspectives, such as the hydrological one together with the mechanical one. Please remove this sentence from the abstract, where there is no room to discuss it.
- L72–80: I suggest synthesizing.
- Figure 1: it is useful and appropriate. However, consider improving it to make it clearer. It is not clear where to start. “extract mean value for each landslide”: do you mean hypothetical landslide? Emphasize the ‘append’ box where everything converges. Avoid text outside the boxes. Also, I suggest using the symbols used in the section instead of the descriptions, for example for the definition of rho_ls; this would improve the correspondence with, for example, section 2.3.
- Line 417: It is not clear which single rainfall event you refer to. I understand that the database includes landslides triggered by different storms across the years.
- Lines 378–379: the procedure is not really clear. Please try to write it more clearly.
- Lines 385–386: I understand the reference, but please also give an explanation here, based on your results.
- Section 2.7: how did you define the threshold from daily to hourly?
REFERENCES
Baum, R.L., Savage, W.Z., and Godt, J.W., 2002, TRIGRS – a Fortran program for transient rainfall infiltration and grid-based regional slope-stability analysis. U.S. Geological Survey Open-File Report 02-424. http://pubs.usgs.gov/of/2002/ofr02424
Arnone E., D. Caracciolo, L. V. Noto, F. Preti, and R. L. Bras (2016) Modeling the hydrological and mechanical effect of roots on shallow landslides. Water Resources Research, 52 (11), 8590–8612
Chae, B.-G., Park, H.-J., Catani, F. et al. Landslide prediction, monitoring and early warning: a concise review of state-of-the-art. Geosci J 21, 1033–1070 (2017). https://doi.org/10.1007/s12303-017-0034-4
Claessens, L., Schoorl, J. M., and Veldkamp, A.: Modelling the location of shallow landslides and their effects on landscape dynamics in large watersheds: An application for Northern New Zealand, Geomorphology, 87, 16–27, 2007.
Feng, H.W. Liu, C.W.W. Ng (2020). Analytical analysis of the mechanical and hydrological effects of vegetation on shallow slope stability. Computers and Geotechnics, 118, 103335.
Gasser, M. Schwarz, A. Simon, P. Perona, C. Phillips, J. Hübl, L. Dorren (2019). A review of modeling the effects of vegetation on large wood recruitment processes in mountain catchments. Earth-Science Reviews, 194, 350–373.
Gonzalez-Ollauri, C. Hudek, S.B. Mickovski, D. Viglietti, N. Ceretto, M. Freppaz (2021). Describing the vertical root distribution of alpine plants with simple climate, soil, and plant attributes. Catena, 203, 105305.
Lepore, C., Arnone, E., Noto, L.V., Sivandran, G., Bras, R.L., 2013. Physically based modeling of rainfall-triggered landslides: a case study in the Luquillo forest, Puerto Rico. Hydrol. Earth Syst. Sci. 17, 3371–3387.
Masi, Segoni and Tofani (2021). Root Reinforcement in Slope Stability Models: A Review. Geosciences (Switzerland) 11(5):212. DOI: 10.3390/geosciences11050212
Montrasio, L., Valentino, R., and Losi, G.L., 2011, Towards a real-time susceptibility assessment of rainfall-induced shallow landslides on a regional scale. Natural Hazards and Earth System Sciences, 11, 1927
Rosso, R., Rulli, M. C., and Vannucchi, G.: A physically based model for the hydrologic control on shallow landsliding, Water Resour. Res., 42, W06410, doi:10.1029/2005WR004369, 2006.
Simoni, S., Zanotti, F., Bertoldi, G., and Rigon, R.: Modelling the probability of occurrence of shallow landslides and channelized debris flows using GEOtop-FS, Hydrol. Process., 22, 532–545, doi:10.1002/hyp.6886, 2008.

AC1: 'Reply on RC1', Feiko van Zadelhoff, 23 Jul 2021
We would like to thank the reviewer for the overall positive assessment of our paper and the detailed suggestions to further improve the manuscript. Below, each original comment is given first, followed by our answer.
GENERAL COMMENT
I reviewed with interest this manuscript for possible publication in the NHESS journal. The work describes a comprehensive modeling tool to assess rainfall-initiated shallow landslides within a probabilistic framework. The manuscript is an interesting contribution to this field, although some aspects are strongly simplified while others are treated in detail. The scientific quality is good and the text reads easily, although the manuscript is overall a bit long and often dispersive. The literature review can be improved with additional references to closely related works. The description of the climate forcing that initiates (or does not initiate) the landslide events requires significant improvement. In my opinion, the work can be published after some important clarifications and revisions.
Reply: Thanks for the nice summary and the positive assessment. We will give the manuscript a critical read and see where it can be shortened and made more to the point.
SPECIFIC COMMENTS
1. Literature review (introduction/discussion).
The discussion on the impacts and costs of natural hazards from the point of view of insurance institutes is interesting. In general, however, I found the introduction a bit dispersive and lacking in some aspects. The work of Dietrich and Montgomery, 1994 (SHALSTAB) is the pioneering work of this approach, and it has been followed by many other deterministic works that contributed in different ways to improving the hydrological modeling in support of shallow landslide prediction, such as the cited Iverson 2000 and, additionally, Rosso et al., 2006; Claessens et al., 2007; Arnone et al., 2011; Lepore et al., 2013; Simoni et al., 2008; Baum et al., 2002 (TRIGRS); and Montrasio et al., 2011 (SLIP), among others.
With regard to the effect of vegetation, its hydrological effects should at least be discussed, as they can sometimes be even more significant than the mechanical ones (Feng et al., 2020). Interesting reviews are those by Chae et al., 2017, Gasser et al., 2019, and the recently published Masi et al., 2021.
Reply: Thank you for the positive reply on the insurance institute view and the many suggestions of hydrological papers employing this model scheme. We will include some of these to embed our approach better in current research. We agree that the hydrological effects of vegetation are significant for slope stability and that it is important to discuss them in the introduction. Feng et al. (2020) conclude that in extreme rainfall events the hydrological effects lose almost all of their importance. SlideforMap focuses on these events, and we would therefore like to keep this to a short discussion pointing this out.
2. Definition of the stability problem.
I found the definition of the stability problem (section 2.2, Figure 2) somewhat misleading. The definition of the volume of soil to which forces are applied is not clear. In the limit equilibrium method, under the hypothesis that the landslide is wide enough for deformations to occur in the plane parallel to the soil thickness H_{soil} (i.e. perpendicular to the elliptic landslide in Figure 2), forces are assessed on a ‘slice’ of soil of unit width (in the direction parallel to the elliptic landslide plane). Figure 2 is confusing, and the planes on which the forces act are not well drawn. The limit equilibrium method (and the infinite slope model) rests on the hypothesis of an element that is large and elongated with respect to the soil thickness, so that an element of unit width can be considered. Also, P_{water} is not indicated in Figure 2. According to the definition in the manuscript, R_{lat} and F_{res} apply on different planes.
I suggest redrawing Figure 2 in a 3D perspective and specifying the hypotheses/assumptions.
Reply: Thank you for pointing out the confusion arising from Figure 2. The authors will give the figure a 3D perspective to make the dimensions, volume and force application planes of the assumed shallow landslide clearer. Specifically, we will scale the elongation of the shallow landslide adequately in an enlarged side view. We will add the water pressure (P_{water}) as a subtraction from the perpendicular force and emphasize the points or surfaces on which the forces apply.
3. Hydrology and precipitation.
Here is my main comment. The proposed modeling framework addresses shallow landslides initiated by rainfall, which is the triggering factor. The approach used (based on TOPMODEL) is extremely simplified because it assumes steady-state conditions and therefore does not account for transient hydrological processes (Chae et al., 2017). The authors acknowledge this limitation in the discussion section, but it should be stated clearly early in the methodology. As correctly written by the authors, steady state is assumed to be reached within the hourly timestep. Clearly, this can hardly be verified in general. That said, I raise two further critical issues that are not mentioned by the authors:
- Under unsaturated conditions, soils (especially fine and clayey soils) exert a strong water uptake effect due to suction, which leads to an apparent ‘hydrological’ cohesion. This represents a further limitation of the Montgomery and Dietrich approach that the authors should mention (see the works cited in Chae et al., 2017, e.g. Lepore et al., 2013).
Reply: Thank you for pointing this out. The TOPMODEL approach does indeed have important limitations, and it is not verified in this research. The authors will address the issues related to the TOPMODEL approach more explicitly in the methodology. The water uptake due to suction will also be addressed in the methodology.
We note that the majority of the shallow landslides in the inventory used occurred in coarse soil material under intense rainfall leading to high degrees of saturation. According to Montrasio and Valentino (2008), these conditions result in limited added cohesion. We believe that, although apparent hydrological cohesion is a real effect, its influence is marginal in our case studies. This will be clarified in the methodology section of the revised version.
- In the description of the model application (section 3.4.2) it is not clear how the rainfall events initiating landslides are selected. If I understood correctly, only events of 1-hour duration are selected, whose intensity is taken from the Depth-Duration-Frequency (DDF) curve at different return periods (i.e. from 10 to 100 years). Therefore, I guess 10 events of 1 hour are simulated. Is that correct? If so, the reason for analyzing events of only 1 hour, which cannot be ‘critical’ for landslide initiation, should be explained and justified. The authors should clarify this part of the manuscript thoroughly, explain the methodology used to define the events, and report the parameters of the DDF curves.
Reply: The DDF curves are used to give an upper (100-year return period) and a lower (10-year return period) bound for the range used in our sensitivity analysis. Subsequently, 1000 Latin hypercube (LHS) samples are drawn between these bounds, along with all other calibrated model parameters. We do not simulate 10 events of 1 hour. The type of rainfall that triggers shallow landslides in Switzerland corresponds to this range of return periods and magnitudes (e.g. Rickli & Graf, 2009). The focus on rainfall intensity is in line with the assumption that a steady state of pore water pressure in the soil is generated by preferential flow through macropores.
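The reply above describes drawing 1000 Latin hypercube samples between the 10-year and 100-year DDF bounds, jointly with the other calibrated parameters. A minimal standard-library sketch of that sampling step follows. It assumes uniform ranges per parameter; the function name, parameter names and the 30–60 mm/h bounds are hypothetical placeholders, since the DDF parameters are not reported here.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample over independent uniform ranges.

    Each range is cut into n_samples equal strata, one value is drawn uniformly
    inside every stratum, and the strata are shuffled independently per
    parameter so that parameters are paired at random.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        width = (high - low) / n_samples
        values = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(values)
        columns[name] = values
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# Hypothetical bounds: intensity [mm/h] between assumed 10-year and 100-year
# 1-hour DDF values, plus one other calibrated parameter for illustration.
samples = latin_hypercube(
    1000,
    {"intensity_mm_h": (30.0, 60.0), "friction_angle_deg": (25.0, 40.0)},
    seed=42,
)
print(samples[0])
```

Compared with plain random sampling, every stratum of each range is visited exactly once, so 1000 samples cover the intensity range evenly while still being paired randomly with the other parameters.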
As stated in lines 73–78, long-duration, low-intensity rainfall is generally not critical for landslide initiation, whereas short-duration (long enough to reach a steady state) high-intensity rainfall is critical (e.g. Guzzetti et al., 2004). A duration of 1 hour seemed to us indicative of the situation, although a different duration could be critical. Unfortunately, we do not have data giving the exact intensity and duration of the triggering rainfall. We chose a steady-state approach and decided to vary the return period instead of the duration, but both conditions could overlap. The method used to identify the events will be clarified in the revised version.
4. Data inventory
The proposed methodology used to characterize the hypothetical landslides (extent) depends strictly on the data inventory (section 2.3), as the authors also state. However, it is important that the observed landslides used to characterize the model are of the same type, consistent with the hypotheses of the stability model used, and all triggered by rainfall. Is this the case? Please specify.
Reply: That's a good point. All landslides of the inventory used were triggered by rainfall. As shown in Figure 4, most of the slides are shallower than 1.5 m. These are the ones we want to model, and we assume them to be representative of Switzerland.
5. Calibration/sensitivity analysis
With regard to the best set of parameters, my question is: are the parameters found consistent and realistic?
Reply: Consistency in the parameters found is debatable. A certain equifinality appears to be at play, which is quite common in multi-parameter modelling (e.g. Beven and Binley, 1992). Table 1 below gives the 5 best parameter sets (highest AUC) as used in the paper. Parameters with a high influence, such as mean soil depth (mde) and transmissivity (Tra), have consistent values relative to the ranges used in the sensitivity analysis.
Parameters with a lower influence show higher variability.
Table 1: The 5 best parameter combinations in the Eriz study area
lsd    rho   mde   sdd   mco   sco   mep   sdp   Tra      PP    PRLa  Wvg   auc
0.096  1.45  1.28  0.15  4.91  0.21  39.9  3.0   0.00044  45.4  3.2   0.05  0.916
0.094  1.10  1.71  0.11  6.03  0.04  35.9  2.4   0.00042  46.9  5.1   0.04  0.914
0.068  1.49  1.40  0.17  5.19  0.36  33.7  2.7   0.00036  36.9  14.5  0.01  0.911
0.089  1.11  1.46  0.23  6.32  0.16  28.2  4.1   0.00039  48.5  1.6   0.03  0.910
0.079  1.03  1.56  0.34  6.39  0.36  33.6  2.6   0.00033  39.1  13.9  0.08  0.910
The realism of the parameters is determined by the ranges we chose in Table 5. We believe that, with the data available to us, we made the most realistic assumptions on the parameter ranges. We will add a comment on consistency and realism to the revised version.
For example, I question the choice of including precipitation intensity as a calibration parameter. As discussed in the previous comment, rainfall is the triggering forcing and a dynamic variable. Ideally, we should know the precipitation intensity associated with each observed landslide. Otherwise, if it is used as a parameter, the model seems to be tuned ad hoc just to reproduce past events. If so, what would be its utility?
Reply: Rickli and Graf (2009) give an estimate of the rainfall events in the landslide inventory. The estimated duration of the events, however, is variable, which makes it hard to relate to an hourly intensity. In addition, because of the simplifications in the TOPMODEL approach (as also noted by the reviewer) and the spatial variability of rainfall in mountainous terrain, the computed pore pressure cannot be related exactly to a real precipitation event. In practice, therefore, our rainfall intensity calibration focuses on reproducing the required pore pressure. The utility of this analysis lies in the discussion of the overall performance of the model. The model is indeed calibrated to past events in order to perform the sensitivity analysis on the most realistic combination of parameters.
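The statement that mde and Tra are consistent across the five best sets while other parameters vary more can be checked roughly with the coefficient of variation of the Table 1 values. The sketch below copies four columns from that table (column labels as given there). It is only an approximate check: the reply judges consistency relative to the sampling ranges of Table 5, which are not reproduced here, whereas the coefficient of variation measures spread relative to each parameter's own mean.

```python
from statistics import mean, pstdev

# Four columns of Table 1 (five best Eriz parameter sets), as listed in the reply.
best_sets = {
    "mde": [1.28, 1.71, 1.40, 1.46, 1.56],
    "Tra": [0.00044, 0.00042, 0.00036, 0.00039, 0.00033],
    "PRLa": [3.2, 5.1, 14.5, 1.6, 13.9],
    "Wvg": [0.05, 0.04, 0.01, 0.03, 0.08],
}

# Coefficient of variation: population standard deviation divided by the mean,
# i.e. spread relative to the parameter's own mean, NOT to its sampling range.
cv = {name: pstdev(values) / mean(values) for name, values in best_sets.items()}
for name, value in sorted(cv.items(), key=lambda item: item[1]):
    print(f"{name:<5} CV = {value:.2f}")
```

On these values mde and Tra vary by roughly 10 % across the five sets, while PRLa and Wvg vary by more than 50 %, which is consistent with the equifinality argument made in the reply.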
The best-performing parameter set from the calibration is used in this paper only to show the parameter values (Table 6), which are used in the discussion of how realistic the parameter values are. For future case studies, we plan to improve the hydrological approach in SlideforMap so that it relates more accurately to rainfall events (conditional on available sub-daily precipitation data). This further development is also based on the analysis in section 5.6.
Additionally, it would be interesting to see the ROC curves (from which the AUC is computed) for the calibrated and the best model combinations. The shape of the curve also says something about model performance. Furthermore, in my opinion, the sensitivity analysis should come before the model calibration. Normally, calibration is performed on the parameters that are most sensitive. I understand Figures 7 and 8, but I am not sure this is the most efficient way to verify the sensitivity of the parameters. I am curious to see, for example, how the landslide probability varies with changes in parameter values. This test could be shown for the least and the most sensitive parameters.
Reply: We agree that model development is an iterative process in which sensitivity analysis can be used to identify the most sensitive parameters. In this work, the parameters to be calibrated were selected based on the literature; we identified them as recurring frequently in shallow landslide modelling. In our opinion, the sensitivity of all these parameters is interesting and can help future development of SlideforMap or of other models employing a similar method. As suggested by the reviewer, we will add the corresponding ROC curves to the paper.
6. Results
The result of high m_f and low m_c is quite obvious. As the authors clearly say in the discussion, and as found in past works, only a few parameters really affect the process in the end: the geometry of the slope (i.e. the soil thickness), the mechanical properties (i.e. the friction angle) and the characteristics of the trigger (i.e. precipitation), whose effects are controlled by the soil transmissivity. With regard to vegetation, different vegetation scenarios are analyzed (and this is fine). Which is the real configuration? What is the ultimate target of the simulations?
Reply: Yes, you are right: in the end only a few parameters affect the process. The spatial pattern of shallow landslide probability is determined by the elevation model (which sets the slope and specific catchment area) and the single-tree data file. The 'magnitude' of the pattern comes from soil thickness, friction angle, cohesion, precipitation and transmissivity, as mentioned by the reviewer. Vegetation mitigates both the magnitude and the spatial pattern of the probability. It is also something land management actually influences, and it is therefore of primary importance for shallow landslide mitigation. We hope to quantify this effect of vegetation in our results and discussion and show its importance. To summarize, the ultimate goal is to assess the effect of forest (management) scenarios on slope stability. The real configuration is the one based on the single-tree detection method. This will be specified in the revised version.
7. General
I suggest stating clearly the ultimate main target of the model. Can it be used as a forecasting tool in an early warning system? If so, in which way? My impression is that it is too constrained by the calibration parameters, which in some cases may lose their physical meaning.
Reply: Thank you for the point. Simplifications and calibration constraints make it hard to use the model as an exact forecasting tool. The main application we intended the model for is as a tool for land managers to quantify the effects of different vegetation scenarios.
The authors will state this more clearly in the discussion and also early in the introduction.
TECHNICAL CORRECTIONS
Reply: Thanks for pointing out the corrections; we will implement them all.
LITERATURE
Beven, K., & Binley, A. (1992). The future of distributed models: Model calibration and uncertainty prediction. Hydrological Processes, 6(3), 279–298. https://doi.org/10.1002/hyp.3360060305
Beven, K., & Germann, P. (2013). Macropores and water flow in soils revisited. Water Resources Research, 49(6), 3071–3092. https://doi.org/10.1002/wrcr.20156
Guzzetti, F., Cardinali, M., Reichenbach, P., Cipolla, F., Sebastiani, C., Galli, M., & Salvati, P. (2004). Landslides triggered by the 23 November 2000 rainfall event in the Imperia Province, Western Liguria, Italy. Engineering Geology, 73(3–4), 229–245. https://doi.org/10.1016/j.enggeo.2004.01.006
Montrasio, L., & Valentino, R. (2008). A model for triggering mechanisms of shallow landslides. Natural Hazards and Earth System Science, 8(5), 1149–1159. https://doi.org/10.5194/nhess-8-1149-2008
Rickli, C., & Graf, F. (2009). Effects of forests on shallow landslides – case studies in Switzerland. Forest Snow and Landscape Research, 82(1), 33–44. Retrieved from http://www.issw.ch/wsl/publikationen/pdf/9696.pdf

RC2: 'Comment on nhess-2021-140', Anonymous Referee #2, 30 Jun 2021
GENERAL COMMENTS
The authors describe a probabilistic model called SlideforMap (SfM) which generates a map of shallow landslide probability across an area of interest. The approach is to randomly simulate “hypothetical landslides” across the landscape. A factor of safety is calculated for each hypothetical landslide based on limit-equilibrium analysis. Areas with factors of safety less than 1 are assumed to be unstable. The final output of the model is the fraction of unstable landslides at a given location.
The model requires a total of 22 parameters, 16 of which are deterministic and 3 of which are drawn from probability distributions (each of which is defined by 2 parameters). The resisting force is based partially on pore water pressures computed using a variation of TOPMODEL, which is justified by assuming the dominant role of macropore flow in pore pressure development.
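The sampling idea summarized above (many hypothetical landslides, soil depth, cohesion and friction angle drawn from normal distributions, output as the fraction with FS < 1) can be sketched as follows. This is a simplified illustration, not the SlideforMap implementation: it uses the infinite-slope factor of safety, omits random landslide location and size as well as lateral root reinforcement, and all function names, distribution parameters and unit weights are hypothetical.

```python
import math
import random


def factor_of_safety(cohesion_kpa, phi_deg, thickness_m, slope_deg, wetness,
                     gamma_s=19.0, gamma_w=9.81):
    """Infinite-slope FS per unit area of slip plane (slope-normal thickness,
    slope-parallel seepage, slope_deg > 0). No lateral root reinforcement."""
    beta = math.radians(slope_deg)
    normal_stress = gamma_s * thickness_m * math.cos(beta)
    pore_pressure = gamma_w * wetness * thickness_m * math.cos(beta)
    shear_stress = gamma_s * thickness_m * math.sin(beta)
    resisting = cohesion_kpa + (normal_stress - pore_pressure) * math.tan(math.radians(phi_deg))
    return resisting / shear_stress


def unstable_fraction(slope_deg, wetness, n=10_000, seed=0,
                      depth=(1.0, 0.3), cohesion=(5.0, 2.0), phi=(35.0, 3.0)):
    """Share of n hypothetical landslides with FS < 1 at one location.

    depth [m], cohesion [kPa] and phi [deg] are (mean, sd) of normal distributions.
    """
    rng = random.Random(seed)
    unstable = 0
    for _ in range(n):
        thickness = max(rng.gauss(*depth), 0.05)          # clip non-physical depths
        c = max(rng.gauss(*cohesion), 0.0)                # cohesion cannot be negative
        phi_deg = min(max(rng.gauss(*phi), 0.0), 89.0)    # keep the angle physical
        if factor_of_safety(c, phi_deg, thickness, slope_deg, wetness) < 1.0:
            unstable += 1
    return unstable / n


for slope in (20, 35, 45):
    print(slope, unstable_fraction(slope, wetness=1.0))
```

Only these three inputs are randomized here, mirroring the reviewer's description; the remaining inputs stay fixed, which is the design choice questioned in the specific comments below.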
The novel feature promoted by the authors is the inclusion of basal and lateral root reinforcement from vegetation into the resisting force. In their case study, the authors demonstrate how they obtain an inventory of trees from an airborne laser scanning dataset. The authors also argue that the root reinforcement should explicitly depend on the size and spatial distribution of individual trees.
The authors demonstrate their approach using three study areas in Switzerland, each of which has a landslide inventory that the authors use to calibrate the model against. The authors conduct a sensitivity analysis where they analyze how the marginal distributions of different parameters are related to model performance and to the fraction of unstable landslides.
The authors include a thorough discussion that explores the limitations of the reference datasets in assessing model performance, the assumptions of several model parameterization choices, and the AUC as a performance metric. Much of this discussion reflects on the differences in the predicted unstable ratio between one of the study areas, St Antonien, and the other two.
The authors have organized their manuscript well, and they have described a complex workflow in a straightforward way. They also build a convincing case for the utility and need for a model of this type, and the described case studies illustrate the applications well.
In my opinion, this manuscript should be published in NHESS after some clarifications and revisions. Most of my criticisms are focused on areas where the authors need to provide additional clarifications, either to adequately explain their approach or to explain how this model could be used by others.
SPECIFIC COMMENTS
- The authors say that their model demonstrates the importance of root reinforcement on shallow landslides, but the authors need to define what “shallow” means so that it is clear where their conclusions apply.
- The authors are persuasive about the importance of root reinforcement in modeling landslide hazards, but they do not provide much discussion of how this model compares to other previously published models, including both related models (such as SOSlope or SlideForNET) or other models that compute landslide susceptibility on a regional scale. Some additional discussion of where this model fits within the context of other landslide susceptibility models generally would be helpful for prospective users.
- In describing the methodology, the authors are not always clear about which values are assumed for their own case study, and which values are fixed in the model. For example, at a number of places in the methodology section, the authors assign values and limits on parameters (e.g., maximum HL surface area, mean tree density, precipitation intensity threshold, etc.) based on data from Switzerland (where the case study is located), but it is not clear whether a given user would have the freedom to change these values.
- The structure of the model requires that soil depth, soil cohesion, and the angle of internal friction be modeled as random variables with normal distributions, but the other 16 parameters are assumed to be deterministic. The authors need to explain why these three parameters specifically were chosen to be random variables. For instance, variables can be randomized when the uncertainty in their values is either shown or assumed to have the most significant effects on the results. This is suggested somewhat by the sensitivity analysis for the case of soil cohesion and soil depth, but this choice is not explained explicitly.
- The authors make use of two datasets, a tree inventory and a landslide inventory, in their analysis. However, they do not spend much time explaining how a prospective user would apply this model if they were lacking these datasets. It seems that users could still apply this model without these datasets, either by creating synthetic datasets or by assuming specific values for the parameters that would be derived from these datasets. Providing some more guidance on applying the model without these datasets would make the model more accessible to users.
- The sensitivity analysis is interesting but not entirely convincing. If strong parameter correlation is at play, as the authors suggest, then how would we know which parameters are truly important?
- In a couple of places within the text (L49-52; L169-170) the authors conflate deterministic models with spatial homogeneity. This is misleading, as it is possible to have deterministic models that account for spatial heterogeneity, and probabilistic models that are spatially homogeneous. I would suggest that the explanation the authors are after is that the spatially heterogeneous values themselves are uncertain, and this is the motivation for using a probabilistic approach.
- Is it valid to compare the globally uniform vegetation scenario to the other three scenarios if the globally uniform scenario was used to calibrate the parameters?
- It appears that the authors used the same landslide inventory both to calibrate the model and to validate the performance of the model against different vegetation scenarios. Did the authors consider using any portion of the landslide inventory as an independent validation dataset?
- L44-45. The authors need to give some additional definition of a deterministic approach and why SHALSTAB is an example of this approach.
- L128-130. It seems that the unstable ratio is a very limited metric, particularly if the landslide density is already very low. Shouldn’t the landslide density be relevant in addition to the unstable ratio? If there is an explicit requirement that the number of HLs be large enough to compute the unstable ratio with a large denominator, does this effectively put a lower bound on the landslide density for this model?
- L152-153. I am surprised that the landslides are generated using a spatially uniform distribution, as this may result in landslides being simulated in areas that are not landslide prone. What is the rationale behind this? Shouldn’t they follow a spatially distributed density, or at least be restricted to susceptible areas?
- L278. A 2 km buffer seems extremely large, especially if topographic wetness is computed over multiple small catchments. How was this value chosen, and is it adequate for other studies?
- L407-408. What does this mean if the unstable ratio decreases when single-tree detection is used? Does this indicate that heterogeneity is important for slope stability, or does it simply mean that the uniform vegetation scenarios are not realistic?
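Two of the points above (running the model without a landslide inventory, and placing hypothetical landslides with a spatially uniform distribution) can be illustrated with a short Python sketch. It is not SlideforMap's code: the helper names are hypothetical, the placement is over a simple bounding rectangle, and the inverse-gamma parameters are the values commonly cited from Malamud et al. (2004), which should be checked against that paper before use.

```python
import numpy as np

# Three-parameter inverse-gamma parameters as commonly cited from
# Malamud et al. (2004); units are km^2. Verify before relying on them.
RHO = 1.40
A_KM2 = 1.28e-3
S_KM2 = -1.32e-4


def sample_landslide_areas(n, rng):
    """Hypothetical helper: draw n landslide areas (m^2) from the inverse-gamma law.

    If G ~ Gamma(RHO, 1), then A_KM2 / G is inverse-gamma with scale A_KM2,
    and shifting by S_KM2 gives the three-parameter form.
    """
    areas = []
    while len(areas) < n:
        g = rng.gamma(shape=RHO, scale=1.0, size=n)
        a_km2 = S_KM2 + A_KM2 / g
        a_km2 = a_km2[a_km2 > 0]          # reject rare non-physical values
        areas.extend(a_km2 * 1e6)         # km^2 -> m^2
    return np.array(areas[:n])


def place_uniformly(n, xmin, xmax, ymin, ymax, rng):
    """Hypothetical helper: landslide centres drawn uniformly over a rectangle."""
    x = rng.uniform(xmin, xmax, size=n)
    y = rng.uniform(ymin, ymax, size=n)
    return np.column_stack([x, y])


rng = np.random.default_rng(42)
areas = sample_landslide_areas(1000, rng)
centres = place_uniformly(1000, 0.0, 2000.0, 0.0, 1500.0, rng)
```

With these values the mode of the area distribution is a/(ρ+1) + s, roughly 400 m². Restricting placement to susceptible cells, as the comment suggests, would amount to sampling centres from a mask instead of a rectangle.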

- Table 7. Why are the AUC and unstable ratio values different for the globally uniform vegetation scenario compared to the results with the optimal parameters (Table 6)? Is this due to the difference in the landslide density?

- L471-473. Does this high unstable ratio match long-term observations of landslide occurrence in StA? In other words, is the unstable ratio realistic?

- L480. This suggests that AUC is a poor choice of performance metric for comparing the three study areas. Are there other metrics which would be better?
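Several comments concern how the AUC is obtained from a probability map and a landslide inventory (see also the question on L300-307 about rasterizing the inventory). A minimal sketch, assuming the inventory has already been rasterized to a boolean mask on the same grid as the probability raster, computes the ROC AUC through the equivalent Mann-Whitney rank statistic, with tied probabilities receiving average ranks. The function name is hypothetical and this is not necessarily how SfM evaluates the AUC.

```python
import numpy as np


def roc_auc(prob, mask):
    """ROC AUC of a probability raster against a rasterized inventory mask.

    AUC = U / (n_pos * n_neg), where U is the Mann-Whitney statistic of the
    landslide cells; ties in probability get the average of their ranks.
    """
    p = np.asarray(prob, dtype=float).ravel()
    y = np.asarray(mask, dtype=bool).ravel()
    n_pos = int(y.sum())
    n_neg = y.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("mask must contain both landslide and non-landslide cells")

    order = np.argsort(p, kind="mergesort")
    sorted_p = p[order]
    # Groups of equal values are consecutive in the sorted array.
    _, first_idx, counts = np.unique(sorted_p, return_index=True, return_counts=True)
    avg_rank = first_idx + (counts + 1) / 2.0   # 1-based average rank per tie group
    ranks = np.empty(p.size)
    ranks[order] = np.repeat(avg_rank, counts)

    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75: three of the four landslide/non-landslide pairs are ranked correctly. The rank form avoids building the full ROC curve, which is convenient for large rasters.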
TECHNICAL CORRECTIONS
- L14. This should be “ratio” instead of “fraction.”
- L121. Does SfM generate a raster image of probability values?
- L134. Do the authors mean “greater than 1.0”?
- L163. What does “distance of 10” refer to?
- L271-273. At what resolution is the unstable ratio computed? This is not made explicit in the paper.
- L300-307. What is the spatial format of the landslide inventory? If they are polygons, how are they compared to the unstable ratio map so that the AUC can be computed? Does the landslide inventory need to be converted or rasterized at a specific resolution?
- L309. The format for the numbers a, b, and c looks unusual. Please verify that the values and formats are correct.
- L312-313. Why are these 11 parameters fixed while the others are varied?
- L332. What is n?
- L336-337. Please explain why weighting is being used and how this weighting is calculated.
- L343. Please explain why the parameter range uses intensity values from different return periods.
- Table 5. The value for vegetation weight, W_veg, uses a different name and different units than rho_tree in Table 1 (tonne per square meter vs. kg per cubic meter). Is there a reason for this difference?
- L348. Is a sample size of 1000 adequate to represent the sample space of the 12 parameters used in the sensitivity analysis?
- L360-361. Does this model assume that root reinforcement comes only from trees, and not from shrubs, grasses, or other vegetation types? Is the single-tree detection scenario using the same trees as the tree inventory cited in 3.2?
- L363. Please verify that the exponent is correct in the expression for landslide density.
- Fig. 8. How is “x% best” defined for the unstable ratio?
- L403. Do the model runs assume randomization of the three parameters (as in the original model setup)?
- L508. Are the 12 parameters all included in the 22 original parameters?
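To make the unstable ratio concrete, as we read it from the comments (the fraction of hypothetical landslides whose factor of safety is below 1), here is a simplified Python sketch. It deliberately uses the classical infinite-slope factor of safety with slope-parallel seepage and no root reinforcement, i.e. something closer to the ‘no vegetation’ case than to SfM's finite-slope equations; the redraw-until-positive handling of cohesion, friction angle and soil depth and all parameter values are assumptions for illustration only.

```python
import numpy as np

GAMMA_W = 9.81  # unit weight of water (kN/m^3)


def truncated_normal(mean, sd, size, rng, low=0.0):
    """Normal draws, with values <= low redrawn until none remain (assumed handling)."""
    x = rng.normal(mean, sd, size)
    bad = x <= low
    while bad.any():
        x[bad] = rng.normal(mean, sd, int(bad.sum()))
        bad = x <= low
    return x


def infinite_slope_fs(c_kpa, phi_deg, z_m, slope_deg, m, gamma_s=18.0):
    """Infinite-slope factor of safety with slope-parallel seepage.

    m is the saturated fraction of the soil column (h/z) and gamma_s the
    soil unit weight (kN/m^3); no root reinforcement is included.
    """
    theta = np.radians(slope_deg)
    phi = np.radians(phi_deg)
    resisting = c_kpa + (gamma_s - m * GAMMA_W) * z_m * np.cos(theta) ** 2 * np.tan(phi)
    driving = gamma_s * z_m * np.sin(theta) * np.cos(theta)
    return resisting / driving


def unstable_ratio(fs):
    """Fraction of hypothetical landslides with a factor of safety below 1."""
    return float(np.mean(np.asarray(fs) < 1.0))


# Hypothetical parameter values, chosen only for illustration
rng = np.random.default_rng(1)
n = 10_000
c = truncated_normal(5.0, 2.0, n, rng)      # cohesion (kPa)
phi = truncated_normal(35.0, 3.0, n, rng)   # friction angle (deg)
z = truncated_normal(1.0, 0.3, n, rng)      # soil depth (m)
slope = rng.uniform(20.0, 45.0, n)          # slope angle (deg)
fs = infinite_slope_fs(c, phi, z, slope, m=0.8)
ratio = unstable_ratio(fs)
```

Here the ratio is computed over all hypothetical landslides at once; a per-cell map, as asked in the comment on L271-273, would count only the landslides overlapping each cell, which is why its denominator depends on the landslide density.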

AC2: 'Reply on RC2', Feiko van Zadelhoff, 23 Jul 2021
We would like to thank the reviewer for the overall positive assessment of our paper and the detailed suggestions to further improve the manuscript. Below, each original comment is followed by our answer, marked "Reply:". We did not copy the entire general comment part from the review, which provided a nice summary of our work (thanks for this!) but did not require any specific answer.

GENERAL COMMENTS

The authors describe a probabilistic model called SlideforMap (SfM) which generates a map of shallow landslide probability across an area of interest. (..) The authors have organized their manuscript well, and they have described a complex workflow in a straightforward way. They also build a convincing case for the utility and need for a model of this type, and the described case studies illustrate the applications well. In my opinion, this manuscript should be published in NHESS after some clarifications and revisions. Most of my criticisms are focused on areas where the authors need to provide additional clarifications, either to adequately explain their approach or to explain how this model could be used by others.

Reply: Thanks for the nice summary and the positive assessment. The revised paper will contain more details on how the model could be used by others (see also further comments below and our response to RC1, question on "6. Results").

SPECIFIC COMMENTS

1) The authors say that their model demonstrates the importance of root reinforcement on shallow landslides, but the authors need to define what “shallow” means so that it is clear where their conclusions apply.

Reply: An informal definition of shallow landslides is landslides within the soil mantle, not containing bedrock. This differs from the official Swiss definition, which states that these are landslides with a soil depth < 2 m. In this paper, we formally use the Swiss definition; however, for the practical application of SlideforMap we consider this definition to be irrelevant. We will state this more clearly in the introduction and conclusion.

2) The authors are persuasive about the importance of root reinforcement in modeling landslide hazards, but they do not provide much discussion of how this model compares to other previously published models, including both related models (such as SOSlope or SlideForNET) or other models that compute landslide susceptibility on a regional scale. Some additional discussion of where this model fits within the context of other landslide susceptibility models generally would be helpful for prospective users.

Reply: Thank you for this point. We agree this issue can be better emphasized in the discussion. We will add a paragraph explicitly comparing SlideforMap with other landslide susceptibility models in terms of their application.

3) In describing the methodology, the authors are not always clear about which values are assumed for their own case study, and which values are fixed in the model. For example, at a number of places in the methodology section, the authors assign values and limits on parameters (e.g., maximum HL surface area, mean tree density, precipitation intensity threshold, etc.) based on data from Switzerland (where the case study is located), but it is not clear whether a given user would have the freedom to change these values.

Reply: Thank you for pointing this out with the concrete examples. The thresholds are indeed Switzerland-specific; other values are based on assumptions by the authors. Future users can change these values and are encouraged to do so if they apply SlideforMap in other areas. In the revised version, we will make clear which parameters are specifically selected for Switzerland and which ones are related to more general assumptions.

4) The structure of the model requires that soil depth, soil cohesion, and the angle of internal friction be modeled as random variables with normal distributions, but the other 16 parameters are assumed to be deterministic. The authors need to explain why these three parameters specifically were chosen to be random variables. For instance, variables can be randomized when the uncertainty in their values is either shown or assumed to have the most significant effects on the results. This is suggested somewhat by the sensitivity analysis for the case of soil cohesion and soil depth, but this choice is not explained explicitly.

Reply: As the reviewer suggests, this was partly motivated by sensitivity. Soil depth and soil cohesion are generally assumed to be highly influential in shallow landslide susceptibility mapping (e.g. Cislaghi et al., 2018). In the revised version, we will make clearer the difference between i) applying a random parameter field for selected parameters (i.e. different randomly generated parameter values for each grid cell) and ii) generating a set of random parameters for parameter calibration. These are two very different, unrelated stages of parameter identification: the first is a way of assigning spatially heterogeneous parameter values, and the second a way of identifying good or best parameter estimates. The motivation for applying random fields is the intended application of SlideforMap in hilly or mountainous areas. There, soil properties are generally highly heterogeneous (e.g. Tofani et al., 2017), and we would like to account for this heterogeneity in the probabilistic approach of SlideforMap. Two soil properties are intentionally left out: firstly, soil density, which is assumed to have low spatial variability and influence; secondly, soil transmissivity, which is considered part of the hydrological approach and is included in calibration. We will explain this choice more explicitly in the revised version.

5) The authors make use of two datasets, a tree inventory and a landslide inventory, in their analysis. However, they do not spend much time explaining how a prospective user would apply this model if they were lacking these datasets. It seems that users could still apply this model without these datasets, either by creating synthetic datasets or by assuming specific values for the parameters that would be derived from these datasets. Providing some more guidance on applying the model without these datasets would make the model more accessible to users.

Reply: Good point. Both options are possible: creating synthetic datasets, or assuming specific values (e.g. distribution parameters taken directly from Malamud et al., 2004). We will make this clearer in the SlideforMap section (Section 2) of the paper.

6) The sensitivity analysis is interesting but not entirely convincing. If strong parameter correlation is at play, as the authors suggest, then how would we know which parameters are truly important?

Reply: We did not want to imply that strong parameter correlation is at play. Attachment Figure 1 shows pairwise dotty plots between all parameters, and Attachment Figure 2 shows all pairwise linear correlation coefficients, both for the 20 % best parameter sets. These figures show correlations between mean depth and mean cohesion, transmissivity and mean cohesion, and friction angle and mean cohesion. This indicates that the influence of these parameters on performance may be higher than suggested by Figure 7 of our manuscript. In addition, further multivariate correlations or bivariate nonlinear correlations may be at play as well.

Attachment Figure 1: pairwise dotty plots between the 20 % best parameter sets according to AUC.
Attachment Figure 2: pairwise correlation between the 20 % best parameter sets according to AUC.

What we intended to say in the original manuscript is that a potential correlation between parameters can lead to an apparent absence of sensitivity. This was nicely demonstrated by Bárdossy (2007). We reproduce that example below (see the full Matlab code in the supplement to this response): we try to find the best parameters of a simple Nash cascade model fitted to a hydrograph that we generated ourselves with that model. The performance criterion (here the RMSE) does not show a strong sensitivity with respect to the randomly sampled parameters (Attachment Figure 3, left and centre). This result is due to the correlation between the best parameter sets (Attachment Figure 3, right). What this example shows is that if parameter sets are correlated, this may manifest itself as an absence of sensitivity. Note, however, that an absence of sensitivity does not necessarily imply that the parameters are correlated.

Attachment Figure 3: left and centre: scatter plots of parameter values against the performance measure for the synthetic Nash cascade example, showing an apparently low sensitivity; right: the best 20 % of all parameter sets plotted against each other, showing the strong correlation between the best parameter sets.

7) In a couple of places within the text (L49-52; L169-170) the authors conflate deterministic models with spatial homogeneity. This is misleading, as it is possible to have deterministic models that account for spatial heterogeneity, and probabilistic models that are spatially homogeneous. I would suggest that the explanation the authors are after is that the spatially heterogeneous values themselves are uncertain, and this is the motivation for using a probabilistic approach.

Reply: This is a good point. We will adjust the text.

8) Is it valid to compare the globally uniform vegetation scenario to the other three scenarios if the globally uniform scenario was used to calibrate the parameters?

Reply: Good point. Indeed, we would argue this gives an "unfair advantage" to the uniform vegetation scenario. Nevertheless, performance with single-tree detection is comparable (in the case of the Trub study area even better). In our opinion, this strengthens our case that the model can be calibrated on uniform vegetation and then be used with different vegetation scenarios.

9) It appears that the authors used the same landslide inventory both to calibrate the model and to validate the performance of the model against different vegetation scenarios. Did the authors consider using any portion of the landslide inventory as an independent validation dataset?

Reply: We wanted to analyse the performance of the model, not to carry out a validation. For a proper validation, we think the dataset is too limited in size. The vegetation scenarios are used to demonstrate the usefulness of the model for assessing the impact of different vegetation scenarios, not to validate the model (in the sense of having a set of best-performing parameters). This will be made clearer in the revised version.

10) L44-45. The authors need to give some additional definition of a deterministic approach and why SHALSTAB is an example of this approach.

Reply: Good point, and related to the comment on spatial homogeneity/heterogeneity. In the same section, we will define deterministic and probabilistic approaches more precisely, including what they do and do not entail.

11) L128-130. It seems that the unstable ratio is a very limited metric, particularly if the landslide density is already very low. Shouldn’t the landslide density be relevant in addition to the unstable ratio? If there is an explicit requirement that the number of HLs be large enough to compute the unstable ratio with a large denominator, does this effectively put a lower bound on the landslide density for this model?

Reply: Both the unstable ratio and the AUC are influenced by the landslide density, which also appears as a minor factor of influence in the sensitivity analysis (Figure 7). Although there is certainly a lower bound on the landslide density for reliable results, we have not identified this bound specifically; we considered this out of the scope of this research. We agree that the unstable ratio is in general a limited metric, since it is not an independent measure of performance; therefore, we chose the AUC as the main metric.

12) L152-153. I am surprised that the landslides are generated using a spatially uniform distribution, as this may result in landslides being simulated in areas that are not landslide prone. What is the rationale behind this? Shouldn’t they follow a spatially distributed density, or at least be restricted to susceptible areas?

Reply: This was chosen to avoid biasing the definition of what qualifies as susceptible. This choice, however, influences the AUC metric, as we discuss in Section 5.5. This is a constraint many natural hazard modelling studies struggle with (Corominas et al., 2014), but to stay in line with similar model publications we decided to keep the AUC as our performance metric.

13) L278. A 2 km buffer seems extremely large, especially if topographic wetness is computed over multiple small catchments. How was this value chosen, and is it adequate for other studies?

Reply: The value was deliberately chosen to be large to avoid any doubt about the accuracy of the TWI. As noted, it is extremely large, but we had the DEM available and the GIS procedure is not complicated. Model users are free to use a smaller buffer if they are confident that it will still result in a correct TWI computation. This will be specified in the revised version.

14) L407-408. What does this mean if the unstable ratio decreases when single-tree detection is used? Does this indicate that heterogeneity is important for slope stability, or does it simply mean that the uniform vegetation scenarios are not realistic?

Reply: The most plausible explanation is that root reinforcement from single-tree detection exceeds that of the calibrated uniform vegetation scenario. When applied in susceptible areas, it decreases instability to a greater extent than in the uniform vegetation scenario; it is an issue of both placement and amount. This will be added to the discussion.

15) Table 7. Why are the AUC and unstable ratio values different for the globally uniform vegetation scenario compared to the results with the optimal parameters (Table 6)? Is this due to the difference in the landslide density?

Reply: Table 7 reports the average of 10 runs with identical model parameters but different realisations of the random placement of the landslides. Table 6 reports the result of a single run with the best of the 1000 randomly generated parameter sets (for each of which a single run was computed). We will make this clearer.

16) L471-473. Does this high unstable ratio match long-term observations of landslide occurrence in StA? In other words, is the unstable ratio realistic?

Reply: This is a good question, which can only to a certain extent be analysed from our landslide inventory. From the inventoried slides and the surface area of the study areas, a landslide density can be computed. In slides/km², these are: Eriz: 4.9, Trub: 8, StA: 58.9. These results indicate a higher landslide density in the StA area, comparable to our results. Uncertainty arises, of course, because events cannot be compared one to one, but this could give a rough estimate. We can add this column to the table and briefly mention this in the discussion.

17) L480. This suggests that AUC is a poor choice of performance metric for comparing the three study areas. Are there other metrics which would be better?

Reply: That is a good point. As stated, though the AUC is frequently used, it has its shortcomings. This problem has been analysed, and better (though not optimal) alternatives have been proposed (e.g. Chung & Fabbri, 2003). However, in order to compare the results easily with the performance of other models, we decided to stick with the AUC. In a future paper, we would like to diversify the performance evaluation of SfM, but we consider this outside the scope of presenting the model in the first place.

TECHNICAL CORRECTIONS

Reply: Thanks for pointing out the corrections; we will implement them.

LITERATURE

Bárdossy, A. (2007). Calibration of hydrological model parameters for ungauged catchments. Hydrology and Earth System Sciences, 11(2), 703–710. https://doi.org/10.5194/hess-11-703-2007
Chung, C. J. F., & Fabbri, A. G. (2003). Validation of spatial prediction models for landslide hazard mapping. Natural Hazards, 30(3), 451–472. https://doi.org/10.1023/B:NHAZ.0000007172.62651.2b
Cislaghi, A., Rigon, E., Lenzi, M. A., & Bischetti, G. B. (2018). A probabilistic multidimensional approach to quantify large wood recruitment from hillslopes in mountainous-forested catchments. Geomorphology, 306, 108–127. https://doi.org/10.1016/j.geomorph.2018.01.009
Corominas, J., van Westen, C., Frattini, P., Cascini, L., Malet, J. P., Fotopoulou, S., Catani, F., Van Den Eeckhaut, M., Mavrouli, O., Agliardi, F., Pitilakis, K., Winter, M. G., Pastor, M., Ferlisi, S., Tofani, V., Hervás, J., & Smith, J. T. (2014). Recommendations for the quantitative analysis of landslide risk. Bulletin of Engineering Geology and the Environment, 73, 209–263. https://doi.org/10.1007/s10064-013-0538-8
Malamud, B., Turcotte, D., Guzzetti, F., & Reichenbach, P. (2004). Landslide inventories and their statistical properties. Earth Surface Processes and Landforms, 29(6), 687–711. https://doi.org/10.1002/esp.1064
Tofani, V., Bicocchi, G., Rossi, G., Segoni, S., D’Ambrosio, M., Casagli, N., & Catani, F. (2017). Soil characterization for shallow landslides modeling: a case study in the Northern Apennines (Central Italy). Landslides, 14(2), 755–770. https://doi.org/10.1007/s10346-017-0809-8
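The argument in the reply that correlated best parameter sets can mask sensitivity (after Bárdossy, 2007) can be reproduced in spirit with a short Python sketch. This is not the authors' Matlab supplement: the rainfall series, parameter ranges, sample size and time step are our assumptions. A Nash cascade with n reservoirs and storage constant k is used to generate a synthetic hydrograph, random parameter sets are scored by RMSE, and the correlation between n and k is computed among the best 20 %.

```python
import math

import numpy as np


def nash_iuh(t, n, k):
    """Instantaneous unit hydrograph of a cascade of n linear reservoirs with constant k."""
    return t ** (n - 1) * np.exp(-t / k) / (k ** n * math.gamma(n))


def simulate(rain, n, k, dt=1.0):
    """Discharge from convolving rainfall with the discretised Nash unit hydrograph."""
    t = np.arange(len(rain)) * dt
    uh = nash_iuh(t, n, k) * dt
    return np.convolve(rain, uh)[: len(rain)]


# Synthetic "observations" generated with known parameters (assumed values)
rain = np.zeros(120)
rain[5:10] = 10.0
rain[40:43] = 6.0
q_obs = simulate(rain, n=3.0, k=2.0)

# Random parameter sets over assumed ranges, scored by RMSE
rng = np.random.default_rng(0)
n_s = rng.uniform(1.0, 6.0, 2000)
k_s = rng.uniform(0.5, 5.0, 2000)
rmse = np.array([np.sqrt(np.mean((simulate(rain, n, k) - q_obs) ** 2))
                 for n, k in zip(n_s, k_s)])

# Correlation between n and k among the best 20 % of parameter sets
best = np.argsort(rmse)[: len(rmse) // 5]
r = np.corrcoef(n_s[best], k_s[best])[0, 1]
print(f"correlation of n and k among the best 20 %: {r:.2f}")
```

The best sets line up along a curve on which the mean lag n·k of the unit hydrograph stays close to its true value, so n and k trade off against each other. Whether the dotty plots then look flat depends on the ranges chosen, which is exactly why a flat dotty plot alone is weak evidence that a parameter is unimportant.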

CC1: 'Comment on nhess-2021-140', Dave Milledge, 01 Jul 2021
This is a really interesting paper that demonstrates the applicability and predictive capability of a new model for shallow landslides to provide a detailed inclusion of the influence of vegetation. The use of LiDAR data to deduce tree properties and thus root characteristics is a really exciting development.
The model itself is similar to a number of existing models but also makes some important changes. It would be really useful to make these similarities and differences more explicit. The striking similarities to me were: 1) the hydrological model (Eqns 11-12) is exactly that of SHALSTAB (Montgomery and Dietrich, 1994) and SINMAP (Pack et al., 1998); 2) modelling discrete landslides of defined dimensions with lateral resistance due to roots only (Eqns 1-6) follows Montgomery et al. (2000), Schmidt et al. (2001) and Roering et al. (2003); 3) the probabilistic treatment of stability using distributions for parameters follows Pack et al. (1998), who represented c, phi and the R/T ratio as uniform distributions; 4) introducing a slope dependence to failure depth follows Prancevic et al. (2020), though with a different functional form. The similarities are strongest between SfM and Montgomery et al. (1998): they use very similar stability models (both infinite slope with root cohesion only on the margins), the same hydrological model, and both impose discrete landslide dimensions, so differentiating your work from theirs will be important.
Having read the paper I have one primary outstanding question: What do you gain as a result of the additional data collection and modelling efforts involved in a detailed inclusion of the influence of vegetation? Your paper focuses on predictive skill (using ROC AUC) and predicted instability (using an unstable area ratio).
That focus enables a straightforward assessment of the improvement in predictive skill from this more complex model relative to simpler models such as SHALSTAB or SINMAP. In fact, I think you already have an answer to this in Table 7. The ‘no vegetation’ case in SfM is very close to the SINMAP model: in this case, there is no lateral resistance (i.e. an infinite slope), and the probability of failure is calculated from pdfs of friction, cohesion and depth, with pore pressure predicted using the SINMAP/SHALSTAB model. The uniform vegetation cases (Global and Forest area) are very close to the SHALSTAB implementation of Montgomery et al. (2000): in these cases landslides have predefined dimensions and lateral cohesion is spatially uniform. The difference is that landslide dimensions (area and depth) and material properties (c and phi) are sampled from distributions to generate a probability of failure, rather than using the critical P/T as a metric for propensity to failure (as in SHALSTAB). In all these cases I would expect a direct comparison to SINMAP and the SHALSTAB of Montgomery et al. (2000) to yield almost exactly the same AUCs as those from SfM. The clear structural difference between SfM and previous models comes in the case of ‘Single tree detection’.
Reading Table 7 in the context of these connections to simpler early models leads to three conclusions:
- Landslide predictions are surprisingly (and encouragingly) skilful even when models as simple as the ‘No vegetation’ SfM (equivalent to SINMAP) are used. Models like SINMAP are very attractive if they perform so well given their simple structure and parsimonious parameterisation.
- Representing landslides as discrete features (as in SfM or Montgomery et al. (2000)) rarely improves predictive skill unless detailed vegetation information is available. The best AUCs for SfM with ‘Global’ or ‘Forest area vegetation’ are equal to the ‘No vegetation’ case for 2 of the 3 study sites and only 1% better for Sta.
- Detailed vegetation information from single-tree detection does subtly improve predictive skill, but only in 2 of the 3 sites (slightly worse for Eriz) and only by 3.8 and 3.2% in AUC for Trub and Sta, respectively.
One interpretation of this would be that while SfM is much more satisfying from a process representation point of view, it offers only very marginal gains in predictive skill and has considerable cost in that it is more highly parameterised and more complex. An alternative interpretation would be that small skill improvements on an already excellent model are worth the additional complexity (and cost). Reframing the percentage changes in AUC as the percentage of the unrealised AUC that has been eroded by the new model (thus changing the denominator from AUC_{pre} to 1-AUC_{pre}), the same values are: 6% and 43% for Trub and Sta, respectively. I think this interpretation, which recognises the diminishing returns in model improvement, is reasonable, and if so it suggests the improvement is non-trivial.
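The two ways of expressing the AUC gain contrasted in this comment can be written out explicitly. The example line uses hypothetical AUC values chosen only to show the arithmetic; they are not taken from Table 7.

```latex
\Delta_{\mathrm{rel}} = \frac{\mathrm{AUC}_{\mathrm{new}} - \mathrm{AUC}_{\mathrm{pre}}}{\mathrm{AUC}_{\mathrm{pre}}},
\qquad
\Delta_{\mathrm{unrealised}} = \frac{\mathrm{AUC}_{\mathrm{new}} - \mathrm{AUC}_{\mathrm{pre}}}{1 - \mathrm{AUC}_{\mathrm{pre}}}

% Hypothetical example: AUC_pre = 0.90, AUC_new = 0.93
\Delta_{\mathrm{rel}} = \frac{0.03}{0.90} \approx 3.3\,\%,
\qquad
\Delta_{\mathrm{unrealised}} = \frac{0.03}{0.10} = 30\,\%
```

The closer AUC_pre is to 1, the more the second form amplifies a given gain, which is the diminishing-returns argument made above.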
It is interesting that the unstable ratio metric is more sensitive to model structure than AUC, and perhaps encouraging that this ratio is reduced by improved process representation. However as you point out (L355), this ratio is a measure of instability rather than accuracy.
SfM also makes predictions about the size of landslides most likely to be triggered in each location (though these are not currently reported in the paper). This is an important difference from previous models: few models have done this before, and those that have are extremely computationally expensive. Therefore, the most exciting aspect of SfM to me is its ability to predict landslide size. The authors are clear that the model requires a prior distribution of landslide sizes, but this does not prevent SfM from producing useful information on landslide size in both global/lumped and spatially distributed terms. In lumped terms, you could compare the size distribution of triggered landslides with the prior distribution. In the current case the prior is the observed size distribution for the study area, but you could equally impose a uniform prior and assess the extent to which the posterior matches the observed distribution; both approaches would be informative. In spatially distributed terms, the pattern of landslide size and its relationship to local conditions would be interesting, and you would also be able to assess model performance with respect to landslide size by comparing the areas of predicted landslides that overlap observed landslides with the areas of those observed landslides (do they correlate? What is the form of the relationship?). Perhaps this type of analysis is reserved for a later study, but it would fit nicely in the current paper.
Beyond these three major points I have several other questions that are more specific but less important. I don’t expect any of them to alter the primary messages of either the paper or the points I raise above but I hope they might be useful for the authors during revision.
I do not understand the rationale behind some of the assumptions in SfM’s boundary resistance representation:
- Neglecting lateral earth pressure. It is true that active and passive earth pressure are maximised at some strain, but neglecting them on this basis leaves two problems: a) you still need a treatment for the forces acting at the head and toe of the landslide; b) you need to apply the same criteria to root reinforcement, since this is also maximised at some strain.
- Neglecting soil cohesion on the sides. It seems inconsistent to apply root reinforcement but not soil cohesion on the lateral boundaries if you apply both on the base.
- Lateral root reinforcement acts only over the upslope half of the landslide’s perimeter (Eqn 3). I don’t see a justification for this, and Schwarz et al. (2010) point out that it underestimates lateral reinforcement.
- Lateral root reinforcement in Eqn. 9 is depth independent. This seems inconsistent with observed depth-dependent rooting (density and size) and with the depth dependence of basal reinforcement in SfM (Eqn 10).
- Calculating root reinforcement using the spatially averaged distance to trees within the Gamma function. Previous applications of the Gamma function (Eqn 9) appear to use it to predict root reinforcement at a known distance from the nearest tree (Moos et al., 2016). Given its nonlinearities, is it reasonable to use an average distance in Eqn 9 rather than evaluating Eqn 9 for the distribution of distances and then averaging?
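The last point is an instance of Jensen's inequality: for a nonlinear function f, f(mean(d)) generally differs from mean(f(d)). A minimal Python sketch makes the size of the effect visible. The gamma-shaped profile below is a hypothetical stand-in scaled to peak at rr_max; it is not Eqn 9 of the manuscript or the calibrated function of Moos et al. (2016), and the distance distribution (one nearest tree per boundary point) is also an assumption.

```python
import numpy as np


def rr_profile(d, rr_max=10.0, alpha=2.0, beta=1.5):
    """Hypothetical gamma-shaped root reinforcement (kN/m) vs distance d (m) to a tree.

    The shape d**(alpha-1) * exp(-d/beta) is rescaled so that its peak,
    at d = (alpha - 1) * beta, equals rr_max.
    """
    def shape(x):
        return x ** (alpha - 1.0) * np.exp(-x / beta)

    d = np.asarray(d, dtype=float)
    d_peak = (alpha - 1.0) * beta
    return rr_max * shape(d) / shape(d_peak)


rng = np.random.default_rng(3)
# Hypothetical distances (m) from points on a landslide boundary to their nearest tree
d = rng.uniform(0.5, 12.0, size=5000)

rr_of_mean = float(rr_profile(d.mean()))   # function of the averaged distance
mean_of_rr = float(rr_profile(d).mean())   # average of the function over distances
print(f"f(mean d) = {rr_of_mean:.2f} kN/m, mean f(d) = {mean_of_rr:.2f} kN/m")
```

With these assumed values the averaged-distance shortcut gives roughly half of the distance-averaged reinforcement, because the mean distance falls in the tail of the profile while many individual points lie near its peak.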
Variability
The form, amplitude and spatial pattern of variability in material properties are all likely important in defining landslide location and size (e.g. Bellugi et al., 2021). Representing this variability seems important. I would have liked to see more detail on your rationale for your choice of distribution form and spatial (de)correlation. I recognise that observations to inform this are sparse and these properties are not well known. The normal distribution has some specific problems that you grapple with but that others chose to avoid by using a lognormal (e.g. Griffiths et al., 2007). You deal with unphysical negative values by truncating, and claim these are rare, but this places strict constraints on the variability that you can impose (small coefficients of variation for soil depth and cohesion in Table 5). In the absence of evidence to the contrary, a distribution that is limited to positive values (e.g. lognormal) would seem a more appropriate choice.
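The trade-off raised here can be quantified. For a normal distribution with a given mean and coefficient of variation (CV), the sketch below computes the probability of drawing an unphysical negative value. It then derives lognormal parameters that reproduce the same mean and CV. The mean and CV values are illustrative and are not those of Table 5.

```python
import math

def normal_negative_fraction(cv):
    """P(X < 0) for X ~ Normal(mean, cv * mean); depends only on the CV."""
    z = -1.0 / cv  # standardised position of zero
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_params(mean, cv):
    """Parameters (mu, sigma) of ln(X) such that X has the given mean and CV."""
    sigma2 = math.log(1.0 + cv**2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)

for cv in (0.1, 0.3, 0.5, 1.0):
    print(f"CV = {cv:.1f}: P(normal sample < 0) = {normal_negative_fraction(cv):.4f}")

# A lognormal with the same mean and CV is strictly positive by construction.
mu, sigma = lognormal_params(mean=1.0, cv=0.5)  # e.g. a soil depth in metres
print(f"lognormal mu = {mu:.4f}, sigma = {sigma:.4f}")
```

At CV = 0.5, about 2.3% of normal samples are negative and must be truncated. At CV = 1 this rises to about 16%. Truncation also shifts the sampled mean upward, which is why the approach only works with small CVs.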
Soil depth variability is treated slightly differently (spatially decorrelated but slope dependent). I was unsure whether the soil depth distribution was parameterised from observed landslide scar depths (L178) or using mean and standard deviation as parameters to optimise (Figure 7). The former seems problematic: landslides likely occur in deeper soils, biasing the sample. Perhaps Eqn 7 was designed to account for this? However, I don’t understand why the coefficients on mu (1.35) and sigma1 (0.75) in Eqn 7 have these particular values. The second approach, tuning mean depth rather than setting it from observations, seems more appealing to me and would also enable a comparison between model results and observed landslide depths, which would be a nice addition.
Hydrology
Your approach is exactly the same as that of SHALSTAB and SINMAP but is considerably different from Topmodel (Beven and Kirkby, 1979). All three use a topographic index to define hydrologically similar units (HSUs). Topmodel uses these (with simple treatments for evaporation and infiltration) to simulate a time-varying, catchment-averaged response to a rainfall time series that can be mapped back onto the HSUs; the others simply solve for a single steady recharge rate (neglecting these processes). Even the topographic index (i.e. A/sin(B)) differs from that of Topmodel (which uses ln(A/tan(B))). This reflects differences in reference frame (sin vs tan) and in assumed conductivity profile (uniform vs exponential). I don't disagree with the approach, but I think it follows Montgomery and Dietrich (1994) and Pack et al. (1998), so it would be simpler to say that. If you wanted to give credit to earlier work, then the TOPOG model of O’Loughlin (1986) was behind the original derivation of SHALSTAB, and the first introduction of a topographic index was by Kirkby (1975).
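The contrast between the two indices can be made concrete. In the SHALSTAB/SINMAP steady-state formulation, the relative saturation of the soil column is W = min(1, (q/T)·a/(b·sinθ)). Here a/b is the upslope contributing area per unit contour width, q is the steady recharge and T is the transmissivity. TOPMODEL instead ranks cells by ln(a/(b·tanθ)). The sketch below evaluates both quantities for one cell. The function names are hypothetical and the input values are illustrative.

```python
import math

def shalstab_wetness(q_over_t, specific_area, slope_rad):
    """Steady-state relative saturation h/z (SHALSTAB/SINMAP form), capped at 1.

    q_over_t      -- steady recharge / transmissivity (1/m)
    specific_area -- upslope contributing area per unit contour width, a/b (m)
    slope_rad     -- local slope angle (radians)
    """
    return min(1.0, q_over_t * specific_area / math.sin(slope_rad))

def topmodel_index(specific_area, slope_rad):
    """TOPMODEL topographic index ln(a / tan(beta))."""
    return math.log(specific_area / math.tan(slope_rad))

slope = math.radians(30.0)
print(shalstab_wetness(q_over_t=0.002, specific_area=100.0, slope_rad=slope))
print(topmodel_index(specific_area=100.0, slope_rad=slope))
```

The first quantity is a physical saturation ratio driven by a steady recharge rate. The second is only a ranking index, which TOPMODEL then couples to a time-varying water balance.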
Previous papers that apply this hydrological model do not claim that it is particularly well suited to slopes with macropore flow. Montgomery et al. (2002) highlight the importance of macropores and fractures (and a steep soil water characteristic curve) for hillslope hydrologic response, but also recognise “that rapid pore pressure response that controls slope instability […] is driven by vertical flow, not lateral flow” (Montgomery et al., 2004). There is general agreement that lateral flow (modelled here) strongly influences the pore pressure field antecedent to a burst of rain that could initiate a landslide (Iverson, 2000; Montgomery et al., 2002; 2004). This has important implications for the approach, though, because it implies that Q/T is an index for the ‘propensity for landsliding’ rather than a parameter to be calibrated within a complete hydrological treatment. This explains the apparent problem of predicted pore pressures being independent of rainfall duration, whereas observations show that landslide triggering depends on both intensity and duration. Broad spatial patterns of pore pressure and instability should be well captured, but triggering rainfall properties may not be. In fact, discussion of the influence of macropores on pore pressure tends to focus on the unpredictable localised pressure peaks associated with constrictions or terminations of macropores (e.g. Pierson, 1983; Montgomery et al., 2002). Even given these limitations, I don't think this is a bad model relative to the alternatives. It captures broad phreatic surface patterns, and I'm convinced that the finer detail of these patterns is set by (unknown and perhaps unknowable) heterogeneity in material properties (e.g. macropores). If so, a more refined and expensive hydrological model may improve predictions of spatial pore pressure patterns very little.
Sensitivity Analysis
As you point out, parameter interaction makes it very difficult to infer parameter sensitivity from Figure 7. I think that may make it difficult to support some of your assertions in L388–395, because you cannot guarantee that interactions are not masking other, stronger sensitivities. For me the clearest example is the interaction between P and T (Table 6). Both are listed as uncertain parameters within the sensitivity analysis, but they only feature in the pore pressure definition, and only in that equation as the P/T ratio. As a result, their inclusion as two separate variables in this analysis is likely to lead to severe equifinality (high or low values will result in the same outcome as long as P/T is constant). Why not include the ratio of the two in your sensitivity analysis?
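The equifinality argument can be checked directly. If precipitation P and transmissivity T enter the model only through the wetness term, scaling both by the same factor leaves the predicted relative saturation unchanged. The minimal sketch below assumes the SHALSTAB-type steady-state expression, with P in the role of recharge. The function and values are illustrative and are not SfM's code.

```python
import math

def relative_saturation(p, t, specific_area, slope_rad):
    """Steady-state h/z, capped at 1; P and T appear only as the ratio P/T."""
    return min(1.0, (p / t) * specific_area / math.sin(slope_rad))

slope = math.radians(35.0)
area = 50.0  # specific contributing area a/b (m)

# Different (P, T) pairs sharing the same ratio P/T = 0.002 1/m.
pairs = [(0.02, 10.0), (0.05, 25.0), (0.2, 100.0)]
values = [relative_saturation(p, t, area, slope) for p, t in pairs]
print(values)  # identical up to floating-point rounding
```

Sampling P/T as a single uncertain parameter, as suggested, removes this redundancy from the sensitivity analysis.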
Queries on equations:
1) I think there is a dimensional problem in either the first term of Eqn 3 or the second term of Eqn 4. Eqn 10 expresses R_{bas} as a function of R_{lat}, so I think both should be either a force per unit length or a stress. If R_{lat} (in Eqn 9) is a stress, then Eqn 3 is dimensionally incorrect because the first term is a force per unit length and the second a force. The first term needs integrating over landslide depth. This could take the form cos(s) H if you assume reinforcement is depth invariant. However, this would then be inconsistent with Eqn 10, which assumes that root reinforcement declines with depth. On the other hand, if R_{lat} is a force per unit length (which might be more consistent with Moos et al. (2016), Fig. 3), then the problem may be more difficult to solve because the lateral depth-integrated stress (N/m) is being applied across a basal area (m^{2}).
2) Are h and H measured in a vertical reference frame as indicated in Figure 2? If so, then I think there is a cos(s) missing from Eqn 12. The first cos(s) converts vertical depth to slope-normal thickness; the second converts phreatic surface thickness to pressure head (under the assumption of uniform, steady, slope-parallel seepage).
3) Eqn 15 is incorrect because the original equation calculates DBH in cm from tree height in metres (Dorren, 2017), but you use DBH in metres (L292). I think Eqn 15 should be adjusted to 0.01H^{1.25}.
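The dimensional argument in query 1 can be written out. For illustration only, the block assumes that Eqn 3 sums a lateral term (R_lat times half the landslide perimeter P, following the comment on Eqn 3) and a basal term (R_bas times the basal area A). It also assumes that H is vertical and z is measured slope-normal. The paper's actual Eqn 3 may differ. The symbol k is a placeholder for a dimension correction factor, shown schematically without the depth dependence of Eqn 10.

```latex
\begin{aligned}
&\text{Case 1, } [R_{lat}] = \mathrm{Pa}: &&
[R_{lat}]\,[P/2] = \mathrm{Pa\,m} = \mathrm{N\,m^{-1}}
\;\neq\; [R_{bas}]\,[A] = \mathrm{Pa\,m^{2}} = \mathrm{N} \\
&\text{Remedy (integrate over depth):} &&
F_{lat} = \frac{P}{2}\int_{0}^{H\cos s} R_{lat}(z)\,\mathrm{d}z
\;=\; \frac{P}{2}\,R_{lat}\,H\cos s \ \ \text{if } R_{lat} \text{ is depth-invariant} \\
&\text{Case 2, } [R_{lat}] = \mathrm{N\,m^{-1}}: &&
[R_{lat}]\,[P/2] = \mathrm{N},\quad
R_{bas} = k\,R_{lat},\ [k] = \mathrm{m^{-1}} \;\Rightarrow\; [R_{bas}] = \mathrm{Pa}
\end{aligned}
```

In case 2 both terms of the force balance are forces. Any relation deriving a basal stress from R_lat then needs a factor with units of m^{-1}.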
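Query 2 concerns the pore-pressure term. Take the textbook infinite-slope model with vertically measured soil depth H, water-table height h above the failure plane and steady slope-parallel seepage. The pore pressure on the failure plane is then γ_w·h·cos²(s): one cosine converts vertical to slope-normal thickness, and the other converts that thickness to pressure head. The sketch below compares the factor of safety computed with cos² against a version with a single cos. Root reinforcement is omitted, and the names and values are illustrative, not SfM's Eqn 12.

```python
import math

GAMMA_W = 9.81  # unit weight of water (kN/m^3)

def factor_of_safety(c, phi_deg, gamma, depth, h, slope_deg, cos_power=2):
    """Infinite-slope factor of safety with vertically measured depth and h.

    c         -- effective cohesion (kPa)
    phi_deg   -- friction angle (degrees)
    gamma     -- soil unit weight (kN/m^3)
    depth     -- vertical soil depth H (m)
    h         -- vertical height of the water table above the failure plane (m)
    cos_power -- 2 for the consistent pore-pressure term; 1 mimics a missing cos(s)
    """
    s = math.radians(slope_deg)
    normal_stress = gamma * depth * math.cos(s) ** 2
    pore_pressure = GAMMA_W * h * math.cos(s) ** cos_power
    resisting = c + (normal_stress - pore_pressure) * math.tan(math.radians(phi_deg))
    driving = gamma * depth * math.sin(s) * math.cos(s)
    return resisting / driving

args = dict(c=2.0, phi_deg=35.0, gamma=19.0, depth=1.2, h=0.8, slope_deg=35.0)
print(f"FS with cos^2(s): {factor_of_safety(**args):.3f}")
print(f"FS with cos(s):   {factor_of_safety(**args, cos_power=1):.3f}")
```

With only one cosine, the pore pressure is overestimated by a factor of 1/cos(s). The factor of safety is therefore biased low, and the bias grows on steeper slopes.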
References
Bellugi, D.G. et al., 2021. Controls on the size distributions of shallow landslides. Proc Nat Ac Sci, 118(9).
Beven, K.J. & Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hyd Sci Jnl, 24(1).
Dorren, L., 2017. FINT – Find individual trees. User manual. ecorisQ paper (www.ecorisq.org).
Griffiths, D.V. et al., 2009. Influence of spatial variability on slope reliability using 2D random fields. Jnl Geotech & Geoenv Eng, 135(10).
Iverson, R.M., 2000. Landslide triggering by rain infiltration. WRR, 36(7).
Kirkby, M.J., 1975. Hydrograph modelling strategies. In Proc in Phys & Hum Geog, Peel et al. (eds). Heinemann: London; 69–90.
Montgomery, D.R. & Dietrich, W.E., 1994. A physically based model for the topographic control on shallow landsliding. WRR, 30(4).
Montgomery, D.R. et al., 2000. Forest clearing and regional landsliding. Geology, 28(4).
Montgomery, D.R. et al., 2002. Piezometric response in shallow bedrock at CB1: Implications for runoff generation and landsliding. WRR, 38(12).
Montgomery, D.R. & Dietrich, W.E., 2004. Reply to comment by Richard M. Iverson on ‘Piezometric response in shallow bedrock at CB1: implications for runoff generation and landsliding’. WRR, 40(3).
Moos, C., Bebi, P., Graf, F., Mattli, J., Rickli, C. and Schwarz, M., 2016. How does forest structure affect root reinforcement and susceptibility to shallow landslides? ESPL, 41(7).
O'Loughlin, E.M., 1986. Prediction of surface saturation zones in natural catchments by topographic analysis. WRR, 22(5).
Pack, R.T. et al., 1998. The SINMAP Approach to Terrain Stability Mapping. In 8th Congress IAEG, Vol. 2: Eng Geol & Nat Haz, Moore & Hungr (eds). A A Balkema.
Pierson, T.C., 1983. Soil pipes and slope stability. QJ Eng Geol & Hydrogeol, 16(1), pp. 1–11.
Prancevic, J.P. et al., 2020. Decreasing landslide erosion on steeper slopes in soil‐mantled landscapes. GRL, 47(10).
Roering, J.J. et al., 2003. Shallow landsliding, root reinforcement, and the spatial distribution of trees in the Oregon Coast Range. Can Geotech Jnl, 40(2).
Schmidt, K.M. et al., 2001. The variability of root cohesion as an influence on shallow landslide susceptibility in the Oregon Coast Range. Can Geotech Jnl, 38(5).

AC3: 'Reply on CC1', Feiko van Zadelhoff, 23 Jul 2021
Dear CC1 (dear Mr Milledge),
Thank you for your thorough analysis of our manuscript. Due to the holidays we have not yet been able to compile our answers, but we will post them as soon as all the authors have agreed upon them. In addition, we will definitely use your feedback in a revised version.

AC4: 'Complete reply on CC1', Feiko van Zadelhoff, 20 Sep 2021
We would like to thank CC1 for taking the time to review our manuscript, for the feedback provided and for the clear suggestions for improvement. Below, each original comment is followed by our answer.

The model itself is similar to a number of existing models but also makes some important changes. It would be really useful to make these similarities and differences more explicit. The striking similarities to me were: 1) the hydrological model (Eqns 11–12) is exactly that of SHALSTAB (Montgomery and Dietrich, 1994) and SINMAP (Pack et al., 1998); 2) modelling discrete landslides of defined dimensions with lateral resistance due to roots only (Eqns 1–6) follows Montgomery et al. (2000), Schmidt et al. (2001) and Roering et al. (2003); 3) the probabilistic treatment of stability using distributions for parameters follows Pack et al. (1998), who represented c, phi and the R/T ratio as uniform distributions; 4) introducing a slope dependence to failure depth follows Prancevic et al. (2020), though with a different functional form. The similarities are strongest between SfM and Montgomery et al. (1998): they use very similar stability models (both infinite slope with root cohesion only on the margins), the same hydrological model, and both impose discrete landslide dimensions, so differentiating your work from theirs will be important.

Thank you for pointing out the similarities in approach of SlideforMap in this systematic way.
1) You're right. We use the same hydrological approach as SHALSTAB. We will specify this better in the paper.
2) In our opinion, the concept is identical, but the implementation is different. Our approach to the quantification of root reinforcement, together with the probabilistic approach for landslide dimensions (Eq. 6), forms a unique integration. Our lateral root reinforcement is spatially heterogeneous and multiplied by half of the landslide perimeter instead of the whole perimeter.
This is closer to reality according to Schwarz et al. (2010), and in contrast to Montgomery et al. (2000), Roering et al. (2003) and Schmidt et al. (2001), who use the full perimeter of the landslide. We employ a probabilistic approach to the landslide surface areas by sampling from a distribution. This is in contrast to the fixed width (5 m) and length (10 m) of the landslides in Montgomery et al. (2000). In addition, in contrast to the papers mentioned, we employ the Root Bundle Model (RBMw) (Gehring et al., 2019) to estimate spatially distributed lateral and basal root reinforcement.
3) Yes, that is correct, although we use the normal distribution for soil parameters as opposed to the uniform distribution in Pack et al. (1998).
4) The approach of fitting landslide scar depth to slope angle is indeed similar to Prancevic et al. (2020). In Prancevic et al. (2020) it is used to define relationships and dimensions for slope angle domains; we use it as a correction factor to keep our probabilistic samples realistic.
Although SfM is similar to existing models in many aspects, the main differences lie in the quantification of root reinforcement (our spatially distributed root reinforcement is derived specifically from forest structure) and in the implementation of the probabilistic computation. Compared to Montgomery et al. (1998), the hydrological module is identical. We chose this model as a result of observations in an artificial precipitation experiment by Askarinejad et al. (2012). In the other modules of the model, however, there are notable differences.
We will add the specific similarities and distinctions to the methods section of the revised paper and discuss them in the discussion section.

Having read the paper I have one primary outstanding question: What do you gain as a result of the additional data collection and modelling efforts involved in a detailed inclusion of the influence of vegetation?

Vegetation parameters based on single tree detection open up opportunities for users to assess the impact of different vegetation scenarios. This is an advantage, since forest management and/or reforestation are an important, if not the most important, mitigation strategy for shallow landslides in many areas (e.g. Amishev et al., 2014). Moreover, we expect the detailed inclusion of vegetation to improve model performance, which we analyzed in the paper.

Your paper focuses on predictive skill (using ROC AUC) and predicted instability (using an unstable area ratio). That focus enables a straightforward assessment of improvement in predictive skill from this more complex model relative to simpler models such as SHALSTAB or SINMAP. In fact, I think you already have an answer to this in Table 7. The ‘no vegetation’ case in SfM is very close to the SINMAP model: in this case, there is no lateral resistance (i.e. an infinite slope), and probability of failure is calculated from pdfs of friction, cohesion and depth, with pore pressure predicted using the SINMAP/SHALSTAB model. The uniform vegetation cases (Global and Forest area) are very close to the SHALSTAB implementation of Montgomery et al. (2000): in these cases landslides have predefined dimensions and lateral cohesion is spatially uniform. The difference is that landslide dimensions (area and depth) and material properties (c and phi) are sampled from distributions to generate a probability of failure, rather than using the critical P/T as a metric for propensity to failure (as in SHALSTAB).
In all these cases I would expect a direct comparison to SINMAP and to the SHALSTAB implementation of Montgomery et al. (2000) to yield almost exactly the same AUCs as those from SfM. The clear structural difference between SfM and previous models comes in the case of ‘Single tree detection’.

Thank you for pointing this out; stating this explicitly will be a good addition to the paper. However, we would like to note that there are further differences in the modelling approach, as pointed out in the answers to the previous questions. In addition, we integrate the spatial parameters (slope, topographic wetness index, lateral root reinforcement) over all cells in the landslide surface area. The landslide surface area samples are drawn from a calibrated surface area distribution.

Reading Table 7 in the context of these connections to simpler early models leads to three conclusions:

1. Landslide predictions are surprisingly (and encouragingly) skilful even when models as simple as the ‘No vegetation’ SfM (equivalent to SINMAP) are used. Models like SINMAP are very attractive if they perform so well given their simple structure and parsimonious parameterisation.

2. Representing landslides as discrete features (as in SfM or Montgomery et al. (2000)) rarely improves predictive skill unless detailed vegetation information is available. The best AUCs for SfM with ‘Global’ or ‘Forest area vegetation’ are equal to the ‘No vegetation’ case for 2 of the 3 study sites and only 1% better for Sta.

3. Detailed vegetation information from single tree detection does subtly improve predictive skill, but only in 2 of the 3 sites (slightly worse for Eriz) and only by 3.8% and 3.2% in AUC for Trub and Sta respectively.
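The third conclusion expresses AUC gains relative to the baseline AUC. The comment's next paragraph proposes an alternative measure: the share of the unrealised AUC, 1 − AUC_pre, that the new model removes. A short calculation shows how differently the two measures read. The AUC values below are hypothetical and are not taken from Table 7.

```python
def relative_gain(auc_pre, auc_new):
    """Change in AUC as a fraction of the baseline AUC."""
    return (auc_new - auc_pre) / auc_pre

def unrealised_fraction_removed(auc_pre, auc_new):
    """Change in AUC as a fraction of the unrealised AUC, 1 - AUC_pre."""
    return (auc_new - auc_pre) / (1.0 - auc_pre)

# Hypothetical baseline and improved AUCs (not values from Table 7).
auc_pre, auc_new = 0.90, 0.93
print(f"relative gain:          {relative_gain(auc_pre, auc_new):.1%}")
print(f"unrealised AUC removed: {unrealised_fraction_removed(auc_pre, auc_new):.1%}")
```

Here the same 0.03 improvement is about 3% of the baseline but 30% of what remained to be gained. This is the diminishing-returns argument made in the comment.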
One interpretation of this would be that, while SfM is much more satisfying from a process representation point of view, it offers only very marginal gains in predictive skill and has considerable cost in that it is more highly parameterised and more complex. An alternative interpretation would be that small skill improvements on an already excellent model are worth the additional complexity (and cost). Reframing the percentage changes in AUC as the percentage of the unrealised AUC that has been eroded by the new model (thus changing the denominator from AUC_{pre} to 1 − AUC_{pre}), the same values are 6% and 43% for Trub and Sta respectively. I think this interpretation, which recognises the diminishing returns in model improvement, is reasonable, and if so it suggests the improvement is non-trivial. It is interesting that the unstable ratio metric is more sensitive to model structure than AUC, and perhaps encouraging that this ratio is reduced by improved process representation. However, as you point out (L355), this ratio is a measure of instability rather than accuracy.

The authors agree to a large degree with this interpretation and would like to thank CC1 for this interesting and concise discussion. We will add this discussion to the paper and explicitly acknowledge CC1's contribution to the improvement of this paper. With single tree detection, the relative gain in AUC is marginal but, as pointed out in the comment, the decrease in the unstable ratio is more significant. This is noteworthy, and we will add some possible explanations in the discussion, as we are not certain of the causes. The unstable ratio could possibly be reduced further by running future scenarios with denser, well-managed forest, or with reforestation in areas where there is none as yet. This is also the main capability of SlideforMap: quantifying how instability can be reduced under different forest scenarios.
P.S. CC1 swapped the study areas in the comment: Sta has the 6% change in AUC and Trub the 43%.

SfM also makes predictions about the size of landslides most likely to be triggered in each location (though these are not currently reported in the paper). This is an important difference from previous models: few models have done this before, and those that have are extremely computationally expensive. Therefore the most exciting aspect of SfM to me is its ability to predict landslide size. The authors are clear that the model requires a prior distribution of landslide sizes, but this does not prevent SfM from producing useful information on landslide size in both global/lumped and spatially distributed terms. In lumped terms, you could compare the size distribution of triggered landslides with the prior distribution. In the current case the prior is the observed size distribution for the study area, but you could equally impose a uniform prior and assess the extent to which the posterior matches the observed distribution; both approaches would be informative. In spatially distributed terms, the pattern of landslide size and its relationship to local conditions would be interesting, and you would also be able to assess model performance with respect to landslide size by comparing the areas of predicted landslides with those of the observed landslides they overlap (do they correlate? What is the form of the relationship?). Perhaps this type of analysis is reserved for a later study, but it would fit nicely in the current paper.

Back-calculating and validating the landslide size distribution is an interesting point. It is, however, not within the scope of this paper. Given the size of the paper, and as suggested by CC1, we would like to reserve this for a future publication and mention it explicitly in the outlook part of the paper (conclusion).
A size-specific distribution is actually used in SlideforNet (https://www.ecorisq.org/slideforneten), a lumped version of SlideforMap for practitioners.

Beyond these three major points I have several other questions that are more specific but less important. I don’t expect any of them to alter the primary messages of either the paper or the points I raise above, but I hope they might be useful for the authors during revision. I do not understand the rationale behind some of the assumptions in SfM’s boundary resistance representation:
Neglecting lateral earth pressure. It is true that active and passive earth pressure are maximised at some strain, but neglecting them on this basis leaves two problems: a) you still need a treatment for the forces acting at the head and toe of the landslide; b) you need to apply the same criteria to root reinforcement, since this is also maximised at some strain.
Thanks for pointing out that we did not do a good job of explaining the rationale behind our assumption. This will be specified in the revised version. Field experiments (Askarinejad et al., 2012) and numerical simulations (Cohen & Schwarz, 2017) have shown that soil cohesion, active/passive earth pressure and root reinforcement are activated at different phases during soil displacement. a) We assume the phase in which lateral root reinforcement in the tension crack is maximized to be the most stable, and therefore perform our force balance for this phase (as a sort of worst case). Here we neglect passive earth pressure at the toe of the landslide because it is not usually activated in that phase (Cohen & Schwarz, 2017). b) This is the reason we only consider root reinforcement under tension (and not under compression, at this stage).
Neglecting soil cohesion on the sides. It seems inconsistent to apply root reinforcement but not soil cohesion on the lateral boundaries if you apply both on the base.
We make the assumption that soil cohesion and root reinforcement are not additive along tension cracks at the lateral boundaries. With increasing displacement, roots exert an increasing root reinforcement until they break. Soil cohesion, however, only acts under very small displacement. At any given stage of displacement, therefore, only one or the other acts. Only on the shear plane are they additive, considering the effect of cohesion as a component of the residual shear strength of the soil (Cohen et al., 2011). We focus on the tension boundary of the landslide because it has been shown that roots do not contribute considerably to the maximum passive earth pressure forces. The passive earth pressure and compression reinforcement are activated at a later stage in shallow landslide initiation than the tension forces (Schwarz et al., 2005). All of the above will be specified better in the revised version of the paper.
Lateral root reinforcement acts only over the upslope half of the landslide’s perimeter (Eqn 3). I don’t see a justification for this and Schwarz et al., 2010 point out that it underestimates lateral reinforcement.
Landslide triggering happens in different phases. In our model, a first step is to decide which triggering phase to focus on, i.e. which one to parameterize. As stated in the previous answer, we focus on the phase of maximum root mobilization. This will be made clearer in the revised version. In Schwarz et al. (2010), only trees along the landslide scarp (after the landslide had occurred) are included. This results in an underestimation, which is addressed in their discussion. In SfM, we have a complete spatial field, so all trees are included. Note that the upslope half refers to the landslide perimeter, not to the trees.
Lateral root reinforcement in Eqn 9 is depth-independent. This seems inconsistent with observed depth-dependent rooting (density and size), and with the depth dependence of basal reinforcement in SfM (Eqn 10).
The reviewer is right. An integration of the lateral root reinforcement over the soil depth will be included in a recomputation and in the revised version of the paper. We do not, however, expect the results to change much, because most of the lateral root reinforcement is concentrated in the upper 0.5 m of the soil (Vergani et al., 2016; Gehring et al., 2019), and the vast majority of our probabilistic landslides are deeper than 0.5 m.
Calculating root reinforcement using spatially averaged distance to trees within the Gamma function. Previous applications of the Gamma function (Eqn 9) appear to use it to predict root reinforcement at a known distance from the nearest tree (Moos et al., 2016). Given its nonlinearities, is it reasonable to use an average distance in Eqn 9 rather than evaluating Eqn 9 for the distribution of distances then averaging?
This is a good point. This was a parsimonious choice to reduce the computational effort. In a future paper, we will compute the difference between the lateral root reinforcement obtained from the average distance and the average root reinforcement over the full distribution of distances, and decide, based on the results, which formulation to keep.

Variability

The form, amplitude and spatial pattern of variability in material properties are all likely important in defining landslide location and size (e.g. Bellugi et al., 2021). Representing this variability seems important. I would have liked to see more detail on your rationale for your choice of distribution form and spatial (de)correlation. I recognise that observations to inform this are sparse and these properties are not well known. The normal distribution has some specific problems that you grapple with but that others chose to avoid by using a lognormal (e.g. Griffiths et al., 2007). You deal with unphysical negative values by truncating, and claim these are rare, but this places strict constraints on the variability that you can impose (small coefficients of variation for soil depth and cohesion in Table 5). In the absence of evidence to the contrary, a distribution that is limited to positive values (e.g. lognormal) would seem a more appropriate choice.

Thank you for pointing this out. We agree that the normal distribution has its limitations and that a lognormal distribution is a more appropriate choice. We will apply this in a recomputation of the results.

Soil depth variability is treated slightly differently (spatially decorrelated but slope dependent). I was unsure whether the soil depth distribution was parameterised from observed landslide scar depths (L178) or using mean and standard deviation as parameters to optimise (Figure 7). The former seems problematic: landslides likely occur in deeper soils, biasing the sample. Perhaps Eqn 7 was designed to account for this?
However, I don’t understand why the coefficients on mu (1.35) and sigma1 (0.75) in Eqn 7 have these particular values. The second approach, tuning mean depth rather than setting it from observations, seems more appealing to me and would also enable a comparison between model results and observed landslide depths, which would be a nice addition.

The mean and standard deviation of the soil depth are tuned parameters. In our results in Fig. 7, we used the calibrated values from Table 6 for the mean and standard deviation of soil depth. The parameters 1.35 and 0.75 are part of a correction factor that adjusts the sampled soil depth to the slope. This was approximated by a survival function of the slope, which was calibrated manually based on the inventory. We will adjust L178. We will point out that the calibration to the slope angle and the values for mu (1.35) and sigma (0.75) are optional.

Hydrology

Your approach is exactly the same as that of SHALSTAB and SINMAP but is considerably different from Topmodel (Beven and Kirkby, 1979). All three use a topographic index to define hydrologically similar units (HSUs). Topmodel uses these (with simple treatments for evaporation and infiltration) to simulate a time-varying, catchment-averaged response to a rainfall time series that can be mapped back onto the HSUs; the others simply solve for a single steady recharge rate (neglecting these processes). Even the topographic index (i.e. A/sin(B)) differs from that of Topmodel (which uses ln(A/tan(B))). This reflects differences in reference frame (sin vs tan) and in assumed conductivity profile (uniform vs exponential). I don't disagree with the approach, but I think it follows Montgomery and Dietrich (1994) and Pack et al. (1998), so it would be simpler to say that.
If you wanted to give credit to earlier work, then the TOPOG model of O’Loughlin (1986) was behind the original derivation of SHALSTAB, and the first introduction of a topographic index was by Kirkby (1975).

Thank you for pointing this out with the detailed references. We will drop the wording that refers to TOPMODEL or TOPMODEL assumptions and give explicit credit to O'Loughlin (1986) and Kirkby (1975).

Previous papers that apply this hydrological model do not claim that it is particularly well suited to slopes with macropore flow. Montgomery et al. (2002) highlight the importance of macropores and fractures (and a steep soil water characteristic curve) for hillslope hydrologic response, but also recognise “that rapid pore pressure response that controls slope instability […] is driven by vertical flow, not lateral flow” (Montgomery et al., 2004). There is general agreement that lateral flow (modelled here) strongly influences the pore pressure field antecedent to a burst of rain that could initiate a landslide (Iverson, 2000; Montgomery et al., 2002; 2004). This has important implications for the approach, though, because it implies that Q/T is an index for the ‘propensity for landsliding’ rather than a parameter to be calibrated within a complete hydrological treatment. This explains the apparent problem of predicted pore pressures being independent of rainfall duration, whereas observations show that landslide triggering depends on both intensity and duration. Broad spatial patterns of pore pressure and instability should be well captured, but triggering rainfall properties may not be. In fact, discussion of the influence of macropores on pore pressure tends to focus on the unpredictable localised pressure peaks associated with constrictions or terminations of macropores (e.g. Pierson, 1983; Montgomery et al., 2002).
Even given these limitations, I don't think this is a bad model relative to the alternatives. It captures broad phreatic surface patterns, and I'm convinced that the finer detail of these patterns is set by (unknown and perhaps unknowable) heterogeneity in material properties (e.g. macropores). If so, a more refined and expensive hydrological model may improve predictions of spatial pore pressure patterns very little.

Thanks for this concise summary. We will update our introduction and the justification of the model to better relate it to the underlying assumptions and previous work. In our opinion, whether vertical flow drives the response depends on the situation (as shown by the experiments of Askarinejad et al. (2012) cited in the paper). Moreover, Montgomery and Dietrich (2004) argue that "lateral pore pressure diffusion does not appear to govern the site response time". They explain this inconsistency by arguing that "the lateral flow response is dominated by the advective response", giving credit to the importance of lateral preferential flow in macropores. We agree with CC1 that the process of pore water pressure build-up in soil during the triggering of shallow landslides is far from fully understood. We also agree that the proposed approach is a good compromise for considering rainfall characteristics in slope stability modelling.

Sensitivity Analysis

As you point out, parameter interaction makes it very difficult to infer parameter sensitivity from Figure 7. I think that may make it difficult to support some of your assertions in L388–395, because you cannot guarantee that interactions are not masking other, stronger sensitivities. For me the clearest example is the interaction between P and T (Table 6). Both are listed as uncertain parameters within the sensitivity analysis, but they only feature in the pore pressure definition, and only in that equation as the P/T ratio.
As a result, their inclusion as two separate variables in this analysis is likely to lead to severe equifinality (high or low values will give the same outcome as long as P/T is constant). Why not include the ratio of the two in your sensitivity analysis?

Thanks for this important point. Our analysis of the sensitivity of the parameter values could indeed be refined as suggested above. We will complete further analyses and decide how to revise the manuscript. We will add the P/T ratio to Table 6, Figure 7 and Figure 8, and then analyse the equifinality as suggested by the CC.

Queries on equations:

1) I think there is a dimensional problem in either the first term of Eqn 3 or the second term of Eqn 4. Eqn 10 expresses R_{bas} as a function of R_{lat}, so I think both should be either a force per unit length or a stress. If R_{lat} (in Eqn 9) is a stress, then Eqn 3 is dimensionally incorrect, because the first term is a force per unit length and the second a force. The first term then needs integrating over landslide depth. This could take the form cos(s) H if you assume that reinforcement is depth invariant. However, this would be inconsistent with Eqn 10, which assumes that root reinforcement declines with depth. On the other hand, if R_{lat} is a force per unit length (which might be more consistent with Moos et al. (2016), Fig. 3), then the problem may be more difficult to solve, because a lateral depth-integrated stress (N/m) is being applied across a basal area (m^{2}).

Thanks for pointing this out. Consistent with Moos et al. (2016), R_{lat} is indeed a force per unit length (N/m). R_{bas}, however, is in Pa, in accordance with equation 4 of Gehring et al. (2019), which uses a dimension correction factor. Under the assumption of root isotropy, the dimension correction factor has a value of 1 m^{-1}.

2) Are h and H measured in a vertical reference frame, as indicated in Figure 2? If so, then I think a cos(s) term is missing from Eqn 12.
The first cos(s) converts vertical depth to slope-normal thickness. The second converts phreatic surface thickness to pressure head (under the assumption of uniform, steady, slope-parallel seepage).

Thank you for pointing this out. We will include this in a revised computation and version.

3) Eqn 15 is incorrect, because the original equation calculates DBH in cm from tree height in metres (Dorren, 2017), but you use DBH in metres (L292). I think Eqn 15 should be adjusted to 0.01H^{1.25}.

You are right, thanks for pointing this out. We will adjust this in the revision. The computations themselves used the correct units.

LITERATURE

Amishev, D., Basher, L. R., Phillips, C. J., Hill, S., Marden, M., Bloomberg, M., & Moore, J. R. (2014). New forest management approaches to steep hills. Ministry for Primary Industries.
Cohen, D., & Schwarz, M. (2017). Tree-root control of shallow landslides. Earth Surface Dynamics, 5(3), 451–477.
Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., … Boehner, J. (2015). System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geoscientific Model Development, 8, 1991–2007.
Gehring, E., Conedera, M., Maringer, J., Giadrossich, F., Guastini, E., & Schwarz, M. (2019). Shallow landslide disposition in burnt European beech (Fagus sylvatica L.) forests. Scientific Reports, 9(1), 1–11. https://doi.org/10.1038/s41598-019-45073-7
Kirkby, M. J. (1975). Hydrograph modelling strategies. In Peel et al. (Eds.), Proc in Phys & Hum Geog (pp. 69–90). Heinemann: London.
Montgomery, D. R., & Dietrich, W. E. (1994). A physically based model for the topographic control on shallow landsliding. Water Resources Research, 30(4), 1153–1171. https://doi.org/10.1029/93WR02979
Moos, C., Bebi, P., Graf, F., Mattli, J., Rickli, C., & Schwarz, M. (2016). How does forest structure affect root reinforcement and susceptibility to shallow landslides? Earth Surface Processes and Landforms, 41(7), 951–960. https://doi.org/10.1002/esp.3887
O'Loughlin, E. M. (1986). Prediction of surface saturation zones in natural catchments by topographic analysis. Water Resources Research, 22(5).
Schwarz, M., Preti, F., Giadrossich, F., Lehmann, P., & Or, D. (2010). Quantifying the role of vegetation in slope stability: A case study in Tuscany (Italy). Ecological Engineering, 36(3), 285–291. https://doi.org/10.1016/j.ecoleng.2009.06.014
Vergani, C., Schwarz, M., Soldati, M., Corda, A., Giadrossich, F., Chiaradia, E. A., … Bassanelli, C. (2016). Root reinforcement dynamics in subalpine spruce forests following timber harvest: A case study in Canton Schwyz, Switzerland. Catena, 143, 275–288. https://doi.org/10.1016/j.catena.2016.03.038