Reply on RC1

I reviewed with interest this manuscript for possible publication in the NHESS journal. The work describes a comprehensive modeling tool to assess shallow landslides initiated by rainfall, in a probabilistic framework. The manuscript provides an interesting contribution in this field, although some aspects are strongly simplified, in contrast with others. The scientific quality is good and the text reads easily, although the manuscript is overall a bit long and often dispersive. The literature review can be improved with additional appropriate references to closely related works. The description of the climate forcing that initiates (or not) the landslide events requires significant improvement. In my opinion, the work can be published after some important clarifications and revisions.

insurance institutes, is interesting. However, in general, I found the introduction a bit dispersive and lacking in some aspects. The work of Dietrich and Montgomery, 1994 (SHALSTAB) represents the pioneering work within this approach, and it has been followed by many other deterministic works that made different contributions to improving the hydrological modeling in support of shallow landslide analysis, such as the cited Iverson, 2000, and, additionally, Rosso et al., 2006; Claessens et al., 2007; Arnone et al., 2011; Lepore et al., 2013; Simoni et al., 2008; Baum et al., 2002; Montrasio et al., 2011 (SLIP) (among others). With regard to the effect of vegetation, the hydrological effects should at least be discussed, as they can sometimes be even more significant than the mechanical ones (Feng et al., 2020). Interesting reviews are those by Chae et al., 2017, Gasser et al., 2019, and the just-published Masi et al., 2021.
Thank you for the positive reply on the insurance institute view and the many suggestions on hydrological papers employing the model scheme. We will include some of these to embed our approach better in current research. We agree that the hydrological effects of vegetation are significant in slope stability and that it is important to discuss this in the introduction. Feng et al., 2020 conclude that in extreme rainfall events the hydrological effects lose almost all of their importance. SlideforMap is focussed on these events, and we would therefore like to keep this to a short discussion pointing this out.

Definition of the stability problem.
I found the definition of the problem of stability estimation (section 2.2, Figure 2) a bit misleading. The volume of soil to which the forces are applied is not clearly defined. In the limit equilibrium method, under the hypothesis that the width of the landslide is sufficiently large so that the deformations are in the plane parallel to the soil thickness Hsoil (i.e. perpendicular to the elliptic landslide in Figure 2), forces are assessed by considering a 'slice' of soil with unit width (in the direction parallel to the elliptic landslide plane). Figure 2 is confusing and the planes of the forces are not well drawn. The limit equilibrium method (and the infinite slope model) is based on the hypothesis of a large and elongated element with respect to the soil thickness, so that an element of unit width can be considered. Also, Pwater is not indicated in Figure 2. According to the definition in the manuscript, Rlat and Fres apply on different planes. I suggest redrawing Figure 2 in a 3D perspective and specifying the hypotheses/assumptions.
Thank you for pointing out the confusion arising from Figure 2. We will give the figure a 3D perspective in order to clarify the dimensions, volume and force application planes of the assumed shallow landslide. Specifically, we will adequately scale the shallow landslide elongation in an enlarged side view. We will add the water pressure (Pwater) as a subtraction from the perpendicular force and emphasize the points or fields on which the forces apply.
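To illustrate what "a subtraction from the perpendicular force" means, a minimal sketch of a planar force balance is given here. It is a generic limit-equilibrium illustration and not the exact formulation of the manuscript: the function name, the argument names and the lumping of cohesion and lateral root reinforcement (Rlat) into single forces are assumptions made for this example.

```python
import math


def safety_factor(weight_n, slope_deg, friction_deg, pore_pressure_n,
                  cohesion_n=0.0, lateral_root_n=0.0):
    """Ratio of resisting to driving force for one planar slide block.

    Hypothetical helper: all forces are in newtons and act on the whole
    block. slope_deg must be greater than zero.
    """
    beta = math.radians(slope_deg)
    phi = math.radians(friction_deg)
    # Pwater is subtracted from the slope-normal component of the weight.
    effective_normal = max(weight_n * math.cos(beta) - pore_pressure_n, 0.0)
    # Resisting force: friction plus (assumed) cohesion and lateral roots.
    resisting = effective_normal * math.tan(phi) + cohesion_n + lateral_root_n
    # Driving force: slope-parallel component of the weight.
    driving = weight_n * math.sin(beta)
    return resisting / driving
```

With a dry, cohesionless block whose friction angle equals the slope angle, the ratio is 1; any positive water pressure lowers it and any lateral root force raises it.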

Hydrology and precipitation.
Here is my main comment. The proposed modeling framework addresses shallow landslides that are initiated by rainfall, which is the triggering factor. The approach used (based on TOPMODEL) is extremely simplified because it is based on steady-state conditions, which do not take into account the transient nature of the hydrological processes (Chae et al., 2017). The authors declare the limitation of the approach in the discussion section, but this should be clearly stated early in the methodology. As correctly written by the authors, stationarity is supposed to be reached within the one-hour timestep. Clearly, this cannot be largely verified. That said, I raise two more critical issues that are not mentioned by the authors: Under unsaturated conditions, soil (especially fine and clayey soils) exerts a strong water uptake effect due to suction, which leads to an apparent 'hydrological' cohesion. This represents a further limitation of the Montgomery and Dietrich approach that the authors should mention (see works mentioned in Chae et al., 2017, e.g. Lepore et al., 2013).
Thank you for pointing this out. The TOPMODEL approach does indeed have important limitations and is not verified in this research. We will address the issues relating to the TOPMODEL approach more distinctly in the methodology. The water uptake due to suction will also be addressed in the methodology. We note that the majority of the shallow landslides in the inventory used occurred in coarse soil material and under intense rainfall leading to high degrees of saturation. According to Montrasio and Valentino (2008), these conditions result in a limited added cohesion. We believe that, although the apparent hydrological cohesion is a valid concept, its effects are marginal in our case studies. This will be clarified in the methodology section of the revised version.
In the description of the model application (section 3.4.2) it is not clear how rainfall initiating events are selected. If I understood well, only events of 1 hour duration are selected, whose intensity is identified from the Depth-Duration-Frequency (DDF) curve at different return periods (i.e. from 10 to 100 years). Therefore, I guess 10 events of 1 hour are simulated. Is that correct? If so, the reason for analyzing events of only 1 hour, which cannot be 'critical' for landslide initiation, should be explained and justified. The authors should thoroughly clarify this part of the manuscript, explain the methodology used to define the events, and report the parameters of the DDF curves.
The DDF curves are used to give an upper (100 year return period) and a lower (10 year return period) boundary for the range used in our sensitivity analysis. Subsequently, 1000 Latin hypercube (LHS) samples are drawn between these boundaries, along with all other model parameters that are calibrated. It is therefore not the case that 10 events of 1 hour are simulated. The type of rainfall in Switzerland that triggers shallow landslides corresponds to this range of return period and magnitude (e.g. Rickli & Graf 2009). The focus on rainfall intensity is in line with the assumption that a steady state of pore water pressure in the soil is generated by preferential flow through macropores. As stated in lines 73-78, a long duration, low intensity rainfall is generally not critical in landslide initiation, whereas a short duration (long enough to reach a steady state), high intensity rainfall is critical (e.g. Guzzetti et al., 2004). A duration of 1 hour seemed to us indicative of the situation, although a different duration could be critical. Unfortunately, we do not have data that give us the exact intensity and duration of the triggering rainfall. We chose a steady-state approach and decided to vary the return period instead of the duration, but both conditions could overlap. The method used to identify the events will be clarified in the revised version.
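For readers unfamiliar with the steady-state assumption discussed here, the following minimal sketch shows the standard steady-state wetness ratio of the Montgomery and Dietrich type, h/z = (q / T) * (a / sin(slope)), capped at full saturation. It is a textbook form and may differ from the SlideforMap implementation; the function name, argument names and units are assumptions chosen for this example.

```python
import math


def relative_saturation(rain_m_per_h, specific_area_m,
                        transmissivity_m2_per_h, slope_deg):
    """Steady-state saturated fraction of the soil column (0 to 1).

    Hypothetical helper. specific_area_m is the upslope contributing
    area per unit contour length; slope_deg must be greater than zero.
    """
    sin_slope = math.sin(math.radians(slope_deg))
    # Steady state: lateral drainage capacity balances upslope recharge.
    wetness = (rain_m_per_h * specific_area_m) / (transmissivity_m2_per_h * sin_slope)
    # The soil column cannot be more than fully saturated.
    return min(wetness, 1.0)
```

The ratio has no time dimension, which is the limitation raised by the reviewer: the transient build-up of pore pressure and suction in the unsaturated zone are not represented.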
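The sampling step described in this reply can be sketched as follows. This is a minimal Latin hypercube sampler written with the Python standard library only, not the code used for the manuscript; the function name and the parameter names and bounds in the usage note are placeholders, not the values from the DDF curves or from table 5.

```python
import random


def latin_hypercube(bounds, n_samples, seed=0):
    """Draw n_samples parameter sets, one value per stratum and parameter.

    bounds maps a parameter name to a (low, high) tuple. Hypothetical
    helper for illustration.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One random point inside each of n_samples equal-width strata.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata of different parameters are paired at random.
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in bounds}
            for i in range(n_samples)]
```

A call such as `latin_hypercube({"rain_mm_per_h": (20.0, 40.0), "soil_depth_m": (0.5, 1.5)}, 1000)` would return 1000 parameter sets, with the rainfall bounds standing in for the 10 year and 100 year DDF intensities. Rainfall intensity is thus sampled as a continuous range rather than as ten discrete events.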

Data inventory
The proposed methodology used to characterize the hypothetical landslides (extent) is strictly dependent on the data inventory (section 2.3), as also stated somewhere by the authors. However, it is important that the observed landslides used to characterize the model are of the same type, according to the hypothesis of the stability model used, and all triggered by rainfall. Is it so? Please specify.
That's a good point. All landslides in the inventory used were triggered by rainfall. As shown in figure 4, most of the slides are shallower than 1.5 m. These are the slides we want to model, and we assume them to be representative of Switzerland.

Calibration/sensitivity analysis
With regard to the best set of parameters, my question is: are the found parameters consistent and realistic?
Consistency in the found parameters is arguable. There appears to be a certain equifinality at play, which is quite common in multiparameter modelling (e.g. Beven and Binley, 1992). Table 1 below gives an example of the 5 best parameter sets (highest AUC) as used in the paper. Parameters with a high influence, such as mean soil depth (mde) and transmissivity (Tra), are consistent in their value relative to the range of the sensitivity analysis. Parameters with a low(er) influence show higher variability. Realism of the parameters is determined by the ranges we chose in table 5. We believe that, with the data available to us, we made the most realistic assumptions on the parameter ranges. We will add a comment on consistency and realism to the revised version.

For example, I question the choice of including the precipitation intensity as a calibration parameter. As discussed in the previous comment, rainfall represents the triggering forcing and it is a dynamic variable. Ideally, we should know the precipitation intensity associated with each observed landslide. Otherwise, if used as a parameter, it seems that the model is tuned ad hoc just to reproduce the past events. If so, what would its utility be?
Rickli and Graf (2009) mention an estimate of the rainfall events in the landslide inventory. The estimated duration of the events, however, is variable, which makes it hard to relate them to an hourly intensity. In addition, as a result of the simplifications in the TOPMODEL approach (as also stated by the reviewer) and the spatial variability of rainfall in mountainous terrain, the computed pore pressure cannot be related exactly to a real precipitation event. Therefore, in reality, our rainfall intensity calibration focuses more on reproducing the required pore pressure. The utility of this analysis is in the discussion on the overall performance of the model.
The model is indeed calibrated to past events in order to perform the sensitivity analysis based on the most realistic combination of parameters. The calibration results for the best performing sets of parameters are used in this paper only to show the values of the parameters (table 6). These are used in the discussion on how realistic the parameter values are. For future case studies, we are planning to improve the hydrological approach in SlideforMap so that it relates more accurately to rainfall events (conditional on available sub-daily precipitation data). This further development is also based on the analysis in section 5.6.
Additionally, it would be interesting to see the AUC curves for the calibrated and the best model combinations.
We agree that model development is an iterative process, where sensitivity analysis can be used to identify the parameters that are most sensitive. In this work, the parameters to be calibrated were selected based on literature; we identified these parameters as often recurring in shallow landslide modelling. In our opinion the sensitivity of all these parameters is interesting and can help in the future development of SlideforMap or other models employing a similar method. As suggested by the reviewer, we will add the corresponding AUC curves to the paper.
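As a reminder of what the AUC used to rank the parameter sets measures, a minimal rank-based sketch is given here. It computes the probability that a randomly chosen observed landslide location receives a higher modelled probability than a randomly chosen stable location. It is a generic illustration, not the evaluation code of the manuscript; the function name and the input layout are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: modelled landslide probabilities; labels: 1 for an observed
    landslide, 0 otherwise. Hypothetical helper for illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one landslide and one stable location")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # landslide ranked above stable location
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to a model without discriminating power and 1.0 to a perfect separation. The pairwise loop is quadratic in the number of locations, so a sorting-based variant would be preferable for large rasters.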

Results
The result of high m_f and low m_c is quite obvious; as the authors clearly say in the discussion, and as found by other past works, in the end only few parameters really affect the process: the geometry of the slope (i.e. the soil thickness), the mechanical properties (i.e. friction angle) and the characteristic of the trigger (i.e. precipitation) whose effects are controlled by the soil transmissivity. With regard to the vegetation: different vegetation scenarios are analyzed (and this is fine). Which is the real configuration? Which is the ultimate target of the simulations?
Yes, you are right: in the end only a few parameters affect the process. For the spatial pattern of shallow landslide probability, these are the elevation model (determining the slope and specific catchment area) and the single trees datafile. The 'magnitude' of the pattern comes from soil thickness, friction angle, cohesion, precipitation and transmissivity, as mentioned by the reviewer. Vegetation mitigates the magnitude and also changes the spatial pattern of the probability. It is also something that land management actually influences and is therefore of ultimate importance for shallow landslide mitigation. We hope to quantify this effect of vegetation in our results and discussion and to show its importance. To summarize, the ultimate goal is to assess the effect of forest (management) scenarios on slope stability. The real configuration is the one obtained with the single tree detection method. This will be specified in the revised version.

General
I suggest to clearly state which is the ultimate main target of the model. Can we use it as forecast tool in an early warning system? If so, in which way? My impression is that it is too constrained to the calibration parameters, which, in some cases, may lose their physical meaning.
Thank you for this point. Simplifications and calibration constraints make it hard to use the model as an exact forecast tool. The main application we intended the model for is as a tool for land managers to quantify the effects of different vegetation scenarios. We will state this more clearly in the discussion and also early in the introduction.

TECHNICAL CORRECTIONS
Reply: Thank you for pointing out the corrections; we will implement them all.