Data limitations and potential of hourly and daily rainfall thresholds for shallow landslides

Rainfall thresholds are a simple and widely used method to predict landslide occurrence. In this paper we provide a comprehensive data-driven assessment of the effects of rainfall temporal resolution (hourly versus daily) on landslide prediction performance in Switzerland, with sensitivity to two other important aspects which appear in many landslide studies: the normalisation of rainfall, which accounts for local climatology, and the inclusion of antecedent rainfall as a proxy of the soil water state prior to landsliding. We use an extensive landslide inventory with over 3800 events and several daily and hourly, station-based and gridded rainfall datasets to explore different scenarios of rainfall threshold estimation. Our results show that although hourly rainfall did show the best predictive performance for landslides, daily data were not far behind, and the benefits of hourly resolutions can be masked by the higher uncertainties in threshold estimation connected to using short records. We tested the impact of several typical user choices, like assigning the nearest rain gauge to a landslide location and filling in unknown timing, and report their effects on predictive performance. We find that localisation of rainfall thresholds through normalisation compensates for the spatial heterogeneity in rainfall regimes and landslide erosion process rates and is a good alternative to regionalisation. On top of normalisation by mean annual precipitation or a high rainfall quantile, we recommend that non-triggering rainfall be included in rainfall threshold estimation if possible. Finally, we demonstrate that there is predictive skill in antecedent rain as a proxy of the soil wetness state, despite the large heterogeneity of the study domain, although it may not be straightforward to build this into rainfall threshold curves.

date. Then, depending on the rainfall dataset used, the timeframe is modified, and for hourly analysis a further selection is made of the entries with known timing (number of landslides per rainfall dataset reported in Table 1).

Rainfall thresholds
The methodology for the definition of rainfall thresholds follows the statistical procedure introduced in Leonarduzzi et al. (2017). First, we separate the rainfall timeseries into events by requiring a minimum number of dry hours in between. We choose 24 hours for daily rainfall data and 6 hours for hourly rainfall data, after an optimisation within a range of 2-12 hours, which is the range of dry spells expected to separate individual storms.
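As an illustration, the event-separation step can be sketched as follows. This is a minimal Python sketch, not the code used in the paper; `rain` and `min_dry` are illustrative names.

```python
# Split a rainfall series into events: a run of at least `min_dry`
# consecutive dry steps separates two events. `rain` is a sequence of
# non-negative totals per time step (hours or days).
def split_events(rain, min_dry):
    """Return a list of (start, end) index pairs, end exclusive."""
    events, start, dry = [], None, 0
    for i, r in enumerate(rain):
        if r > 0:
            if start is None:
                start = i        # a new event begins at the first wet step
            dry = 0
            end = i + 1          # last wet step seen so far
        elif start is not None:
            dry += 1
            if dry >= min_dry:   # dry spell long enough: close the event
                events.append((start, end))
                start = None
    if start is not None:        # series ended inside an event
        events.append((start, end))
    return events
```

With hourly data the paper's 6 h separation corresponds to `min_dry = 6`; with daily data the 24 h separation corresponds to a single dry day.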
Then we classify rainfall events as triggering if a landslide happens during or immediately after the event, and as non-triggering otherwise. We compute the event duration, total rainfall, and mean and maximum rainfall intensity for each event. We then define optimal thresholds for each of the precipitation characteristics by finding the threshold that maximises the True Skill Statistic TSS = specificity + sensitivity − 1, where sensitivity is the rate of true positives and specificity is the rate of true negatives. Additionally, we define total rainfall E versus duration D (ED) thresholds in the form of a power law function, E = a·D^b, by optimising the two parameters a and b through TSS maximisation. As a reference, we also provide the results for the thresholds defined following the frequentist approach, first introduced in Brunetti et al. (2010), which is one of the most widely used methods for ED fitting (e.g., Peruccacci et al., 2012; Vennari et al., 2014; Gariano et al., 2015; Iadanza et al., 2016; Melillo et al., 2018; Roccati et al., 2018). The optimum threshold in this case is based on triggering events only.
The exponent b is obtained by fitting the ED pairs with a line in log-log space. The intercept a is adjusted to match a chosen exceedance probability (in this paper we use the 5% exceedance probability as a reference).
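Both fitting routes described above can be sketched in a few lines. This is an illustrative Python sketch under simplifying assumptions, not the authors' implementation; all function and variable names are ours.

```python
import math

def best_tss_threshold(values, triggering):
    """Brute-force the threshold maximising TSS = sensitivity + specificity - 1."""
    pos = [v for v, t in zip(values, triggering) if t]       # triggering events
    neg = [v for v, t in zip(values, triggering) if not t]   # non-triggering events
    best_tss, best_thr = -1.0, None
    for thr in sorted(set(values)):
        sensitivity = sum(v >= thr for v in pos) / len(pos)  # true positive rate
        specificity = sum(v < thr for v in neg) / len(neg)   # true negative rate
        tss = sensitivity + specificity - 1
        if tss > best_tss:
            best_tss, best_thr = tss, thr
    return best_tss, best_thr

def frequentist_ed(durations, totals, exceed=0.05):
    """Fit E = a * D**b on triggering ED pairs by least squares in log-log
    space, then lower the intercept so that a fraction `exceed` of the
    pairs falls below the curve."""
    x = [math.log(d) for d in durations]
    y = [math.log(e) for e in totals]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))                  # least-squares slope
    residuals = sorted(yi - b * xi for xi, yi in zip(x, y))  # log(a_i) values
    k = max(0, min(n - 1, int(exceed * n)))                  # crude percentile index
    return math.exp(residuals[k]), b                         # (a, b)
```

Note the key difference: the TSS search uses both triggering and non-triggering events, whereas the frequentist fit, as described above, uses triggering events only.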
For all analyses based on the gridded rainfall products, we consider the rainfall timeseries for each susceptible cell, for which we define rainfall events following the procedure explained above. Susceptible cells are those rainfall cells in which at least one landslide was recorded in the respective timeframe of each dataset in Table 1.

Inaccurate landslide timing: triggering and peak intensities
One problem we face when utilising hourly rainfall records is that the actual timing of historical landslides is typically not available, or very uncertain/inaccurate. For instance, Guzzetti et al. (2007) report that, out of the 2626 rainfall events associated with shallow slope failures globally, only 26.3% had information about the date of occurrence and only 5.1% also about the timing.
Although a common approach to compensate for the lack of accurate landslide timing is to assign the landslide to the rainiest hour within a certain time window, the effect of this approximation is not well known. Peres et al. (2018) showed the potential impact of timing and date uncertainty using synthetic databases, by coupling a stochastic weather generator with a physically based hydrological and slope stability model. Staley et al. (2013) showed, using a precise debris-flow database, that using peak rainstorm intensity instead of the actual triggering intensity results in an overestimation of the ID threshold.
We study the wrong-timing effect similarly to Staley et al. (2013) by introducing two scenarios as alternatives to the actual landslide database: one in which we assume that, when the day of a landslide is known, its timing is assigned to the most intense rainy hour within that day (the case of Staley et al., 2013); and a second in which the timing is assigned to the rainiest hour within a 48 h window centred on the actual timing recorded in the database (a hypothetical case which considers the fact that we may not have the right date recorded in the landslide database). Once the timing is altered accordingly, the modified landslide databases are used for the definition of ED thresholds following the same procedure as with the original true database. We carry out this exercise utilising landslides with known time of occurrence recorded between May 2003 and December 2010 (the timeframe of RHIR).
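The two timing perturbations can be sketched as follows. This is a hypothetical Python sketch, not the paper's code; `rain` is an hourly series indexed from the start of the record, and all names are illustrative.

```python
# Reassign a landslide either to the rainiest hour of its day or to the
# rainiest hour of a 48 h window centred on the true hour.
def rainiest_hour(rain, true_hour, mode):
    if mode == "same_day":
        lo = (true_hour // 24) * 24        # midnight of the same day
        hi = min(lo + 24, len(rain))
    elif mode == "48h_window":
        lo = max(0, true_hour - 24)        # 24 h before ...
        hi = min(true_hour + 24, len(rain))  # ... to 24 h after
    else:
        raise ValueError(mode)
    return max(range(lo, hi), key=lambda h: rain[h])
```

The wider the search window, the more often the assigned hour belongs to a different (more intense) storm than the one that actually triggered the landslide.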

In most studies where regional rainfall thresholds are defined, landslides in a region are assigned to the closest rain gauge, sometimes taking into consideration not only distance (Finlay et al., 1997; Godt et al., 2006), but also similarities in topography or other aspects important for precipitation (e.g., Aleotti, 2004; Berti et al., 2012; Gariano et al., 2012; Rossi et al., 2012; Melillo et al., 2018; Vennari et al., 2014). Nikolopoulos et al. (2015) showed that, as the density of the rain gauge network decreases, the b parameter of the power law ID curve decreases on average, with the magnitude of the effect depending on whether the closest rain gauge is used (nearest neighbour) or simple interpolation methods such as inverse distance weighting or ordinary kriging. In the context of comparing the impacts of daily and hourly rainfall resolutions on landslide thresholds, we recognise that gauge density is very important, and we construct an experiment to test the effects of gauge density and accuracy of spatial interpolation.
To do this, we define rainfall thresholds with the "closest rain gauge" based on the very sparse station-based hourly rainfall record RHG and compare it to the spatially distributed disaggregated dataset RHIG. The comparison shows the effect of improving an hourly record obtained with a very sparse network by taking advantage of a daily dataset based on a much denser network and an advanced interpolation method in RDI (Frei and Schär, 1998), merged with hourly station data. We propose two versions of the "closest rain gauge" approach used in many studies. First, we assign each landslide to the geographically closest rain gauge and then extract rainfall events for each of the gauges which have at least one landslide (maximum 45 rain gauges) from the RHG dataset. Second, we assign to each susceptible rainfall cell (as defined for the gridded rainfall products) the rainfall of the closest rain gauge, and the event definition is carried out for each of these cells (maximum as many cells as the number of landslides) from the RHIG dataset.
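The nearest-gauge assignment itself is a simple minimum-distance lookup. A minimal sketch, assuming projected (metric) coordinates such as the Swiss grid; coordinates and names are illustrative, not from the paper.

```python
# Assign a landslide to the gauge at the smallest Euclidean distance.
def nearest_gauge(landslide_xy, gauges_xy):
    """Return the index of the gauge closest to the landslide."""
    x, y = landslide_xy
    d2 = [(gx - x) ** 2 + (gy - y) ** 2 for gx, gy in gauges_xy]
    return d2.index(min(d2))
```

In a sparse network this nearest gauge may lie tens of kilometres from the landslide, which is what drives the loss of representativeness discussed here.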

Rainfall normalisation
One of the methods suggested to improve the predictive power of regional rainfall thresholds is to localise them. This can be done through regionalisation, by dividing the area into homogeneous regions and defining a different threshold for each of them (e.g. Peruccacci et al., 2012; Leonarduzzi et al., 2017; Peruccacci et al., 2017), or by normalisation, that is, defining thresholds based on the ratio between the precipitation parameters and a local scaling value considered representative of local rainfall characteristics. Typically, the property chosen is the Mean Annual Precipitation MAP (e.g. Dahal and Hasegawa, 2008; Aleotti, 2004; Guzzetti et al., 2007; Leonarduzzi et al., 2017; Peruccacci et al., 2017), the Rainy-Day Normal RDN = MAP/n, where n is the number of rainy days in a year (Guidicini and Iwasa, 1977; Wilson and Jayko, 1997; Guzzetti et al., 2007; Postance et al., 2018), or other precipitation characteristics (e.g. the anomaly relative to the 10-year return period rainfall in Marc et al., 2019).
In this paper we test, in addition to the well-established MAP and RDN normalisations, quantiles of event properties and of daily/hourly rainfall as scaling parameters. Note that there are fundamental differences between scaling with MAP, RDN or rainfall quantiles, in that MAP ignores the intermittency of rainfall, while RDN and quantiles are computed only from the rainy hours/days of the rainfall dataset.
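The three families of scaling values can be sketched as follows. This is illustrative Python, not the paper's code; note how MAP uses all days while RDN and the quantile use wet days only, which is the intermittency distinction made above.

```python
# Compute a local scaling value from a daily rainfall series spanning
# `years` years. All names are illustrative.
def scaling_value(daily_rain, years, method, q=0.95):
    wet = sorted(r for r in daily_rain if r > 0)   # rainy days only
    map_ = sum(daily_rain) / years                 # mean annual precipitation
    if method == "MAP":
        return map_
    if method == "RDN":                            # rainy-day normal: MAP / n
        return map_ / (len(wet) / years)           # n = rainy days per year
    if method == "quantile":                       # quantile of wet days only
        return wet[min(len(wet) - 1, int(q * len(wet)))]
    raise ValueError(method)
```

A normalised threshold is then obtained by dividing each event property (e.g. maximum intensity) by the scaling value of the cell or gauge before the threshold search.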

Antecedent rainfall
The main criticism raised against rainfall thresholds for landsliding in general is that they only consider recent/event rainfall, without taking into account the soil status prior to it (e.g., Bogaard and Greco, 2018). To include this antecedent soil moisture state in rainfall thresholds, several ad-hoc approaches have been introduced, with varying levels of complexity and data demand. The simplest of these consists of accumulating rainfall over a fixed duration prior to the triggering event rainfall (e.g., Chleborad, 2003; Frattini et al., 2009). In other studies the fixed duration has been modified to account for vanishing memory in rainfall using the Antecedent Precipitation Index (API), which gives less weight to rainfall contributions further back in time (e.g., Crozier et al., 1980; Crozier, 1986), often relating the decay coefficient to the recession curves of storm hydrographs, as first suggested by Glade et al. (2000). A further development of the API is the so-called Antecedent Wetness Index, which accounts also for other hydrological variables by removing the potential evapotranspiration from antecedent rainfall and then following the same approach as the API (e.g., Godt et al., 2006). Finally, a few studies use estimates of the real antecedent soil wetness based on the soil water balance or hydrological modelling (e.g., Segoni et al., 2009; Thomas et al., 2018), or obtained from on-site (e.g., Mirus et al., 2018) or remote sensing measurements (e.g., Brocca et al., 2012).

We test whether including antecedent conditions has informative value for rainfall thresholds by separating rainfall events into four subsets, depending on whether they are triggering or not and whether they fall above or below the optimised ED power law threshold. Then, for each of the four sets and each duration, we compute the average antecedent rainfall and check whether it differs between triggering and non-triggering events.
This is the opposite of what is normally done, separating events a priori into those with and without antecedent rainfall (e.g., Frattini et al., 2009). To ensure we have enough events for this analysis, we utilise the longest record available (RDI, 1972-2018) and events with durations up to 6 days. Averaging the antecedent rainfall allows us to see general trends without focusing on every individual event.
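The two simplest antecedent-rain proxies mentioned above can be sketched as follows. This is illustrative Python; the decay constant `k` and all names are assumptions for the sketch, not values from the paper.

```python
# Fixed-window antecedent sum: rainfall accumulated over `window` steps
# before the event starts.
def antecedent_sum(rain, event_start, window):
    lo = max(0, event_start - window)
    return sum(rain[lo:event_start])

# Antecedent Precipitation Index with exponential decay:
# API_t = k * API_{t-1} + P_t, giving older rainfall less weight.
def api(rain, k=0.9):
    out, a = [], 0.0
    for p in rain:
        a = k * a + p
        out.append(a)
    return out
```

With `k = 1` the API reduces to a running total; the smaller `k` is, the faster the memory of past rainfall vanishes.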

Daily and hourly thresholds
We define several rainfall thresholds by maximising TSS for the different rainfall datasets and their associated timeframes (Figure 3). As expected, performance is best with the high quality hourly rainfall product which uses high resolution radar information for the disaggregation of daily sums (RHIR). Disaggregation using the closest hourly rain gauge (RHIG) seems to lead to worse performance than the corresponding daily analysis RDI (red and blue bars in the upper part of Figure 3). However, this may be deceptive, as the time periods as well as the number of landslides behind the rainfall datasets differ. This is a critical point we investigate below.

(https://doi.org/10.5194/nhess-2020-125; preprint, discussion started 27 April 2020; © Author(s) 2020, CC BY 4.0 License)
A fairer comparison is over the same time period (05/2003-12/2010) and considering the same landslide events (comparison A in Figure 2 and middle panel of Figure 3). In this case, the differences in performance across the different rainfall datasets become smaller. The hourly disaggregated product using radar (RHIR) still leads to the best performance, but the performance with daily data (RDI) is improved, even with the simple disaggregation using the closest rain gauge (RHIG), for all rainfall properties except duration. Remarkably, the daily rainfall dataset RDI retains reasonably good predictive power despite its coarser temporal resolution.
One additional comparison that can be made in the overlapping timeframe (05/2003-12/2010) is with all landslide events, regardless of whether the timing is known or only the date. The performances obtained with daily data and all these events are now comparable to the ones with the high quality hourly product (RHIR), and the differences between the two are further reduced.

While the decrease of the thresholds and performances is consistent for all rainfall properties as the landslide dataset is reduced, this is not a general result. Rather, it demonstrates that the size and accuracy of the landslide dataset are important, and that results based on shorter records are likely to be less robust, as they are more susceptible to individual events, years, outliers, or mistakenly reported landslides. This short-record bias is also evident when comparing daily thresholds obtained using the 1981-2018 timeframe or the shorter timeframe 05.2003-12.2010 for which RHIR is available (first and third bars in the bottom panel of Figure 3). The thresholds obtained with the latter are higher. The reason is that 187 landslides occurred in 2005, most of them due to a single intense summer storm in August. Considering all 38 years (1981-2018), the effect of that "outlier" year is reduced, as it amounts to ca. 10% of the total number of landslides available with known timing (almost 40% within the period 05.2003-12.2010).
Final visual evidence of the lower robustness of thresholds defined using hourly rainfall data is found in the relative frequency plots of triggering events for hourly rainfall data, compared to daily (upper and lower portions of Figure 4). The triggering events at the hourly resolution (634 events) are much more sparse than the corresponding daily events (2117 events).

Inaccurate landslide timing: triggering and peak intensities
Results of two different approaches are presented here to illustrate the case when historical landslide inventories have no timing information available. The landslides are assigned either to the actual timing in the database, to the most intense hour within the actual day, or to the most intense hour within a 48 h window centred on the actual timing. We define ED thresholds using each of these modified landslide datasets (Figure 2). Searching for the most intense hour within the actual day of the landslides (#6 in Figure 2) leads to optimal thresholds that are not far off from the ones defined using the actual timing (#3 in Figure 2). Instead, when the hour with the maximum intensity is sought within a 48 h window centred on the actual timing (#7 in Figure 2), the threshold changes, leading to a higher coefficient a and a smaller slope b.
This observation holds for threshold optimisation both by TSS maximisation and following the frequentist approach, for which the change in the threshold parameters is present even when limiting the search to the day of the landslides. The explanation for this difference is that the TSS maximisation approach for the definition of ED thresholds is relatively robust: when the timing of the landslides is altered, some triggering events might change their total rainfall and duration values, but non-triggering events are unaffected. Importantly, for the TSS maximisation in both scenarios of adjusted timing, the TSS value associated with the best threshold is higher than if the timing were known.

All the observations presented here also hold when carrying out the same analysis over the 1981-2018 time period using RHIG. The TSS maximisation leads to essentially identical thresholds in the three scenarios, but the TSS increases from 0.65 (actual timing) to 0.67 (most intense hour within the actual date) or 0.70 (most intense hour within a 48 h window). Following the frequentist approach with the 5% exceedance probability, the TSS also increases from 0.44 (actual timing) to 0.51 (most intense hour within the actual date) or 0.60 (most intense hour within a 48 h window).

This means that if we do not know the timing of landslides accurately and assign it based on some a priori chosen rainfall event property, we overestimate the landslide prediction skill of our ED curves. In a situation where the actual timing is unknown and this technique is applied to compensate for it, the threshold might not be very far off, but the user would overestimate model performance, leading to false overconfidence in their predictions.
Nevertheless, having to choose between the two methods of correcting timing, limiting the search for the rainiest hour to the actual date seems slightly better, with smaller overestimation of the performance (TSS) and threshold curve parameters more similar to the ones obtained using the actual timing. Considering a 48 h window not only leads to overestimation of the TSS, the thresholds are also affected: for both threshold definition methods, the threshold in this case gets higher (higher a) and less steep (smaller b).

To test the importance of the overall quality of the rainfall dataset in the context of the daily-hourly temporal resolution comparison, we use here the hourly gauge measurements from a sparse network (RHG) and the hourly gridded rainfall dataset (RHIG). The latter takes advantage of the high quality daily record (RDI), which is based on a denser daily rain gauge network and accounts for climatology and topography (comparison C in Figure 2).
As before, the comparison between the different rainfall datasets should not be based on the thresholds obtained, since both triggering and non-triggering events can potentially change, but rather on the landslide prediction performance associated with them. When the rain gauge rainfall record is used directly (RHG), whether duplicated at each (closest) landslide location (#5 in Figure 2) or just using one timeseries per gauge (#4 in Figure 2), the sensitivity drops, and so does the TSS. The two rainfall datasets (RHIG and RHG) have exactly identical hourly rainfall fractions and differ only in the daily sum, which for RHIG is forced to match the RDI daily rainfall of the corresponding cell. When the station hourly timeseries are used, the triggering rainfall events generally have smaller event characteristics than the corresponding RHIG events. Out of 634 events in total, 423 have smaller maximum intensity, 382 smaller mean intensity, 447 smaller total rainfall, and 461 shorter duration. This results in a decrease of the maximum TSS of up to 0.07, mostly due to lower sensitivity (for total rain, the sensitivity drops from 0.72 to 0.63).
The same drop in performance is observed when following the frequentist approach (Figure 2). The TSS, which is 0.44 for the analysis using the hourly timeseries adjusted with the daily product (RHIG), drops to 0.29 or 0.24, depending on whether the susceptible cells or the rain gauge locations are used. In this case the effect on the threshold (ED curve) is also very consistent: the curves are lower (smaller a) and slightly steeper (higher b). This is a consequence of the fact that it is especially the short (intense) events that are missed (underestimated) when rainfall measurements further from the actual landslide locations are used (RHG rather than RDI).

Rainfall normalisation
The improvement achieved by defining thresholds not directly on the values of the different precipitation characteristics, but scaling them by a certain quantile of the corresponding event characteristic, a certain quantile of daily/hourly precipitation, or the mean annual precipitation, is shown in Figure 5 for the daily RDI and hourly RHIR datasets. When searching for the event property thresholds, the choice of quantile seems to be largely irrelevant, as the TSS only fluctuates slightly around a value between the no-normalisation and the mean annual precipitation lines. Completely different behaviour is observed for the normalisation using quantiles of hourly/daily rainfall. In that case, performance comparable to the other cases is achieved only for the highest quantiles, especially for hourly data (right panels in Figure 5).
In general, the best performance is obtained with normalisation by mean annual precipitation. In fact, with hourly data this level of performance can only be reached for a few very high rainfall quantiles of the total rainfall (centre right in Figure 5). With daily data, instead, performance is comparable with the mean daily precipitation and a wider range of quantiles (q > 0.4) of daily rainfall and event properties.
The results for the RDN normalisation (not shown here) are basically indistinguishable from those for MAP, not in terms of the value of the optimum threshold but in performance, with differences in the TSS of the normalised optimum threshold of less than 0.01.
The improvement of landslide prediction with normalised rainfall thresholds is statistically demonstrated, but it demands a physical explanation. We hypothesise that the reason lies in the fact that the rainfall regime (climate) and the landsliding process (erosion) are connected through the landscape balance between weathering and soil formation, and the rainfall-driven erosion of the top soil by landsliding and other processes (e.g. Norton et al., 2014). In climates with a highly erosive rainfall regime and high topography, the rate of landsliding has adjusted to match the lower soil formation rates. Consequently, higher rainfall intensities are needed on average to generate landslides there. Scaling rainfall thresholds by a high-intensity rainfall quantile corrects for landscape scale differences between these process rates and leads to better prediction of landslide occurrence regionally. Evidence for this hypothesis can be found in some studies (e.g., Leonarduzzi et al., 2017; Peruccacci et al., 2017) and can also be observed by comparing the differences in triggering intensities to those in mean daily precipitation in our data (Figure 6). Cells in which the mean daily precipitation is higher generally also have higher triggering intensities. Accounting for this in the threshold definition, for example by dividing the maximum intensity by the MAP of the corresponding cell, results in an improvement in performance.
It is interesting to note that most of the triggering rainfall intensities are indeed among the strongest intensities recorded: most of them (circles in Figure 6) lie between the 0.75 and 1 quantiles of rainfall. This is the foundation of the success of rainfall thresholds for landslide prediction.

Antecedent rainfall
Including antecedent wetness or rainfall in regional/national scale thresholds is not a simple task. In fact, while antecedent rainfall versus triggering rainfall thresholds are successful in many local studies (e.g. for the Seattle area, Chleborad, 2003), the results shown in Section 3.4 are indicative of the heterogeneity at the regional/national scale which makes antecedent rainfall signals difficult to detect. For example, the approach suggested in Chleborad (2003) of defining thresholds using the 3-day and 15-day prior cumulative rainfall as variables, applied to the RDI data 1972-2018, shows no pattern useful for the definition of thresholds. Nevertheless, the information content of even the simplest proxy of soil wetness, the antecedent rainfall, is clear.
In our experiment we separated the events into observed triggering or non-triggering and predicted triggering or non-triggering (above or below the ED threshold obtained by maximising TSS), and plotted the mean antecedent rainfall for 5- and 30-day periods. We can see that antecedent rainfall explains some of the misclassifications generated by the ED threshold (Figure 7). We expect that some of the misses (triggering events below the ED curve) were actually landslides caused by low rainfall amounts on very wet soil, while some false alarms (non-triggering events above the ED curve) plausibly did not produce landslides because antecedent rainfall was very low. These are exactly the patterns we observe in Figure 7. Higher intensity events are generally associated with higher antecedent rainfall, due to seasonality effects (they typically occur in the wetter periods of the year); the false alarms are associated with clearly smaller antecedent rainfall than the true positives; and, even more importantly, the misses have, for almost all durations, higher antecedent rainfall than the false alarms. As expected, the true negative events have on average the smallest antecedent rainfall for most durations.
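The four-way split used here can be sketched as follows. This is an illustrative Python sketch with hypothetical inputs, not the analysis code.

```python
# Classify each event against the ED curve E = a * D**b and average the
# antecedent rainfall per confusion category (TP/FP/FN/TN).
def antecedent_by_category(events, a, b):
    """events: iterable of (E, D, triggering, antecedent) tuples."""
    sums = {"TP": [0.0, 0], "FP": [0.0, 0], "FN": [0.0, 0], "TN": [0.0, 0]}
    for e, d, trig, ant in events:
        pred = e >= a * d ** b  # predicted triggering if above the curve
        cat = ("TP" if trig else "FP") if pred else ("FN" if trig else "TN")
        sums[cat][0] += ant
        sums[cat][1] += 1
    return {c: s / n for c, (s, n) in sums.items() if n}
```

The patterns described above correspond to the misses (FN) showing higher mean antecedent rainfall than the false alarms (FP), and the true negatives the lowest of all.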
The highest antecedent rainfall for misses (triggering events below the ED curve) occurring for events of 1-day duration could indicate the importance of antecedent conditions, either because the wrong event was identified as triggering or because those landslides were really triggered by previously high soil wetness rather than by the event itself. However, we cannot provide evidence that this is the case. The patterns for the 5- and 30-day antecedent rainfall look very similar, showing that the antecedent conditions are consistent over longer periods. The only difference is in the true negatives, which for 30 days have a much smaller mean antecedent rainfall than the other events.
Our results show that comparing rainfall thresholds across temporal resolutions might not be a straightforward exercise, and that many more aspects should be taken into consideration before concluding that the highest temporal resolution is best for landslide prediction.
We argue that from a theoretical point of view, hourly rainfall data are superior to daily data, as they can capture the short convective events lasting a few hours which are known to trigger landslides and which are averaged out in the daily sum. This has also been confirmed in other studies, e.g. Marra (2019) and Gariano et al. (2019), where different temporal resolutions were compared to show the underestimation of thresholds at lower temporal resolutions. In the work presented here too, when we consider the exact same time period and landslide events, performance at the hourly temporal resolution is superior to that at the daily resolution, especially for high quality datasets (RHIR). On the other hand, we show in this work that there are several additional factors that should be taken into consideration.
Choosing hourly rainfall data usually implies dealing with shorter historical records, lower quality (sparser) rainfall datasets, and less rich landslide databases. Typically, in the past rain gauges mostly recorded precipitation daily, which means that the daily datasets go further back in time, allowing analyses spanning many more years. Taking the example of Switzerland, ca. 420 gauges have been available since 1961 for generating the RDI rainfall product. The first hourly gauges start to appear around 1981, and only 45 of those measured consistently until 2018. The much lower density of hourly rain gauges makes the quality of the interpolated product lower, or the distance between observed landslide and measured rainfall locations greater, and therefore less representative. In recent years (ca. since 2012) the number of hourly gauges has increased dramatically, with 270 stations at the moment, but this would allow an analysis of at most 7 years (compared to the 48 years available at the daily resolution). The variability in the optimum threshold for the different time periods demonstrates the risk of using shorter timeframes (lower panel of Figure 3).
At the hourly resolution the richness of the landslide database is also affected, as not only the date but also the timing of the landslide must be known. Staley et al. (2013) addressed this issue and showed the overestimation of thresholds when considering peak rainstorm intensity instead of triggering intensity when the actual timing is unknown. This generally leads to overestimation of the maximum intensity, but potentially also of other event parameters. Here, especially when the threshold is obtained by maximising TSS, the optimum threshold does not seem to change much, at least if the landslide date is known. Limiting the search to the actual date seems to be the better choice whenever possible, as allowing a larger window (48 h centred on the actual timing) leads to a bigger threshold change, both when maximising TSS and when following the frequentist approach. Nevertheless, in both cases the performance is overestimated if the peak intensity is used to time the landslide, giving the user overconfidence in the threshold values themselves.
Some last factors to consider when choosing the temporal resolution are that in many countries hourly rainfall records may be even shorter and of lower quality than in Switzerland, making the choice to work with daily data even more relevant, and that, when utilising rainfall thresholds in an operational setting, daily forecasts are usually more readily available.

In all the comparisons between hourly and daily rainfall, we purposely refrained from comparing the values of the optimal thresholds and of the ED curves between the hourly and daily analyses. To allow this comparison, strong and clearly unrealistic assumptions would have to be made, such as assuming that the daily intensity is 24 times the corresponding hourly intensity. This is in agreement with the recommendation in Gariano et al. (2019) and other studies not to extend daily ED or ID rainfall thresholds into the sub-daily domain.
Two methods for rainfall threshold estimation were presented here to show that the threshold optimisation method used does not affect the main conclusions. While our work does not intend to compare the two methods, the results presented here show clearly that accounting for non-triggering events as well in the definition of the threshold (e.g. maximising TSS) increases the robustness of the obtained threshold. In fact, while the performance and the parameters of the ED curves are affected in both cases, the frequentist approach seems to be more sensitive, with greater differences in optimal thresholds and greater variability in performance (e.g. see the variability of the optimum ED thresholds in Figure 2). Nevertheless, there might be conditions in which rainfall records are not available and only triggering events can be reconstructed from newspapers and other historical records.
In those conditions, a method like the frequentist approach would be the only option.
Lastly, we argue for the benefits of normalising rainfall thresholds. The different normalisation methods we test show that high quantiles of rainfall intensities, quantiles of event properties, MAP and RDN are all valuable scaling parameters, especially with daily rainfall data. Nevertheless, we suggest using MAP as a general and widely available climatological variable.

Conclusions
We define and test rainfall thresholds for the triggering of landslides by taking advantage of a rich landslide database and several rainfall products available in Switzerland, with the main objective of providing a realistic comparison between hourly and daily rainfall resolutions. We explore the impacts of other issues that usually accompany higher temporal resolution data, like shorter datasets, unknown timing, and sparser networks, and we test the impacts of two typical analysis steps in threshold definition: normalisation of the threshold and antecedent rainfall.
Our main findings are:

- Although hourly rainfall is more appropriate for landslide prediction, several aspects should be taken into consideration before relying on it exclusively for threshold definition. Generally, hourly rainfall records are shorter (only available in recent years) and of lower quality (e.g. based on sparser rain gauge networks), and landslide databases only seldom contain accurate timing.
- In ideal conditions, hourly datasets do show the best predictive performance for landslides, but daily data are not far behind. The benefits of hourly resolutions can be masked by the higher uncertainties in threshold estimation connected to using short records and unknown timing.