Optimizing and validating the Gravitational Process Path model for regional debris-flow runout modelling

Knowing the source and runout of debris-flows can help in planning strategies aimed at mitigating these hazards. 10 Our research in this paper focuses on developing a novel approach for optimizing runout models for regional susceptibility modelling, with a case study in the upper Maipo river basin in the Andes of Santiago, Chile. We propose a two-stage optimization approach for automatically selecting parameters for estimating runout path and distance. This approach optimizes the random walk and Perla’s two-parameter modelling components of the open-source Gravitational Process Path (GPP) modelling framework. To validate model performance, we assess the spatial transferability of the optimized runout model 15 using spatial cross-validation, including exploring the model’s sensitivity to sample size. We also present diagnostic tools for visualizing uncertainties in parameter selection and model performance. Although there was considerable variation in optimal parameters for individual events, we found our runout modelling approach performed well at regional prediction of potential runout areas. We also found that although a relatively small sample size was sufficient to achieve generally good runout modelling performance; larger samples sizes (i.e. ≥80) had higher model performances and lower uncertainties for estimating 20 runout distances at unknown locations. We anticipate that this automated approach using open-source software R and SAGAGIS will make process-based debris-flow models more readily accessible and thus enable researchers and spatial planners to improve regional-scale hazard assessments.

). There are also many empirical-statistical and numerical 30 methods available to model runout patternssee McDougall (2017) for an overview.
Not all runout methods are suitable for application to large areas. Many of the physically based methods require event-specific geotechnical and rheological parameters, such as material composition (e.g. bulk density and source depths) and flow characteristics (e.g., flow discharge rates). These parameters, such as debris-flow volume, can be extremely difficult to obtain for large areas, let alone single unobserved events (Marchi and D'Agostino 2004;Dong et al. 2009). Consequently, runout 35 modelling at larger scales has been progressing towards applying simplified conceptual models to simulate debris-flow patterns across different environmental conditions. These models combine spreading algorithms to control runout path with empiricalstatistical or numerical friction models to simulate likely runout paths and distances (Guthrie et al. 2008;Horton et al. 2013;Wichmann 2017;Mergili et al. 2019). Many of the spreading algorithms, including multiple flow direction models (Holmgren 1994), cellular automata (Guthrie et al. 2008) and random walk (Gamma 2000;Mergili et al. 2015), simulate runout paths 40 using only topographic data.
Calibration of model parameters continues to be one of the main challenges in runout modelling for single-events and over large areas (Hungr 1995;van Westen et al. 2006;Schraml et al. 2015;McDougall 2017;Mergili et al. 2019). The objective of model calibration is to determine parameter values that best capture main debris-flow characteristics such as runout distance, velocity, and distribution of deposits (Hungr 1995;McDougall 2017). Approaches for model calibration include adjusting 45 parameters based on visual inspection (i.e. trial and error ;Hungr 1995;Mergili et al. 2012); expert knowledge (Horton et al 2013); posterior analysis (Mergili et al. 2019;Aaron et al. 2019); and optimization algorithms that aim to minimize a cost function, i.e. a quantitative measure of runout model performance. Some measures of performance include estimates of the intersection-over-union (Galas et al. 2007), area under the receiver operating characteristic curve (AUROC; Cepeda et al. 2010;Mergili et al. 2015) and depth error (Aaron et al. 2019) of simulated and observed debris flows. Since most of these 50 calibration approaches are for single observed events, they rarely consider how transferable tuned parameter sets are from local to regional applications.
Assessing spatial transferability is essential for testing the assumption a trained model based on a sample of events captures the generic debris-flow characteristics across a region (Fabbri et al. 2003). The distribution of training data and the sample size can have a strong influence on the model calibration and performance of regional models (Heckmann et al 2014;Petschko et 55 al 2014;Rudy et al 2016). For spatially distributed models, spatial transferability can be assessed by exploring model parameter selection and performance under different spatial partitioning scenarios of training and test data (Wenger and Olden 2012;Brenning 2012;Schratz et al. 2019;Mergili et al. 2019). Although spatial transferability has been well explored for regional landslide susceptibility models (Brenning 2005;Lombardo et al. 2014;Petschko et al. 2014;Goetz et al. 2015a;Cama et al. 2017;Knevels et al. 2020), such analysis is not common for regional runout modelling.
In this study, we developed an optimization procedure for process-based models applied for regional simulation of debris-flow runout patterns. The performance evaluation focuses on the spatial transferability and sensitivity to sample size of an optimized regional debris-flow runout model. We achieve this by utilizing the open-source statistical software R to add optimization and evaluation functionality to the open-source Gravitational Process Path (GPP) modelling framework (Wichmann 2017).
Additionally, this study demonstrates the use of spatial cross-validation and visualization techniques to diagnose uncertainties 65 in the prediction of source areas, runout paths and runout distances, including the sensitivity of optimized parameter selection.
The aim of this research is to contribute to improving the development of quantitative techniques for runout model calibration and uncertainty estimation (McDougall 2017). This is especially important in large and inaccessible mountainous areas where various types of mass movements pose unique challenges to the safety of the local population, the integrity of transportation infrastructure, and the reliability of drinking water supplies. 70 2 Materials and Methods

Study area
Out study area is the upper Maipo river basin (3303 to 3418 S, 800 m to 6108 m a.s.l.), located in the semi-arid Andes of central Chile. Debris-flow activity in remote and populated areas of the Maipo river basin have caused many deaths and severe disruptions to critical transportation and water supply infrastructure supporting Chile's capital city Santiago (Hauser 2002;75 Sepúlveda et al. 2015;Moreiras and Sepúlveda 2015).
High-intensity rainfall , rapid snowmelt (Moreiras et al 2012) and seismic activity (Serey et al 2019) are the main triggers of debris flows in this region. They occur in steep gullies and talus slopes consisting of gravel, small boulders and a fine sandy-silty matrix. Much of this material is from weathered volcanic and sedimentary rocks of the Abanico and Farellones formations in the western Main Cordillera (Sepúlveda et al 2006). A typical runout track will cut through previously 80 formed debris flow channels and alluvial fans, resulting in new erosion and deposition paths . Rainfalltriggered runout distances in this area have been observed up to 5.5 km, and the thickness of deposits varies from 1 to 2 m in deposition areas .

The debris-flow inventory
Debris flow polygons and source points were mapped based on photointerpretation of high-spatial resolution (0.50 m) satellite 85 imagery (2000 to 2019) from CNES/Airbus and Maxar Technologies available through Google Earth Pro software, field observations, reviewing news articles and the compilation of data collected by public authorities. Each mapped polygon represents a debris-flow track that includes source, runout and deposition area. In total, 541 source points and 521 debris flow polygons were mapped (Table 1). Manually mapping all debris flows across the upper Maipo basin is a challenging task due https://doi.org/10.5194/nhess-2021-22 Preprint. Discussion started: 3 February 2021 c Author(s) 2021. CC BY 4.0 License.
to its large geographical extent and its high abundance of mass movements. Therefore, a mapping strategy was employed that 90 divided the basin into 58 sub-drainage basins (5,439 km 2 ), 45 (3,936 km 2 ) of which were selected for mapping (Figure 1).

Modelling debris-flow source areas
Potential debris-flow source areas were spatially predicted using a generalized additive model (GAM). In general, GAMs demonstrate good performance for susceptibility modeling compared to other commonly used physically-based and machinelearning techniques (Goetz et al. 2011;Goetz et al. 2015a). To improve model generality and avoid overfitting, the GAM 100 smoothing spline parameters were allowed a maximum 5 effective degrees of freedom (Wenger and Olden 2012;Goetz et al. 2015b). The training and test data were based on the common 1:1 sampling strategy (Heckmann et al. 2014) of presence to absence of source points.
The predictor variables of source areas included hillslope angle, elevation, catchment area, plan curvature and distance to faults. These predictor variables generally have a high importance for modeling debris-flow initiation susceptibility as observed 105 in previous works (Blahut et al. 2010b;Goetz et al. 2015a;Angillieri 2020). The publicly available ALOS PALSAR Radiometrically Terrain Corrected (RTC) high-resolution (12.5 m) digital elevation model (DEM; ASF DAAC) was used to derive terrain attributes. Before deriving the terrain attributes, mesh denoising was applied to the DEM to mitigate the propagation of artifacts such as high-frequency noise (Brock et al. 2020) in the prediction of source areas (Sun et al. 2007;Stevenson et al. 2010). We used the implementation of this algorithm in SAGA-GIS. After denoising, an algorithm to fill sinks 110 was applied to the DEM, and the terrain attributes were processed. Distance to faults was calculated as the Euclidean distance from the fault lines (Servicio Nacional de Geología y Minería de Chile 2003; scale 1:1,000,000).
The performance of the source area prediction model was assessed using repeated k-fold spatial cross-validation (Brenning 2012). Like cross-validation, spatial cross-validation randomly splits the data (e.g., source points and non-source points) into k disjoint subsets, where the model is trained using k-1 sets and tested with remaining set during each cross-validation iteration. 115 For spatial cross-validation, the data is divided into spatially disjoint sub-areas, in our case using the k-means clustering algorithm (Ruß and Brenning 2010). This approach should provide a rigorous estimate of the spatial transferability of a model by attempting to reduce spatial autocorrelation between test and training data (Brenning 2005;Wenger and Olden 2012;Schratz et al. 2019). We estimated model performance by repeating 5-fold spatial cross-validation 1,000 times. Model performance was measured using the AUROC, which is an overall measure of goodness-of-fit that is independent of any particular decision 120 threshold.

Modelling debris flow runout
The Gravitational Process Path (GPP; Wichmann 2017) model was used to regionally model runout. The GPP model is an open-source framework in the SAGA-GIS software that provides users with various model components to simulate runout path, distance, velocity and deposition of material of mass movements (e.g. snow avalanches, rock falls, and debris flows). 125 Due to the extent and remoteness of the study area, we focus on modelling the likely spatial patterns of runout. That is, we are not modelling flow velocity and depth. Runout path was modelled using the random walk process path component of the GPP model (Gamma 2000). It is a common approach for debris-flow runout path modelling at medium scales (Mergili et al. 2012;Heckmann and Schwanghart 2013;Mergili et al. 2015). Random walk models the potential path of runout by iteratively simulating (via Monte Carlo simulation) 130 the downslope movement path of debris flows originating in an individual source-area grid cell. This simulation results in a grid with runout frequencies that indicate how many times a grid cell has been traversed. There are three parameters that need to be calibrated to obtain a desired runout path: (1) a slope threshold (°) defining where divergent flow is allowed; (2) the exponent of divergence that controls the amount of divergence, or lateral spreading; and (3) a persistence factor that controls the direction of movement (Wichmann 2017). 135 Runout distance was constrained using Perla's two-parameter friction model (PCM; Perla et al. 1980) component of the GPP model. The PCM model, which is also a component of the Flow-R model (Horton et al. 2013), has also been used for modelling debris-flow behaviour at medium scales (Mergili et al. 2012;Heckmann and Schwanghart 2013;Mergili et al. 2015). It is a centre-of-mass model where motion is controlled by (1) the sliding friction coefficient and (2) the mass-to-drag ratio. The sliding friction coefficient controls the velocity of movement and the mass-to-drag ratio (m) controls velocity movement over 140 steep terrain (Wichmann 2017).

Optimizing model parameters
For regionally applying the runout model we needed to determine the combination of model parameters that result in the best match to our debris-flow inventory. Determining optimal parameters was based on two criteria: (1) the ability of the model to capture our observed runout paths, and (2) its ability to match the observed runout distances. Therefore, we performed this 145 optimization task using a two-stage approach that first optimizes the random walk model and then the PCM model parameters.
A random sample of 100 debris-flow tracks and corresponding source points was used for optimizing the runout models. This sample of the inventory was chosen to facilitate quality control and reduce the computational complexity of the optimization. Source areas were determined by buffering each source point by 50 m and masking away the buffered area that exceeds the runout trimline; this ensures the source area is contained within a mapped debris flow polygon. A sink-filled version of the 150 original DEM was used for the runout modelling.
For each model component, a grid search in parameter space was used to find parameter sets that achieve optimal model performance across all sampled debris flows. The search ranges were similar to Wichmann's (2017) suggested parameter limits for debris flows (see Table 2 for value ranges). We additionally tested the use of a spatially varying sliding friction coefficient.
The value for this spatially varying sliding friction coefficient  was calculated as a function of catchment area a (km 2 ) for (1) Similar to Mergili et al. (2012) and suggested by Wichmann (2017), the spatially varying sliding friction coefficient was set to a maximum of 0.3 and minimum of 0.045. Runout was computed using 1,000 model iterations.
The AUROC was used as a performance measure for the random walk model. Model performance was rated higher if the random walk model contained observed debris-flow tracks within its simulated paths. After optimizing the random walk model, 160 we fixed these parameters for the PCM model, and optimized the sliding friction coefficient and mass-to-drag ratio parameters for determining runout distance. The performance of the PCM model was measured using the relative error of runout distance.
Relative error was used so that each debris flow was weighted equally regardless of its magnitude.
Runout distance was measured in terms of horizontal length of the debris-flow track. This distance was measured as the length of a bounding box containing the observed debris-flow track (Figure 2; Niculiţǎ 2016;Taylor et al. 2018). Estimated debris-165 flow tracks were defined as grid cells with values greater than a median runout frequency (Figure 2). In this case, the median value represents the most typical simulated debris-flow track. It also provides a conservative estimate of runout distance, which helps mitigate the chance that the optimized model regionally underestimates runout distances. In addition to determining a global optimal parameter set, the best-performing parameters for individual debris-flow events were also explored. We similarly applied a grid search to each event and determined optimal parameter sets based on performances (AUROC and relative error) of each runout model component.

Assessing spatial transferability 175
We assessed the transferability of optimized runout model (random walk and PCM) parameters by performing 5-fold spatial cross-validation with 1,000 repetitions (Figure 3). This approach allows us to explore the sensitivity of grid-search optimized parameter combinations to spatial variation in training and test data. To do this, we observed the frequency of variations in https://doi.org/10.5194/nhess-2021-22 Preprint. Discussion started: 3 February 2021 c Author(s) 2021. CC BY 4.0 License.
optimized parameter combinations within all cross-validated iterations. Optimal parameter combinations that occurred more frequently were considered to have a higher degree of transferability; thus, being considered more reliable for application to 180 the entire study area. We also assessed if there were any spatial patterns in the optimized performance for each model component. That is, were 185 there any spatial trends in model performance that may indicate our model is locally overfitting? We explored for such spatial trends by mapping the distribution of individual debris-flow runout model performances based on the optimized parameters.
Additionally, we were concerned if the optimized parameters had a stronger fit to debris flows of a certain magnitude or initiating conditions. Therefore, the potential to overfit to certain debris flow characteristics was assessed by determining Spearman's rank correlation of individual debris-flow performance (for each model component) with observed runout distance 190 and the elevation, catchment area and hillslope angle of the corresponding source points.

Testing sample-size dependence of performance
We explored how runout model parameter selection, performance and robustness were affected by the number of debris flows used for optimization. This analysis was done by applying repeated spatial cross-validation to data sets of varying training sample sizes. To ensure a fair comparison, the size of the test data for each cross-validation iteration was set to 20 debris flows, 195 which is the maximum test sample size when performing 5-fold cross-validation with a sample of 100 debris flows. We tested training samples sizes from 10 to 80. Model performance for each runout modelling components was summarized using the median and IQR (AUROC and relative error). The optimal parameter set for a given sample size were determined as the parameter combinations that were most frequent.

Finding a suitable source area prediction threshold 200
For regionally applying the runout model for susceptibility mapping or exposure analysis, a dichotomous classification of the predicted source areas is required to define the grid cells where simulated debris flows initiate. In the case of basing the source areas on a susceptibility model, a suitable threshold of the prediction values needs to be selected. In this study, we determine a suitable prediction threshold to classify source areas by searching for the threshold that results in the best-performing runout model for the entire area. Using the optimized model parameters, we tested runout models based on source areas that were 205 delineated using prediction thresholds from 0.5 to 0.95 with a step of 0.05. The performance of each these models was measured using the AUROC. The AUROC was calculated using a sample of 1,000 debris-flow runout locations and 1,000 non-debris flow locations. The source-area prediction threshold resulting in the highest AUROC values was selected for regionally computing a debris-flow runout map for our study area.

3.1 Source area model performance
The overall performance of the source area prediction based on the GAM was good with a spatially cross-validated median AUROC of 0.80 and an interquartile range (IQR) of 0.001. We found the source-area prediction map was also geomorphologically plausible. Locations most likely to be source areas were within steep terrain associated with channels, gullies and scree slopes. Shallow-flat terrain and areas along ridgelines were modelled as least likely to be source areas ( Figure  225   4). This geomorphological knowledge was also expressed in the plots of the GAM spline transformations (Figure 5).
Relatively steep terrain, slightly concave profile curvature, and areas near faults were modelled as more likely being source areas.

Runout model parameter optimization
The parameter optimization produced runout models with a good spatially cross-validated performance. The optimal parameters for the runout-path model were a slope threshold of 40, an exponent of divergence of 3.0, and a persistence factor of 1.9 with a median AUROC of 0.94 (IQR = 0.02; Table 2). Using these values as plug-in estimates for the PCM runoutdistance model component, the optimal sliding friction coefficient and mass-to-drag ratio were 0.11 and 40 m, respectively. 240 The median spatially cross-validated relative length error of the runout-distance model was 0.11, or 11% (IQR = 0.09). We also found that the model based on a global sliding friction parameter estimate performed better than the runout-distance model using a spatially varying sliding friction coefficient (optimal mass-to-drag-ratio = 95 m;  By visualizing the runout-distance optimization results across grid search space, we can observe model performance and sensitivity to different parameter combinations. In this case, we observed only slight model performance differences for sliding friction coefficients just under 0.2 to 0.04 and mass-to-drag ratios from 20 to 150 m (Figure 6a). In general, the values in this band across grid search space would result in good performances with median relative errors ≤ 0.15. However, in terms of controlling the spread of error (Figure 6b), sliding coefficients from about 0.05 to 0.15 and mass-to-drag ratios from 20 to 95 255 m had the lowest IQRs (≤ 0.2). This result illustrates a generally low sensitivity in runout distance model for a wide range of parameter combinations.
Exploring the optimal combination of parameter values using spatial cross-validation provides insights into performance reliability of the optimized model. Given different spatial combinations of testing and training data, we found that the optimal combinations of parameters were associated with high performance values (AUROC = 0.94, relative error = 0.11) and high 260 relative frequencies (61-69%) of occurring in each cross-validation iteration (Figure 7). Additionally, the spatially crossvalidated optimal parameter sets were generally clustered in grid search space, showing there was little variation in optimized parameters depending on the spatial partitions (Figure 7).   : Model performance and frequency of optimal parameters for the runout path (a; given an optimized slope threshold = 40) and runout distance models (b) estimated using 1,000-repeated 5-fold spatial-cross-validation. Relative frequency is the percentage of all repeated spatial-cross validation iterations where a given parameter combination was optimal.

Thresholding source areas for runout analysis 275
The best threshold for delineating source areas from the GAM prediction for runout modelling was 0.7, which results in runout affecting 22% of the study area. This threshold had the peak AUROC value of 0.83. The performance of the runout model drastically decreased with thresholds > 0.8 and gradually decreased towards a thresholds of 0.5: lower thresholds spatially predicted more runout area (Figure 8). The resulting runout prediction map (Figure 9) appears to be geomorphologically plausible. The locations of the runout deposit areas are similar to what we have observed in the satellite imagery. Additionally, 280 lateral spreading is generally low in narrow gullies and much broader on talus slopes, which is what we would expect for debris flows occurring in our study area.

Exploring patterns in model performance
We observed no clear spatial pattern of individual debris-flow performances of the runout model components (Figure 10), 290 which is evidence that there was no local overfitting to a particular region of the study area. The distribution of individua l AUROC performances of the runout path model were generally high for most of the region, showing that optimization of the path model performs well across the study area. One debris flow was not captured within the runout path model (AUROC < 0.5; Figure 10; Figure 11). Having a closer look at this occurrence, we found that the modelled runout paths failed to follow the flow direction of the observed debris flow and source point. 295 The runout distance model had generally more spatial heterogeneity in performance (Figure 10), but again no clear spatial patterns in the distribution of relative errors were observed. We investigated the debris-flows that did not perform well for https://doi.org/10.5194/nhess-2021-22 Preprint. Discussion started: 3 February 2021 c Author(s) 2021. CC BY 4.0 License. modelling runout distance (relative error > 0.8) by looking at the satellite imagery used for mapping. It was found that these cases were related to misclassifying stream erosion in relatively shallow and long-hillslope channels as debris flows. Overall, the regionally optimized runout modelled distances fit well with our mapped observations (Figure 11). It tended to slightly overestimate debris-flow travel lengths with a median runout distance error of 16.6 m (Figure 11). This error is just slightly over a single grid cell size of the DEM. The highest runout distance errors were due to misclassified debris-flows, as 305 previously mentioned. The optimization of the runout model avoided overfitting to higher or smaller runout distance and elevations of source points. That is, we observed no clear relationship between runout model optimization performance to the observed runout length and the elevation, catchment area or hillslope angle of the training source points (Figure 11) correlations of estimated and observed runout distance were -0.36 and -0.17 for runout path (AUROC) and runout distance (relative error model) performance, respectively. The runout model error correlations with elevation, catchment area, and 310 hillslope angle of the source points were -0.21, 0.11 and 0.29 respectively.

Optimization for individual debris-flows
The optimal parameters for individual debris-flows were also computed for general comparison to the regional model. We found that the optimized runout model parameters were highly variable for individual debris flows (Figure 12). The relative errors were low (median = 0.002, IQR = 0.007), except for a few debris flows that failed to optimize (Figure 12). There was no clear spatial pattern in optimal sliding friction coefficient and mass-to-drag ratio parameter combinations across the study 320 area, which shows that runout characteristics were quite diverse for individual debris-flows across this large study area (Figure https://doi.org/10.5194/nhess-2021-22 Preprint. Discussion started: 3 February 2021 c Author(s) 2021. CC BY 4.0 License.

13).
We also observed no strong relationships between terrain attributes and optimal parameters. Spearman's correlations of sliding friction coefficient and mass-to-drag ratio with elevation, slope and catchment area were ≤|0.29|.

Runout model robustness to sample size
Runout model performance and variation tends to improve slightly when using larger sample sizes (Figure 14). Large sample sizes also resulted in more consistently selected model parameters across spatial cross-validation iterations (Figure 15). The larger spread across grid search space of the relative frequency of optimal parameter combinations for smaller sample sizes illustrates that we may be less likely to find the best model parameters that generalize well for a large region (Figure 15). In 335 general, as we increased sample size, we reduced sampling variability and narrowed the number of optimal combinations of parameters in grid search space, meaning we became more confident that the selected parameters would transfer better to estimate runout distance in adjacent areas.

.1 Model performance and transferability
Assessing the spatial transferability of runout models is essential when extending their use from single or local events to regionally modelling runout across unknown space. Overall, this study demonstrates that our novel optimization approach performed well at regionally modelling the spatial distribution of runout path and distances across the upper Maipo river valley basin. A key component of the success of our modelling approach was its ability to generalize. The transferability of a regional 350 runout model can be affected by the generalization ability of both the source area prediction and the optimized process-based runout model.

Source area modelling
In the development of our source area prediction model, we aimed to produce a simplified empirical model that would result in good performance and transferability. Wenger and Olden (2012) recommended to control the flexibility of the GAM 355 smoothing parameters to avoid overfitting that would lead to poorer model transferability. Therefore, similarly to previous landslide susceptibility studies using the GAM (Goetz et al. 2011;Goetz et al. 2015b;Bordoni et al. 2020) we limited the degrees of freedom for smoothing-spline fitting.
A detailed model based on a large set of predictors can impede its ability to transfer to other locations (Tuanmu et al. 2011).
As we and others have demonstrated, good predictive performance of susceptibility models of debris flows can be achieved with a relatively small set of predictors, which are primarily DEM-derived terrain attributes (Blahut et al. 2010b;Heckmann et al. 2014;Goetz et al. 2015a). Additionally, by fitting our models with multi-temporal data, we may be more likely to achieve better transferability in time (Tuanmu et al. 2011;Knevels et al. 2020). Event-specific inventories may not be large enough for regional optimization and risk the potential of overfitting source conditions spatially varying conditions of that event (e.g., precipitation and snowmelt patterns). A smaller sample may also lead to not capturing the range of terrain conditions across 365 the study area required for robust empirical modelling of source area locations (Petschko et al. 2014;Rudy et al. 2016).

Runout analysis
By optimizing the runout distance model using the median relative error as a metric, we managed to reduce the impact of possible outliers in our training and test data. We additionally reduced our chances of overfitting the regional model to larger debris flow events, which was crucial for a model to maintain a generalization that makes it transferable across large areas. 370 Plotting the individual optimized models and exploring correlations between terrain attributes and optimal parameters allowed us to see if there were any broad trends in the parameter selection. In our study, we observed high variability in optimal parameters the PCM model. While applying a trial and error approach, Mergili et al. (2012) also observed different optimal parameter combinations when modelling runout for a couple of debris flows just north of our study area. It seems apparent that determining runout parameter values for regional modelling without an optimization procedure would be difficult. It is 375 also no surprise that the errors of the individually optimized debris-flows were low (Figure 12). Whether meaningful or not, the optimization approach should be able to match the runout distance with very low error. Poor individually optimized events could be attributed to locally poor DEM quality and/or complex or ambiguous events. It was interesting that despite the variety in optimal parameters, we were still able to produce a good spatially distributed model of runout patterns using a global set of parameters. This may indicate that the combination of random walk and the process based PCM model dictates a general 380 runout pattern that is insensitive to values within a broad and nearly optimal range of physically reasonable parameters. For regional applications, such models that are already well adapted to generalize runout patterns are likely to transfer well to unknown locations, as we observed in this study by assessing runout model transferability.

Regionally optimizing runout models
As with any optimization problem, using a suitable cost function is critical to ensure the model parameters are optimized to 385 solve a very a specific problem. It can be difficult to define a single metric that simultaneously measures both performance of path and distance simulations. As shown in this study, it may also not be necessary. By using the two-stage approach to optimization, where we first optimize the runout path model and then plugin those values to optimize the runout-distance model component, we can considerably reduce computational complexity, while ensuring that our final model result explicitly optimizes for runout distancea key characteristic for spatially predicting areas potentially impacted by debris flow runout. 390 In our case, we only needed to solve two separate problems with 3 (random walk) and 2 (PCM) unknown parameters, as opposed to solving simultaneously for 5 unknowns. We used an exhaustive grid search to optimize the runout model components because it allowed us to visualize and assess model performance across parameter space. However, a drawback of this optimization method is that it can be computationally slow to explore all candidate parameter combinations. If speed is a requirement for regional runout analysis, then a faster method like random search (Bergstra and Bengio 2012) may be 395 preferred (Schratz et al. 2019).
In terms of model improvement, analysis of the spatial distribution of optimal parameters may lead to better parameter optimization based on terrain or geological characteristics. The spatially varying values used in this study were based on modelling for alpine regions (Gamma 2000), and do not account for the potential variability in sliding conditions between different catchments (Guthrie 2002). This challenge may be overcome by using regional modelling strategies already applied 400 for landslide initiation susceptibility modelling, where the study area is divided into geologically similar regions, runout model optimization is performed for each region and then combined (e.g., Petschko et al. 2014).

Conclusions
Modelling the spatial pattern of debris flow runout in large, mainly remote areas, with sparse data, requires model calibration and validation methods that ensure spatial transferability. In this study, we demonstrated that our integrated two-stage 405 optimization approach is one such method suitable for regionally simulating debris-flow runout patterns. Through this analysis, we observed a general insensitivity in runout distance performance of the PCM model to a range of parameters, which is an indication of the model's inherent capability to characterize general runout behaviour across our study area. Future improvements to our approach may include building a model that allows for more spatial variation in optimized parameters, especially in regions with better availability of soil physical information. Overall, our open-source modelling approach, which 410 enhanced the GPP model by adding support for automated model calibration and transferability assessment, provides an accessible and extendible modelling framework for such advances in regional runout modelling.

Code availability
To go with this paper, we developed the runoptGPP R package for optimizing mass movement runout models using the random walk and PCM model components of the GPP tool in SAGA-GIS. It is available for download from the GitHub repository: 415 https://github.com/jngtz/runoptGPP (DOI: 10.5281/zenodo.4428050). This package contains tutorials on how to apply the optimization methods regionally or for single runout events. The GitHub repository also contains the R code used to conduct our analysis.