Note: This file contains supplementary information that highlights the overall data analysis in R code. Although the accompanying text here is mostly the same as in the accompanying paper for illustrative purposes, you should refer to the final, peer-reviewed paper for citations. You can click on the Code button in the upper right of this file to download its *.Rmd version.

Landslide inventories have become cornerstones for estimating the relationship between the frequency and size of slope failures, thus informing appraisals of hillslope stability, erosion, and commensurate hazard. Numerous studies have reported how larger landslides are systematically rarer than smaller ones, drawing on probability distributions fitted to mapped landslide areas or volumes. In these models, much uncertainty concerns the larger landslides (defined here as affecting >0.1 km2) that are rarely sampled, and often projected by extrapolating beyond the observed size range in a given study area. Relying instead on size-scaling estimates from other inventories is problematic because landslide detection and mapping, data quality, resolution, sample size, model choice, and fitting method can vary. To overcome these constraints, we employ a Bayesian multi-level model with a Generalised Pareto likelihood that provides a single, objective, and consistent comparison grounded in extreme-value theory. We explore whether and how scaling parameters vary between 37 inventories that, although incomplete, bring together 8627 large landslides. Despite the broad range of mapping protocols and lengths of record, and differing topographic, geological, and climatic settings, the posterior power-law scaling exponents remain indistinguishable between most inventories. Likewise, the scaling statistics fail to separate known earthquake from rainfall triggers, and event-based from multi-temporal catalogues. Instead, our model identifies several inventories with outlier scaling statistics that reflect intentional censoring during mapping. Our results thus caution against a universal or solely mechanistic interpretation of the scaling parameters, at least in the context of large landslides.

1 Introduction

Keeping records of the size and frequency of landslides is key to estimate rates of erosion, geomorphic work, and hillslope evolution (Dente et al. 2023; Saito et al. 2014; Marc et al. 2019); infer material strength and weathering of hillslopes (G. K. Li and Moon 2021; Alberti et al. 2022); and inform hazard appraisals of slope instability (Guzzetti et al. 2012), particularly in response to contemporary climate change (Smith et al. 2023). A popular way to characterise the relative frequency of landslide-affected areas or volumes is to fit probability distributions to size data compiled in inventories (Malamud et al. 2004). These catalogues contain locations and geometries of individual footprint areas mapped largely from air photos or satellite images. The choice of probability distribution (or “scaling laws”) has favoured the inverse power-law or Pareto, the inverse gamma, or the lognormal distributions (Tebbens 2020), or combinations thereof (Jain, Khosa, and Gosain 2022). All these distributions are skewed, often heavy-tailed, and capture the widespread observation that larger landslides are systematically rarer than smaller ones.

Reported values of the parameters that define these distributions have seemingly narrow numerical ranges (Tebbens 2020). This similarity among model fits has led to a lively discussion about whether these parameters reflect generic geometric or mechanistic properties of landslides or the hillslopes that they occur on (Bellugi et al. 2021; Bernard, Lague, and Steer 2021). For example, physical interpretations of the “roll-over” that marks the lower bound of inverse power-law distributions include that of a hillslope length scale that is susceptible to failure, or the cohesive strength of failure planes (Tebbens 2020). While landslide size distributions may reflect the nature and spatial intensity of a common trigger such as a strong earthquake (Valagussa et al. 2019), the average landslide size may contain information about both cohesive strength of slope material and hillslope relief (Medwedeff et al. 2020). Some of these physical interpretations are backed by, or derived from, numerical simulations of slope instability (Frattini and Crosta 2013). Yet, a different line of arguments proposed that the roll-over is a statistical artefact of landslide detection and mapping, approximately marking the smallest discernible landslide in a given study area (Tebbens 2020). Either way, this discussion has questioned whether these scaling laws are universally applicable to landslides irrespective of environmental setting, mapping methods, and trigger mechanisms (Malamud et al. 2004; Tanyaş et al. 2019). Many of these interpretations have relied on the direct comparison of reported parameter values, and scrutiny concerning possible effects of data sources and quality, mapping method, and statistical errors of the fitted models has increased in more recent work (Bellugi et al. 2021).

Still, most uncertainty remains about the large landslides that are rarely sampled naturally. Hence, the bulk of studies on landslide size has disclosed little about these large landslides, let alone their prediction as first-time failures (Fan et al. 2019). One reason for this knowledge gap is that large landslides are often elusive in catalogues compiled shortly after a landslide-triggering earthquake or rainstorm (Hao et al. 2020; Abancó et al. 2021; Santangelo et al. 2023). Sample sizes often involve only a handful to several dozen large landslides, and thus often remain too small for robust statistics in a given study area. Hence, inference is mostly based on the simple extrapolation of model fits beyond the observed size range. Yet, large landslides may often re-shape hillslope geometry and dominate erosion (Korup et al. 2007; Marc et al. 2019), but may involve phases of creep motion, and respond differently to triggering conditions than smaller failures because of a longer and more complex slope history of accumulated stress and strain (Lacroix, Handwerger, and Bièvre 2020).

In general, the statistics of landslides are derived from a sequence of detection, mapping, and statistical inference (Fig. 1). Uncertainties that propagate throughout each step can affect the outcome in terms of landslide scaling statistics.

Landslide scaling statistics rely on accurate detection, mapping, and statistical inference. These three (colour-coded) steps are prone to a number of uncertainties that may propagate as errors into the derived landslide statistics. Boxes with black outline are most likely directly tied to physical processes of landsliding.
Landslide scaling statistics rely on accurate detection, mapping, and statistical inference. These three (colour-coded) steps are prone to a number of uncertainties that may propagate as errors into the derived landslide statistics. Boxes with black outline are most likely directly tied to physical processes of landsliding.


At the level of the input data, both landslide detection and mapping face several constraints. The mapping objective can dictate whether to focus on landslides attributed to a single trigger such as a strong earthquake (Meunier, Uchida, and Hovius 2013; Gorum et al. 2014; Tanyaş et al. 2017; Lombardo et al. 2019), a rainstorm (Abancó et al. 2021; Hao et al. 2020; Emberson et al. 2022; Santangelo et al. 2023), or instead to compile landslide traces that have accumulated over years to millennia (LaHusen et al. 2016; Luetzenburg et al. 2022; Fusco et al. 2023; Pánek et al. 2016). Some of the most comprehensive catalogues today feature thousands to hundreds of thousands of slope failures across entire nations or beyond (Luetzenburg et al. 2022; Fusco et al. 2023). The methods to detect, map, and compile landslides have become more diverse and elaborate beyond traditional mapping from air photos, optical satellite data or historical records (Y. Xu et al. 2020; Casagli et al. 2023). Newer catalogues derive from laser scanning (Bernard, Lague, and Steer 2021), radar imagery (Song et al. 2022), object-based image analysis (Milledge et al. 2022), deep neural networks (Schönfeldt et al. 2022; Bhuyan et al. 2023), text mining (Franceschini et al. 2022), and seismology (Hibert et al. 2019). Most methods require specific mapping protocols adjusted to the effective resolution of imagery that may be compromised by vegetation, land cover, cloud, and shadow (Brardinoni, Slaymaker, and Hassan 2003; Burrows, Marc, and Remy 2022). Varying image quality, resolution, and coverage also affect landslide size estimates, as does the experience of mapping operators (Van Den Eeckhaut et al. 2005). The mapping outcome may depend on whether a single person or a team is at work, and the attention to detail in delineating source, deposit, or total affected areas. Overlapping landslide source areas or bodies can obscure the dimensions of slope failure (Marc and Hovius 2015). Debris- flows and snow-avalanche tracks, moraines, and wind-throw gaps in forests can be mistaken for landslide evidence. Experience and training aid detection and mapping, but also introduce bias, for example favouring fresh landslides, certain types of failure, or a size range that is easiest to recognise. Hence, the size range of a given inventory is bounded by the smallest mappable, and the largest recognisable, landslide (Barlow et al. 2012). Thus, without any standard mapping protocol in place, landslide researchers have to deal with catalogues of varying coverage, detail, and quality for the same task of obtaining traits of landslide size.

At the model level, estimates of size scaling hinge on the choice of probability distribution to characterise the mapped landslides (Brink et al. 2009). These estimates also depend on sample size, data pre-processing, fitting method, residuals, and cross-validation. Numerical experiments show that small sample sizes yield volatile estimates of scaling parameters for inverse power-law distributions in particular (Korup, Görüm, and Hayakawa 2012). Most estimation methods involve either regression of log-binned - and thus smoothed - landslide frequencies versus size (Gilham, Barlow, and Moore 2018), or maximum likelihood estimates (Clauset, Shalizi, and Newman 2009); various biases apply to both methods. However, reports of confidence intervals or goodness of fit, and hence ways to assess overfitting remain rare. Still, the basis for statistical inference of landslide size distributions varies at the level of the individual inventory, each of which embodies the methods of detection and mapping used, and the biases they may induce. In the light of these constraints, a direct comparison of landslide scaling estimates between different inventories may be misleading.

We propose a compact solution to compare more fairly the landslide-size distributions from diverse inventories by estimating scaling parameters with a single, probabilistically consistent model. We apply this model to large landslides that affect a total area of >0.1 km2, and address the problem of small sample size by using Bayesian inference in a multi-level model that uses data from multiple inventories to estimate the variance of scaling parameters within and across these catalogues (Luna and Korup 2022). The multi-level approach acknowledges structure in landslide size data in a consistent way. One natural grouping of data is by inventory, and reflects the diversity in data input quality reviewed above. Our focus on large landslides makes the Generalised Pareto Distribution (GPD) a natural model choice, because extreme-value theory predicts that data above a high threshold are approximately GP distributed (Castro-Camilo, Huser, and Rue 2022). Another advantage of this distribution is that its parameters can be translated directly into those used most widely in studies of landslide size scaling.

2 Data and Methods

We consider data on total landslide-affected areas from several dozen published landslide inventories with open access. We excluded many other detailed catalogues that had no records of landslides meeting our size threshold of 0.1 km2. Besides information about their size, many databases have landslide types and triggers reported, and many data were recorded following recent (i.e. post-1900) major earthquakes and rainstorms with the intention to characterise the impact of these events. We also included catalogues spanning time intervals of several years to millennia, featuring mostly undated large landslides with unknown triggers to test whether these cumulative inventories have size distributions that differ from those of event-based inventories.

# Read data file on landslide inventories
zz <- read_csv("lsinventory_scaling_master.csv")

# Order by target variable
zz <- zz %>% arrange(xx)

# Create any subsets if needed
zz <- zz %>% filter(!is.na(Inventory))

# Add synthetic noise to break ties in ranks for exceedance probabilities
zz$xx <- zz$xx + runif(length(zz$xx))

The choice of probability distribution to model landslide area often rests on implicit assumptions. For example, the inverse power law draws on considerations of physical sand-pile models and the concept of self-organised criticality (Hergarten and Neugebauer 1998), whereas the lognormal distribution arises naturally from multiplicative effects of random variables (Brink et al. 2009). Here we model the reported sizes of large landslides with the Generalized Pareto Distribution (GPD). The GPD is rooted in extreme-value theory and approximates the distribution of a continuous random variable \(x\) above a high threshold (or location parameter) \(\mu\). The GPD thus captures what we would expect theoretically from a sample consisting of observations filtered above a minimum value (Katz, Parlange, and Naveau 2002). Any physical interpretation of the GPD parameters may need to account, or correct, for this statistical expectation first. The probability density function of the GPD is:

\[ \mathrm{GPD}(x|\mu,\sigma,k) = \frac{1}{\sigma} \left( 1 + \frac{k \left( x - \mu \right)}{\sigma} \right) ^ {-1/k - 1}, \]

where \(x >= \mu\), \(\sigma > 0\) is a scale parameter, and \(k \geq 0\) is a shape parameter. The scale parameter \(\sigma\) is somewhat comparable to the “roll-over” in studies using an inverse power-law model for estimating landslide size scaling. This roll-over marks the lowest landslide size above which power-law scaling is assumed. The GPD shape parameter is the inverse of the “scaling exponent” of the power-law tail \(\alpha\), such that \(k = 1 / \alpha\).

Here, the location parameter \(\mu\) sets the minimum landslide size for data to be admitted to the GPD. This is known as a peak-over-threshold approach in extreme-value statistics (Katz, Parlange, and Naveau 2002). For large landslides, we let \(\mu\) = 0.1 km2. Empirical relationships between landslide volume and total affected area across a wide range of environmental settings show that an area of 0.1 km2 corresponds to an average volume of roughly 106 m3 (Larsen, Montgomery, and Korup 2010), which is the suggested lower threshold for large landslides (McColl and Cook 2024). This particular choice of \(\mu\) is a compromise, because fewer samples and landslide inventories are available for higher values of \(\mu\), whereas the GPD becomes a less and less valid approximation to the data for lower values of \(\mu\).

# Consider only POT data, i.e. those xx >= min_size
zz$xx <- zz$xx / 1e6 # convert to [km ^ 2]

# Record minimum landslide area in full inventory
zz <- zz %>% 
  group_by(Inventory) %>% 
  mutate(minArea = min(xx))

# Set minimum size mu (location parameter)
min_size <- 0.1 # [km ^ 2]

# Record number of landslides below mu
zz <- zz %>% 
  group_by(Inventory) %>% 
  mutate(n_below_mu = sum(xx < min_size))

# Record total landslide area
tot_A <- sum(zz$xx)

# Filter for POT data
zz <- zz %>% filter(xx >= min_size)

Fitting the GPD to data can involve maximum likelihood estimates or, in case of few samples (\(n\) < 30), probability-weighted (L-)moments to avoid volatile parameter estimates (Katz, Parlange, and Naveau 2002). We use Bayesian inference to learn the GPD parameters from the data, acknowledging explicitly that these are derived from different inventories that reflect different environmental conditions across study areas, and landslides likely detected at varying resolution and mapped with different techniques. A Bayesian treatment of this fitting problem seeks a compromise between a likelihood function and a probability distribution of prior knowledge about the model parameters. This approach also obviates the need for binned landslide-size data to use frequency density (e.g. Malamud et al. 2004). Instead, we work with the joint probability that is the numerator of Bayes’ Rule:

\[ p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta) p(\theta)}{p(\mathcal{D})}, \]

where \(\theta\) is the vector of model parameters that we wish to update from both the landslide data \(\mathcal{D}\) and prior knowledge. We use the GPD as the likelihood function \(p(\mathcal{D}|\theta)\) and choose (hyper-)prior distributions \(p(\theta)\) to approximate what we know about landslide size distributions so far and irrespective of the data \(\mathcal{D}\) studied here.

Our model uses a multi-level setup, in which \(i \in \{1,\dots,n\}\) indexes each landslide observation \(x_i\) from a sample of size \(n\), and \(j \in \{1,\dots,J\}\) indexes each of \(J\) different landslide inventories. The idea of the multi-level model is that the size distribution in each landslide inventory \(j\) is characterised by an individual set of GPD parameters \(\sigma_j\) and \(k_j\). We further assume that the values of each of these inventory-specific parameter pairs are drawn from the same two probability distributions:

\[ x_i \sim \mathrm{GPD}(\mu, \sigma_{j[i]}, k_{j[i]}) \]

\[ \sigma_j \sim \mathrm{Gamma}(\alpha_\sigma, \beta_{\sigma}) \] \[ k_j \sim \mathrm{Gamma}(\alpha_k, \beta_{k}) \]

Here we choose independent Gamma distributions for both \(\sigma_j\) and \(k_j\) to ensure that the parameters are positive and uncorrelated. The multi-level model thus learns the parameters for each catalogue informed by both its data, the overarching Gamma distributions, and prior knowledge. While the model allows \(\sigma_j\) and \(k_j\) to vary between landslide inventories, it also draws on information from the full data set via this multi-level structure.

Bayesian reasoning requires that we specify our prior knowledge explicitly. We do this by choosing the hyper-parameters of the two Gamma distributions of \(\sigma_j\) and \(k_j\). These hyper-parameters describe the distribution of landslide scaling parameters across all inventories and offer a global summary from all data. We draw on the growing literature of landslide scaling. Recent reviews have summarised that the power-law scaling exponent \(\alpha\) for landslide inventories is most often reported in the range of 1 < \(\alpha\) < 3 (Tebbens 2020). Recalling that the GPD shape parameter \(k = 1 / \alpha\), we can use this information to constrain our (hyper-)prior distributions accordingly. Hence we choose these hyper-parameter values such that they contain findings from landslide scaling studies based on data other than the ones used here. The exact shape of these distributions may matter little in the light of the large sample size that informs our likelihood function. We disregard any correlation between the hyper-parameters and simplistically assume independent distributions:

\[ \alpha_\sigma \sim \mathcal{N}(1, 0.25) \] \[ \beta_{\sigma} \sim \mathcal{N}(5, 5) \] \[ \alpha_k \sim \mathcal{N}(6, 1) \] \[ \beta_{k} \sim \mathcal{N}(9, 1) \]

where \(\alpha_\sigma\) and \(\alpha_k\) are the corresponding shape parameters, and \(\beta_{\sigma}\) and \(\beta_{k}\) are the inverse scale (or rate) parameters. We assume independent Gaussian prior distributions for these hyperparameters and choose the prior means and standard deviations informed by previous research on landslide scaling properties (Tebbens 2020).

# Set minimum samples per inventory
min_grp_size <- 25

To avoid having too many inventories with only a handful of large landslides, we consider only those data collections with at least 25 landslides that exceed the threshold size \(\mu\).

# Filter for minimum samples per group
min_grp <- zz %>% 
  group_by(Inventory) %>% 
  summarise(count = n()) %>% 
  filter(count <= min_grp_size)
zz <- zz %>% filter(!(Inventory %in% min_grp$Inventory))
# Compute empirical exceedance probabilities per group
zz <- zz %>% 
  group_by(Inventory) %>% 
  arrange(xx, .by_group = TRUE) %>% 
  mutate(p = seq(n(), 1, -1) / n())

The data that we need for a numerical approximation of the posterior distribution of the parameters of the GPD consist of the individual landslide areas and labels of the inventories they belong to.

# Prepare data input for STAN (custom)
stan_d <- list(
  ymin = min_size,
  N = length(zz$xx),
  y = zz$xx,
  L = length(unique(zz$Inventory)),
  ll = as.numeric(factor(zz$Inventory)),
  ymax = aggregate(zz$xx, by = list(zz$Inventory), max)$x,
  #yt = 10 ^ seq(2.9, 5, 0.01),   # testing data
  #Nt = 211
  yt = zz$xx,   # testing data
  Nt = length(zz$xx),
  lltt = as.numeric(factor(zz$Inventory))
)
# Data overview
(mydat <- zz %>%
  group_by(Inventory) %>%
  summarise(Samples = n(),
            `Max. area (km^2)` = max(xx) %>% round(1),
            `Min. area (km^2)` = min(minArea) %>% round(4),
            `n < threshold` = max(n_below_mu),
            Trigger = first(Trigger),
            Reference = first(group)))

The size threshold \(\mu\) means that we had to discard many published landslide inventories that only contain smaller slope failures. Our data thus consists of 8627 large landslides filtered from 37 different inventories. Together, these large slope failures affected an area of 6407 km2, or 59% of the total landslide-affected area recorded in these catalogues. The largest landslide is unnamed and extends over 201 km2 in the Caspian Sea basin (Pánek et al. 2016). Our data thus span more than three orders of magnitude in landslide area; the largest mapped landslide areas per inventory differ by up two orders of magnitude.

We note that 19 (or 51%) of our selected landslide inventories were compiled following an earthquake, including several versions that were mapped by different research teams. Only 3 catalogues (8%) are attributed to a rainfall trigger, while 15 catalogues (41%) contain information about landslides that accumulated over many years and thus likely reflect various triggers.

We use the probabilistic programming language STAN (Carpenter et al. 2017) to code our model and call it via the statistical programming environment R. We ran four independent Hamiltonian Monte Carlo chains to explore the model parameter space with the No U-Turn (NUTS) sampler coded in STAN and verified that the numerical solutions converged. Unless stated otherwise, we use medians and 95% highest density intervals (HDIs) to summarise all posterior distributions. A 95% HDI means that there is a 95% probability for a given parameter to be in the specified interval.

# We extract the posterior into a more amenable format:
posterior <- rstan::extract(mfit)
rm(mfit)

# Obtain posterior predictions
mypred <- data.frame(A = stan_d$yt,
                          lo = apply(posterior$predccdf, 2,
                                     HDIofMCMC)[1, ],
                          md = apply(posterior$predccdf, 2,
                                     median),
                          hi = apply(posterior$predccdf, 2,
                                     HDIofMCMC)[2, ])

# Combine with data
zz <- cbind(zz, mypred)

# Collect hyperparameters
post_hyper <- as.data.frame(cbind(posterior$alpha_k, 
                                  posterior$beta_k,
                                  posterior$a_sigma,
                                  posterior$b_sigma,
                                  posterior$lp__))
names(post_hyper) <- c("alpha_k", "beta_k", "a_sigma", "b_sigma", "logposterior")

3 Results

3.1 Model fits and residuals

We express the size distributions of large landslides in cumulative form using the exceedance probability \(p\) for a given landslide area.

ggplot(zz, aes(x = xx,
               y = md,
               col = factor(Inventory))) +
  geom_line() +
  geom_ribbon(aes(ymin = hi, ymax = lo, fill = factor(Inventory)), 
              alpha = 0.35, linewidth = 0) +
  scale_color_viridis_d(begin = 1, end = 0) +
  geom_point(data = zz, aes(x = xx, y = p), alpha = 2/3) +
  labs(x = expression(paste("Landslide area (", km^2, ")")), 
       y = "Empirical exceedance probability") +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10 ^ x, n = 3),
                labels = trans_format("log10", math_format(10 ^ .x))) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10 ^ x, n = 3),
                labels = trans_format("log10", math_format(10 ^ .x))) +
  theme_bw() +
  theme(aspect.ratio = 0.6, 
        legend.position = "none") +
  facet_wrap( ~ Inventory, ncol = 5)

To express how well the GPD model fits the data, we compute the residuals in terms of the log-odds ratios between the empirical exceedance probabilities (\(p\)) and the predicted averages (\(\hat p\)) for each inventory. The log-odds ratio is \(\log \frac{\hat p(1 - p)}{p (1-\hat p)}\), conditioned on each observed landslide. A positive (negative) log-odds ratio means that the model overestimates (underestimates) the empirical exceedance probability of a given landslide area.

# Plot residuals in exceedance probability
ggplot(zz %>% filter(p < 1), 
       aes(x = xx,
           y = -log((p / (1 - p)) / (md / (1 - md))),
           col = factor(Inventory))) +
  geom_ribbon(aes(ymin = -log((p / (1 - p)) / (hi / (1 - hi))), 
                  ymax = -log((p / (1 - p)) / (lo / (1 - lo))), 
                  fill = factor(Inventory)), 
              alpha = 0.35, lwd = 0) +
  scale_color_viridis_d() +
  geom_abline(slope = 0, intercept = 0, lty = 2) +
  geom_point(alpha = 2/3) +
  labs(x = expression(paste("Landslide area (", km^2, ")")), 
       y = "Log-odds ratio") +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10 ^ x, n = 3),
                labels = trans_format("log10", math_format(10 ^ .x))) +
  ylim(-2, 2) +
  theme_bw() +
  theme(aspect.ratio = 0.6, legend.position = "none") +
  facet_wrap( ~ Inventory, ncol = 5)


# Save posterior log-odds ratios
post_logodds <- zz %>% 
  group_by(Inventory) %>% 
  summarise(lo_logodds = HDIofMCMC(log((p / (1 - p)) / (md / (1 - md))))[1],
            hi_logodds = HDIofMCMC(log((p / (1 - p)) / (md / (1 - md))))[2],
            HDI_logodds = hi_logodds - lo_logodds)

We find that the log-odds ratios reveal most mismatches at either extreme end of the size range, though without any consistency across the inventories. For example, the model underestimates the execeedance probabilities of landslide areas <0.8 km2 in catalogues M9.1 Tohoku JPN 2 (Tanyaş et al. 2017) and Owyhee USA (Safran et al. 2011), while overestimating the execeedance probabilities of landslide areas >1.7 km2 in catalogue Dauna Apennines ITA (Ardizzone et al. 2023).

3.2 Effects of different landslide inventories

# Extract and plot estimates of shape parameter k
post_k <- posterior$k
colnames(post_k) <- unique(zz$Inventory)
post_k <- melt(post_k, value.name = "k")
colnames(post_k) <- c("iterations", "Inventory", "k")

post_k %>% 
  group_by(Inventory) %>% 
  mutate(md_grp_k = median(k)) %>% 
  ggplot(aes(x = k, 
             y = fct_reorder(Inventory, md_grp_k), 
             fill = fct_reorder(Inventory, md_grp_k))) +
  geom_vline(xintercept = median(post_hyper$alpha_k / post_hyper$beta_k), 
             color = "darkgrey") +
  geom_vline(xintercept = HDIofMCMC(post_hyper$alpha_k / post_hyper$beta_k), 
             linetype = "dashed", color = "darkgrey") +
  stat_halfeye(
               interval_size = 0.5, 
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               show.legend = FALSE
               ) +
  scale_fill_viridis_d() +
  xlim(0, 1.25) +
  labs(x = expression(k[j]), y = NULL) +
  theme_bw() +
  theme(aspect.ratio = 4)

# Group summary of posterior k
post_k_grp <- post_k %>%
  group_by(Inventory) %>%
  summarise(md_grp_k = median(k),
            hdi_lo_grp_k = HDIofMCMC(k)[1],
            hdi_hi_grp_k = HDIofMCMC(k)[2],
            whdi_grp_k = HDIofMCMC(k)[2] - HDIofMCMC(k)[1])

# Group summary of posterior alpha = 1 / k
post_alpha_grp <- post_k %>%
  group_by(Inventory) %>%
  summarise(md_grp_alpha = median(1 / k),
            hdi_lo_grp_alpha = HDIofMCMC(1 / k)[1],
            hdi_hi_grp_alpha = HDIofMCMC(1 / k)[2],
            whdi_grp_alpha = HDIofMCMC(1 / k)[2] - HDIofMCMC(1 / k)[1])

Our model estimates shape parameters \(k_j\) that vary across the landslide inventories with posterior medians ranging from \(\tilde{k}_j\) = 0.21 in catalogue M8 Haiyuan CHN (Y. Xu et al. 2020) to \(\tilde{k}_j\) = 0.92 in catalogue M7.9 Alaska USA 1 (Gorum et al. 2014). Narrower posterior distributions mean less uncertainty, mainly owing to more large landslides that inform the model in the relevant catalogue. For example, the Campania ITA inventory (Fusco et al. 2023) contains most, i.e. 1854, large landslides, and its 95% HDI is narrowest (0.57 < \(k_j\) < 0.72). In contrast, the M6.2 Aisen CHL 1 (Gorum et al. 2014) catalogue has the fewest, i.e. 29, large landslides; its broad posterior distribution is thus informed more by the pooled estimate from all inventories together.

The mean of the Gamma prior distribution is \(\bar{k} = \alpha_k / \beta_{k}\) by definition, and we derive the power-law exponent \(\alpha\) from the identity \(\alpha = 1 / k\). Similarly, we obtain the mean pooled posterior \(\bar{\sigma} = \alpha_\sigma / \beta_{\sigma}\) from the sampled hyperparameters. We find that most of the 95% credible intervals of \(k_j\) overlap with that of the mean \(\bar{k}\) learned from the pooled model (grey vertical line, flanked by dashed lines marking its 95% HDI). Only two inventories, i.e. one on historic rock avalanches in the St. Elias mountains of Alaska, United States (St. Elias USA, Bessette-Kirton and Coe 2020), and one on landslides triggered by the 1920 Haiyuan earthquake, China (Y. Xu et al. 2020), stand out with a \(k_j\) that is credibly below that of the population average.

Estimates of \(k_j\) can differ credibly between inventories in the same geographic region such as western Canada and Alaska, for example when comparing St. Elias USA (Bessette-Kirton and Coe 2020); Kluane, CAN-USA (W. Smith, pers. comm.); and M7.9 Alaska USA (Gorum et al. 2014). On the contrary, inventories covering very different geographic regions and time spans can have largely overlapping, and thus statistically indifferent, posterior distributions of \(k_j\). This is the case, for example, for a catalogue of landslides triggered by the M7.6 Kashmir earthquake in 2005 (M7.6 PAK 3, Basharat et al. 2016), and one on mostly Quaternary landslides in the Caspian Sea basin (Caspian Sea KAZ, Pánek et al. 2016). Similarly, the inventory of rainfall-triggered landslides in far western Nepal covering 79 time steps between 2002 and 2018 (Far Western NPL, Muñoz-Torrero 2020), and the one on landslides following the 2018 M7.5 Porgera earthquake in Papua New Guinea, a database fully compiled by a deep learning algorithm (Porgera* PNG, Bhuyan et al. 2023), have indistinguishable posterior distributions of \(k_j\).

# Plot estimates of scale parameter sigma 
post_sig <- posterior$sigma
colnames(post_sig) <- unique(zz$Inventory)
post_sig <- melt(post_sig, value.name = "sig")
colnames(post_sig) <- c("iterations", "Inventory", "sig")

post_sig %>% 
  group_by(Inventory) %>% 
  mutate(md_grp_sig = median(sig)) %>% 
  ggplot(aes(x = sig, 
             y = fct_reorder(Inventory, md_grp_sig), 
             fill = fct_reorder(Inventory, md_grp_sig))) +
  geom_vline(xintercept =  median(post_hyper$a_sigma / post_hyper$b_sigma), 
             color = "darkgrey") +
  geom_vline(xintercept = HDIofMCMC(post_hyper$a_sigma / post_hyper$b_sigma),
             linetype = "dashed", color = "darkgrey") +
  geom_vline(xintercept = min_size, 
             linetype = "dashed", color = "red") +
  stat_halfeye(
    interval_size = 0.5, 
    shape = 21,
    point_color = "red",
    point_fill = "white",
    point_size = 1.5,
    show.legend = FALSE
  ) +
  scale_fill_viridis_d() +
  scale_x_log10(expression(paste(sigma[j], " (", km^2, ")")),
        breaks = trans_breaks("log10", function(x) 10 ^ x),
        labels = trans_format("log10", math_format(10 ^ .x))) +
  labs(y = NULL) +
  theme_bw() +
  theme(aspect.ratio = 4)

# Group summary of posterior sigma
post_sig_grp <- post_sig %>%
  group_by(Inventory) %>%
  summarise(md_grp_sig = median(sig),
            hdi_lo_grp_sig = HDIofMCMC(sig)[1],
            hdi_hi_grp_sig = HDIofMCMC(sig)[2],
            whdi_grp_sig = HDIofMCMC(sig)[2] - HDIofMCMC(sig)[1])

The spread of the posterior scale parameter \(\sigma_j\) is more pronounced across the inventories such that the pooled estimate overlaps with those of seven inventories only. Inventory-specific medians range over two orders of magnitude from \(\tilde{\sigma}_j\) = 0.04 km2 in the M7.9 Sichuan CHN 1 catalogue (C. Xu et al. 2014) to \(\tilde{\sigma}_j\) = 2.56 km2 in the Caspian Sea KAZ catalogue (Pánek et al. 2016). Again, inventories with different environmental settings and landslide triggers have very similar posterior distributions of \(\sigma_j\), such as the one on landslides triggered during Typhoon Morakot, Taiwan, in 2009 (TC Morakot TWN, Emberson et al. 2022), and the one on landslides triggered by the M7.6 Kashmir earthquake in 2005 (M7.6 PAK 3, Basharat et al. 2016). This also holds for different inventories addressing the same triggering event, such as the 2008 Wenchuan earthquake, for example M7.9 Sichuan CHN 1 (C. Xu et al. 2014) and M7.9 Sichuan CHN 2 (G. Li et al. 2014). Higher values of \(\tilde{\sigma}_j\) especially identify inventories with a less pronounced, or more limited range of, power-law scaling for the large landslides recorded. Examples include many of the inventories containing Quaternary landslides such as those of the Caspian Sea (Caspian Sea KAZ, Pánek et al. 2016) or the Columbia River basins (Owyhee USA, Safran et al. 2011), but also the above-mentioned inventories of rock avalanches that happened in the past few decades (St. Elias USA and Kluane CAN-USA).

3.3 Pooled estimates of landslide scaling

The pooled estimates in our multi-level model express the variance of the learned parameters across all inventories. The sampled hyperparameters of \(k_j\) that describe the shape \(\alpha_k\) and rate (or inverse scale) \(\beta_{k}\) of the Gamma-distributed parameter \(k_j\) are positively correlated; the same applies for the hyperparameters \(\alpha_{\sigma}\) and \(\beta_{\sigma}\).

# Plot hyperparameter estimates of k
p_hyp_k <- ggplot(post_hyper, aes(x = alpha_k, y = beta_k)) + 
  scale_fill_gradient(low = "purple", high = "orange") +
  labs(fill = expression(n[s]),
       x = expression(alpha[k]),
       y = expression(beta[k])) +
  geom_hex(alpha = 0.8) +
  theme_bw() +
  theme(aspect.ratio = 1,
        legend.key.size = unit(0.1, "cm"),
        legend.key.height = unit(0.1, "cm"),
        legend.key.width = unit(0.4, "cm"),
        legend.position = "top")

# Plot hyperparameter estimates of sigma
p_hyp_sig <- ggplot(post_hyper, aes(x = a_sigma, y = b_sigma)) + 
  scale_fill_gradient(low = "purple", high = "orange") +
  labs(fill = expression(n[s]),
       x = expression(alpha[sigma]),
       y = expression(beta[sigma])) +
  geom_hex(alpha = 0.8) +
  theme_bw() +
  theme(aspect.ratio = 1,
        legend.key.size = unit(0.1, "cm"),
        legend.key.height = unit(0.1, "cm"),
        legend.key.width = unit(0.4, "cm"),
        legend.position = "top")

plot_grid(p_hyp_k, p_hyp_sig, align = "h",
          labels = c("a", "b"))

# Posterior mean of k
p_pool_k <- ggplot(post_hyper %>% 
    mutate(
      a_k_prior = abs(rnorm(nrow(post_hyper), 6, 1)), # Gaussian hyperprior
      b_k_prior = abs(rnorm(nrow(post_hyper), 9, 1)), # Gaussian hyperprior
      a_prior = 1 / (a_k_prior / b_k_prior),
      a_post = 1 / (alpha_k / beta_k))
    ) +
    # prior
    stat_slab(aes(x = a_prior),
               slab_fill = "grey",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
    # posterior
    stat_halfeye(aes(x = a_post),
               interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "orchid",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
  xlim(0.75, 3) +
  ylim(0, 3.25) +
  annotate("text", x = 1, y = 0.9, label = "Prior", color = "grey") +
  annotate("text", x = 2, y = 1.5, label = "Posterior", color = "orchid") +
  labs(x = expression(paste("Mean pooled scaling exponent ", 
                            alpha)), 
       y = expression(paste("p(", alpha, ")"))) +
  theme_bw() +
  theme(aspect.ratio = 1/4)

# Posterior mean of sigma
p_pool_sig <- ggplot(post_hyper %>% 
    mutate(
      a_sig_prior = abs(rnorm(nrow(post_hyper), 1, 0.25)), # Gaussian hyperprior
      b_sig_prior = abs(rnorm(nrow(post_hyper), 5, 5)),  # Gaussian hyperprior
      sig_prior = a_sig_prior / b_sig_prior,
      sig_post = a_sigma / b_sigma)
    ) +
    # prior
    stat_slab(aes(x = sig_prior),
               slab_fill = "grey",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
    # posterior
    stat_halfeye(aes(x = sig_post),
               interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "darkorange",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
  xlim(0, 0.75) +
  ylim(0, 10) +
  annotate("text", x = 0.1, y = 6, label = "Prior", color = "grey") +
  annotate("text", x = 0.275, y = 9, label = "Posterior", color = "darkorange") +
  labs(x = expression(paste("Mean pooled scale parameter ", 
                            sigma, " (", km^2, ")")),
       y = expression(paste("p (", sigma, ") (", km^-2, ")"))) +
  theme_bw() +
  theme(aspect.ratio = 1/4)

plot_grid(p_pool_k, p_pool_sig, align = "v", ncol = 1,
          labels = c("a", "b"))

We find that the numerical approximation of the joint posterior distribution has a distinct maximum. We obtain a mean power-law exponent of 1.35 < \(\bar{\alpha}\) < 1.83 across all inventories with 95% probability; the posterior median of \(\bar{\alpha}\) is 1.6. Compared to the prior distribution based on published values of the scaling parameter, our model has gained more certainty from the data considered in this study, yielding a much narrower posterior. The mean scaling parameter is 0.18 < \(\bar{\sigma}\) < 0.37 km2 across all inventories with 95% probability; the posterior median of \(\bar{\sigma}\) is 0.27 km2. This posterior shifted up from the prior distribution that we centered on our arbitrary size threshold for large landslides. Our model has learned much from the inventory data compared to the priors, especially concerning the high variance of \(\sigma_j\) across the individual landslide catalogues.

3.4 Comparison with maximum likelihood estimate

To assess the sensitivity of our model results to our choice of inference, we compare our results to maximum likelihood estimates (MLE) of the exponent \(\alpha_j\) of the inverse power-law distribution, based on the Hill estimator (Clauset, Shalizi, and Newman 2009). By definition, the MLE standard error for each inventory decays with the inverse square root of sample size, whereas the Bayesian estimates are informed by all data via the multi-level model structure. Hence we do not expect a 1:1 correspondence from this comparison. Instead, it underlines how variable and uncertain landslide scaling estimates can be for different inventories, regardless of method.

# MLE estimate for power-law alpha 
# Equation 3.1 in Clauset et al. (2009)
MLE_alpha <- zz %>% 
  group_by(Inventory) %>% 
  summarise(n = n(),
            MLE = 1 + n / sum(log(xx / min_size)),
            MLE_sd = (MLE - 1) / sqrt(n)) # Appendix B in Clauset et al.

# Posterior medians of k
posterior_k <- post_k %>% 
  group_by(Inventory) %>% 
  summarise(md_grp_alpha = median(1 / k),
            lo_grp_alpha = HDIofMCMC(1 / k)[1],
            hi_grp_alpha = HDIofMCMC(1 / k)[2])

# Posterior medians of sigma
posterior_sig <- post_sig %>% 
  group_by(Inventory) %>% 
  summarise(md_grp_sig = median(sig),
            lo_grp_sig = HDIofMCMC(sig)[1],
            hi_grp_sig = HDIofMCMC(sig)[2])

modcomp <- merge(mydat, MLE_alpha)
modcomp <- merge(modcomp, posterior_k)
modcomp <- merge(modcomp, posterior_sig)
modcomp <- merge(modcomp, post_logodds)

ggplot(modcomp, aes(y = MLE - 1, # Offset due to definition in Clauset et al.
                    x = md_grp_alpha, 
                    size = n,
                    col = `n < threshold` / (n + `n < threshold`))) +
  geom_errorbar(aes(ymax = MLE - 1 + 2 * MLE_sd, 
                     ymin = MLE - 1 - 2 * MLE_sd),
                 col = "grey", linewidth = 0.5) + 
  geom_errorbarh(aes(xmax = hi_grp_alpha, 
                    xmin = lo_grp_alpha),
                col = "grey", linewidth = 0.5) + 
  geom_point() +
  geom_point(pch = 21, color = "black") +
  scale_color_viridis_c(begin = 0.95, end = 0.15) +
  geom_abline(intercept =  0, slope = 1, linetype = "dashed") +
  geom_text_repel(aes(label = ifelse(md_grp_alpha > 3, 
                               Inventory, ""),
                      bg.color = "white", bg.r = 0.2), 
            hjust = -0.1, vjust = -0.5, col = "black") +
  labs(y = expression(paste("MLE of ", alpha)), 
       x = expression(paste("Posterior ", alpha[j])),
       size = "Large\nlandslides\nin inventory",
       col = expression(paste("Fraction of\nlandslides < ", mu))) +
  # xlim(0, 5) +
  # ylim(0, 5) +
  scale_x_continuous(breaks = seq(0, 8, 1)) +
  coord_fixed() +
  theme_bw() +
  theme(legend.position = "top")

We obtain inventory-specific MLEs of 0.33 < \(\hat\alpha_j\) < 2.39. This spread encompasses most reported values in the literature (Tebbens 2020). In contrast, the posterior medians of \(\alpha_j = 1/k_j\) occupy a seemingly broader range (1.09 < \(\tilde\alpha\) < 4.82), though nominally similar to that of the MLE method for most inventories within the respective errors. However, the coefficient of variation is narrower for the Bayesian median estimates. Two inventories stand out with very high scaling exponents, e.g. M8 Haiyuan CHN and St. Elias USA. Both inventories have only few landslides that we censored because they were below the size threshold \(\mu\) = 0.1 km2; in other words, these catalogues contained mostly large landslides originally.

4 Discussion

4.1 Implications

Our results show that posterior shape parameters \(k_j = 1/\alpha_j\) of landslide-size distributions may vary despite being informed by thousands of data points and previous research. The narrowest 95% HDI of \(\alpha_j\), and thus the best we can constrain this parameter, is that of the Campania ITA catalogue with 1.37 < \(\alpha_j\) < 1.74. This numerical range has much overlap with that of previously reported scaling exponents that were obtained for mostly smaller landslides, however (Tebbens 2020). Still, the nearly triangular posterior distribution has much of its probability mass near its peak, and the same goes for the pooled estimate. Except for a few inventories, the corresponding posterior distributions for other landslide inventories also remain largely indistinguishable from the pooled estimate. This low variance of \(\alpha_j\) across inventories is striking if we consider the diverse mapping techniques, levels of data quality, coverage, environmental setting, and landslide triggers. The inventories we selected cover several climatic zones with different vegetation and land cover characteristics that likely affect the detection and mapping or large landslides. Some inventories were generated from deep learning algorithms (Bhuyan et al. 2023), whereas most others were mapped manually. While many posterior \(\sigma_j\) have values close to our arbitrary size threshold for large landslides of 0.1 km2, the variance in this parameter is high compared to the pooled estimates. Overall, the median estimates of both \(\alpha_j\) and \(\sigma_j\) are unaffected by the number of large landslides reported in a given inventory.

Our model highlights several inventories with scaling statistics that stand out. The M8 Haiyuan CHN and St. Elias USA catalogues have high estimates of \(\alpha_j\), whereas Caspian Sea KAZ, M9.1 Tohoku JPN 2, and Owyhee USA have high estimates of \(\sigma_j\). These inventories consist almost exclusively of large landslides and feature very few other data below the size threshold \(\mu\). In contrast, we observe that inventories with most landslides falling below \(\mu\) have low values of \(\sigma_j\) consistently. We infer that landslide catalogues that were focused on compiling information about larger landslides tend to have elevated values of \(\sigma_j\). In this context, our selection of inventories is nearly balanced: twenty of them have less than 10% large landslides, whereas seventeen of them have more than 50%.

ggplot(modcomp, 
       aes(x = md_grp_alpha,
           y = md_grp_sig, 
           size = n,
           col = `n < threshold` / (n + `n < threshold`),
           #col = `Min. size (sqkm)`,
           label = Inventory)) +
  geom_errorbarh(aes(xmax = hi_grp_alpha, 
                    xmin = lo_grp_alpha),
                col = "grey", linewidth = 0.5) +
  geom_errorbar(aes(ymax = hi_grp_sig, 
                    ymin = lo_grp_sig),
                col = "grey", linewidth = 0.5) +
  scale_color_viridis_c(begin = 0.95, end = 0.15) +
  labs(x = expression(paste("Posterior ", alpha[j])),
       size = "Large\nlandslides",
       col = expression(paste("Fraction of\nlandslides < ", mu))) +
  geom_point() +
  geom_point(pch = 21, color = "black") +
  geom_text_repel(aes(label = ifelse(md_grp_alpha > 3 | md_grp_sig > 1, 
                               Inventory, ""),
                      bg.color = "white", bg.r = 0.2), 
            hjust = -0.1, vjust = -0.5, col = "black") +
  scale_x_continuous(breaks = seq(0, 8, 1)) +
  scale_y_log10(expression(paste("Posterior ", 
                                 sigma[j], " (", km^2, ")")),
        breaks = trans_breaks("log10", function(y) 10 ^ y),
        labels = trans_format("log10", math_format(10 ^ .x))) +
  theme_bw() +
  theme(aspect.ratio = 1)

One possible explanation for these outlying landslide-size statistics is that the GPD is a poor fit to inventories that mainly feature large landslides, at least for the chosen threshold \(\mu\). The residuals for most of these inventories show pronounced underestimates for the smallest landslide range, except for the South Central CHL and Caspian Sea KAZ catalogues. Yet, other inventories with similar residuals (e.g. Glacier Bay N.P. CAN) hardly stand out compared to the pooled estimates. Clearly, extrapolating the model across the full size range, for example to infer the number of seemingly missing, underreported, or overlooked landslides (Tanyaş et al. 2019) can be misleading in these cases. Another explanation for the high estimates of \(k_j\) and \(\sigma_j\) is that the original mapping was focused on landslides well beyond the size threshold we chose, so that slope failures close to this cutoff might be undersampled. Some of these inventories also contain partly overlapping slope-failure deposits of multiple ages, marking several phases of reactivation. Such overlaps may cause more landslides to surpass the size threshold. Hence, the mapping objective would partly bias estimates of \(\sigma_j\) for a size threshold that is too low.

ggplot(modcomp, 
       aes(x = md_grp_alpha,
           y = md_grp_sig, 
           size = n,
           fill = Trigger,
           label = Inventory)) +
  geom_errorbarh(aes(xmax = hi_grp_alpha, 
                    xmin = lo_grp_alpha),
                col = "grey", linewidth = 0.5) +
  geom_errorbar(aes(ymax = hi_grp_sig, 
                    ymin = lo_grp_sig),
                col = "grey", linewidth = 0.5) +
  scale_fill_brewer(palette = "PuOr") +
  labs(x = expression(paste("Posterior ", alpha[j])),
       size = "Large\nlandslides",
       bg = "Landslide\ntrigger") +
  geom_point() +
  geom_point(pch = 21, color = "black") +
  geom_text_repel(aes(label = ifelse(md_grp_alpha > 3 | md_grp_sig > 1, 
                               Inventory, ""),
                bg.color = "white", bg.r = 0.2), 
            hjust = -0.1, vjust = -0.5) +
  scale_x_continuous(breaks = seq(0, 8, 1)) +
  scale_y_log10(expression(paste("Posterior ", 
                                 sigma[j], " (", km^2, ")")),
        breaks = trans_breaks("log10", function(y) 10 ^ y),
        labels = trans_format("log10", math_format(10 ^ .x))) +
  theme_bw() +
  theme(aspect.ratio = 1)

We also find that the 95% credible intervals of both GDP parameters overlap for landslide inventories regardless of whether they are attributed to recent earthquakes and rainstorms, or whether they integrate landslide observations, and thus likely various triggers, over many years. We infer that the scaling statistics disclose very little about the type of landslide trigger. In this context, our findings caution against a mechanistic interpretation of the scaling parameters.

4.2 Role of size threshold and sample size

The heavy-tailed distribution of landslides means that we discarded many samples for small increases in \(\mu\). To test the sensitivity of our results to the choice of size threshold \(\mu\), we replicated our analyses, and recorded the variation of the pooled estimates \(\bar{k}\) and \(\bar{\sigma}\) as a function of both \(\mu\) and the minimum number of large landslides that an inventory needs to have to be considered in our analysis.

mingrp_labeller <- as_labeller(
  c(`5` = "n > 5",
    `10` = "n > 10",
    `25` = "n > 25",
    `50` = "n > 50",
    `100` = "n > 100")
)

p_sens1 <- ggplot(hyparam, aes(x = 1 / a_post, y = as.factor(mu))) +
  labs(x = expression(paste("Pooled ", alpha)),
       y = expression(paste(mu, " (", km^2, ")"))) +
  stat_halfeye(interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "orchid",
               slab_alpha = 0.75,
               normalize = "panels",
               show.legend = FALSE) + 
  xlim(1, 2.25) +
  scale_y_discrete(breaks = seq(0.1, 0.3, 0.1)) +
  theme_bw() +
  theme(aspect.ratio = 1, legend.position = "none") +
  facet_wrap( ~ grpsize, ncol = 5, labeller = mingrp_labeller)

p_sens2 <- ggplot(hyparam, aes(x = sigma_post, y = as.factor(mu))) +
  labs(x = expression(paste("Pooled ", sigma, " (", km^2, ")")),
       y = expression(paste(mu, " (", km^2, ")"))) +
  stat_halfeye(interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "darkorange",
               slab_alpha = 0.75,
               normalize = "panels",
               show.legend = FALSE) + 
  xlim(0.1, 1.5) +
  scale_y_discrete(breaks = seq(0.1, 0.3, 0.1)) +
  theme_bw() +
  theme(aspect.ratio = 1, legend.position = "none") +
  facet_wrap( ~ grpsize, ncol = 5, labeller = mingrp_labeller)

plot_grid(p_sens1, p_sens2, align = "v", ncol = 1,
          labels = c("a", "b"))

We find that varying the size threshold such that 0.075 km2 < \(\mu\) < 0.3 km2 returns posterior pooled values of \(\alpha\) that decrease slightly with increasing \(\mu\) and the minimum number of samples per inventory, though with much overlap. Overall, 1.23 \(<\bar{\alpha}<\) 1.9 with 95% probability regardless of the threshold or sample size that we pick. Estimates of \(\bar{\sigma}\) increase sightly with \(\mu\), but also with some overlap. Preferring larger inventories for a given size threshold reduces the total sample size such that the pooled posterior distributions of \(\sigma\) get broader.

We infer that the choice of \(\mu\), and hence the definition of “large” landslides (McColl and Cook 2024), has limited influence on the scaling statistics, and especially \(\bar{\alpha}\). Values of \(\mu\) below the range that we tested violate the assumption of a high threshold such that a GPD would be inappropriate, whereas values above this range suffer from too small sample sizes. The pooled scale parameter \(\bar{\sigma}\) shows less variation with \(\mu\) and has largely overlapping posteriors. Hence, the choice of the minimum number of large landslides per inventory (\(n_{\mathrm{min}}\) = 25) limits both the number of levels in our model and the overall sample size. Admitting more inventories that contain fewer large landslides changes the posterior \(\bar{k}\) and \(\bar{\sigma}\) only slightly, but does narrow the uncertainties, especially for higher thresholds \(\mu\).

4.3 Advantages

Our Bayesian multi-level approach expands on previous, though largely separate, efforts of comparing landslide size statistics across different inventories. We offer here a single, consistent model that has several advantages:

First, the Bayesian setup can handle the small sample problem of large landslides. Scaling parameters for large landslides from a single landslide inventory are commonly estimated from extrapolating model fits that largely draw on more numerous, smaller landslides. Yet, even simulated power-law distributed data have natural scatter in the largest of observations (Clauset, Shalizi, and Newman 2009), making it difficult to validate extrapolations and leading to over-confident parameter estimates. Using data from other landslide inventories to validate these estimates often tacitly assumes that the scaling parameters agree or differ within error, but offers no way of determining of whether this assumption is at all valid. The multi-level model instead draws on the larger sample size from all inventories, and explicitly refines this shared knowledge in dedicated posterior distributions for each catalogue. In general, group-level parameter estimates tend to be closer to the pooled mean than those derived for separate models using fewer data from each group alone. This effect is known as parameter shrinkage and guards against overfitting, especially for inventories with fewer data.

Second, the Bayesian treatment documents explicitly all parameter uncertainties, and especially those that capture previous knowledge about reported parameters that characterise landslide size distributions. We can thus quantify how much we have learned from the data by comparing the posterior and prior distributions. By design, a Bayesian model seeks a compromise between these previous findings and the data considered here in a probabilistically consistent way. To this end, we made sure to include mostly recently published landslide inventories or those that had not been considered in scaling studies before. Posterior parameter estimates for catalogues with few large landslides are broader compared to those with many large landslides.

Third, the multi-level model structure enables direct comparison of parameter estimates across and between landslide inventories. Any differences in the underlying workflows of detecting and mapping landslides and the commensurate sample sizes are being accounted for by separate posterior distributions and their deviation from the pooled estimates. Our model measures objectively how similar landslide inventories are in terms of the scaling parameters that jointly, instead of separately, define the size distributions of large landslides.

Fourth, the peak-over-threshold approach that defines the GPD is rooted in extreme-value theory and thus expresses naturally what we expect statistically in the size distribution of large landslides. The parameters of the GPD contain information about the “roll-over” location and power-law scaling, and translate readily into parameters of other distributions used to characterise the size scaling of landslides.

4.4 Further applications

We can flexibly modify our multi-level model in several ways. One option is to group the data in other ways than by inventories. For example, we can specify the group levels such that they represent the type of inventory (event-based versus integrating over a period of time); triggering mechanism (earthquake, rainstorm, snowmelt, etc.); soil or rock type; or any other characteristic that may have been collected during the process of compiling the landslide inventory. We recall that the GPD is by definition “blind” to data below the threshold \(\mu\) in that it truncates all observations below the threshold. One alternative to improve the parameter estimates would be to use a censored GPD model that acknowledges the presence of smaller landslides, though without knowing their reported sizes.

5 Conclusions

  • We propose a multi-level model as common ground for consistently estimating and comparing size distributions of large slope failures across different inventories. In choosing a peak-over-threshold approach the Generalised Pareto distribution (GPD) reflects what we would expect statistically from a given landslide size distribution. The multi-level setup remediates the problem of low sample size by making use of all available data for estimating scaling parameters while acknowledging inherent differences across inventories.

  • Our model results based on 37 inventories with 8627 large landslides (\(>\) 0.1 km2) show that the power-law exponent for each inventory \(k_j = 1 / \alpha_j\) discloses little about the different underlying landslide trigger(s), geographic region, or time span concerning a given inventory, i.e. whether it is event-based or compiles landslides of many different ages. Inventories of mostly undated, prehistoric landslides have scaling exponents \(\alpha_j\) that hardly differ from those of historic, event-based catalogues.

  • Despite thousands of large landslides to learn from, the uncertainty about \(\alpha_j\) spans several decimal points. If taking all inventories together, the pooled \(\alpha\) captures most of this variance. The location parameter \(\sigma\) has more spread across inventories and is affected especially by those with with few or no landslides below our size threshold. We infer that \(\sigma\) is sensitive to the desired landslide size range and likely reflects the influence of mapping choices.

  • While several numerical modelling studies have attributed a physical meaning to scaling statistics of landslide size, we argue that some of this meaning might get diluted or even lost in empirical data that may combine confounding controls. We surmise that landslide inventories record these physical processes, though in a mixed way that could admit, for example, different failure types, rock and soil types, and groundwater conditions. We suspect that most landslide inventories may have mixed size distributions. For example, mixing data from inventories with differing size thresholds could add variance to \(\sigma\). Hence, the resulting scaling statistics would at best reflect bulk physical characteristics instead of clearly representing parameters of a single, well-constrained physical model of slope stability.

  • Our model measures objectively by how much scaling statistics differ across inventories within estimation error. Such differences can be vital if using scaling statistics for predicting future landslide hazard in terms of size and frequency (Hergarten 2023).

6 Acknowledgements

We used the statistical programming environment R (https://cran.r-project.org) with RStudio (https://posit.co/) as a frontend for all data analysis, and coded the Bayesian GPD in the probabilistic language STAN (https://mc-stan.org/); all are freely available. The landslide inventories studied here have either been published as indicated or are available directly from the original contributors. Will Smith and Stuart Dunning kindly shared data on landslides in Kluane National Park. Part of this research was funded by the German Research Foundation via the Research Training Group NatRiskChange (DFG GRK 2043, https://www.uni-potsdam.de/en/natriskchange/).

7 References

Abancó, C., G. L. Bennett, A. J. Matthews, M. A. M. Matera, and F. J. Tan. 2021. The role of geomorphology, rainfall and soil moisture in the occurrence of landslides triggered by 2018 Typhoon Mangkhut in the Philippines.” Natural Hazards and Earth System Sciences 21 (5): 1531–50. https://doi.org/10.5194/nhess-21-1531-2021.
Alberti, Stefano, Ben Leshchinsky, Josh Roering, Jonathan Perkins, and Michael J. Olsen. 2022. “Inversions of Landslide Strength as a Proxy for Subsurface Weathering.” Nature Communications 13 (1): 6049. https://doi.org/10.1038/s41467-022-33798-5.
Antinao, Jose Luis, and John Gosse. 2009. Large rockslides in the Southern Central Andes of Chile (32–34.5\(\,^{\circ}\)S): Tectonic control and significance for Quaternary landscape evolution.” Geomorphology 104 (3): 117–33. https://doi.org/10.1016/j.geomorph.2008.08.008.
Ardizzone, F., F. Bucci, M. Cardinali, F. Fiorucci, L. Pisano, M. Santangelo, and V. Zumpano. 2023. Geomorphological landslide inventory map of the Daunia Apennines, southern Italy.” Earth System Science Data 15 (2): 753–67. https://doi.org/10.5194/essd-15-753-2023.
Barlow, John, Michael Lim, Nick Rosser, David Petley, Matthew Brain, Emma Norman, and Melanie Geer. 2012. “Modeling Cliff Erosion Using Negative Power Law Scaling of Rockfalls.” Geomorphology 139-140: 416–24. https://doi.org/10.1016/j.geomorph.2011.11.006.
Basharat, Muhammad, Abid Ali, Ishtiaq A. K. Jadoon, and Joachim Rohn. 2016. Using PCA in evaluating event-controlling attributes of landsliding in the 2005 Kashmir earthquake region, NW Himalayas, Pakistan.” Natural Hazards 81 (3): 1999–2017. https://doi.org/10.1007/s11069-016-2172-9.
Basharat, Muhammad, Joachim Rohn, Mirza Shahid Baig, and Muhammad Rustam Khan. 2014. Spatial distribution analysis of mass movements triggered by the 2005 Kashmir earthquake in the Northeast Himalayas of Pakistan.” Geomorphology 206: 203–14. https://doi.org/10.1016/j.geomorph.2013.09.025.
Behling, Robert, Sigrid Roessner, Darya Golovko, and Birgit Kleinschmit. 2016. Derivation of long-term spatiotemporal landslide activity—A multi-sensor time series approach.” Remote Sensing of Environment 186: 88–104. https://doi.org/10.1016/j.rse.2016.07.017.
Belair, G. M., E. S. Jones, S. L. Slaughter, and B. B. Mirus. 2022. Landslide Inventories across the United States version 2.” U.S. Geological Survey. https://doi.org/10.5066/P9FZUX6N.
Bellugi, Dino G., David Graham Milledge, Kurt M. Cuffey, William E. Jr. Dietrich, and Laurel G. Larsen. 2021. Controls on the size distributions of shallow landslides.” Proceedings of the National Academy of Sciences 118 (9): e2021855118. https://doi.org/10.1073/pnas.2021855118.
Bernard, T. G., D. Lague, and P. Steer. 2021. Beyond 2D landslide inventories and their rollover: synoptic 3D inventories and volume from repeat lidar data.” Earth Surface Dynamics 9 (4): 1013–44. https://doi.org/10.5194/esurf-9-1013-2021.
Bessette-Kirton, Erin K., and Jeffrey A. Coe. 2020. A 36-Year Record of Rock Avalanches in the Saint Elias Mountains of Alaska, With Implications for Future Hazards.” Frontiers in Earth Science 8. https://doi.org/10.3389/feart.2020.00293.
Bhuyan, Kushanav, Hakan Tanyaş, Lorenzo Nava, Silvia Puliero, Sansar Raj Meena, Mario Floris, Cees van Westen, and Filippo Catani. 2023. “Generating Multi-Temporal Landslide Inventories Through a General Deep Transfer Learning Strategy Using HR EO Data.” Scientific Reports 13 (1): 162. https://doi.org/10.1038/s41598-022-27352-y.
Brardinoni, Francesco, Olav Slaymaker, and Marwan A Hassan. 2003. “Landslide Inventory in a Rugged Forested Watershed: A Comparison Between Air-Photo and Field Survey Data.” Geomorphology 54 (3): 179–96. https://doi.org/10.1016/S0169-555X(02)00355-0.
Brink, U. S. ten, R. Barkan, B. D. Andrews, and J. D. Chaytor. 2009. “Size Distributions and Failure Initiation of Submarine and Subaerial Landslides.” Earth and Planetary Science Letters 287 (1): 31–42. https://doi.org/10.1016/j.epsl.2009.07.031.
Burrows, K., O. Marc, and D. Remy. 2022. Using Sentinel-1 radar amplitude time series to constrain the timings of individual landslides: a step towards understanding the controls on monsoon-triggered landsliding.” Natural Hazards and Earth System Sciences 22 (8): 2637–53. https://doi.org/10.5194/nhess-22-2637-2022.
Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32. https://doi.org/10.18637/jss.v076.i01.
Casagli, Nicola, Emanuele Intrieri, Veronica Tofani, Giovanni Gigli, and Federico Raspini. 2023. “Landslide Detection, Monitoring and Prediction with Remote-Sensing Techniques.” Nature Reviews Earth & Environment 4 (1): 51–64. https://doi.org/10.1038/s43017-022-00373-x.
Castro-Camilo, Daniela, Raphaël Huser, and Håvard Rue. 2022. “Practical Strategies for Generalized Extreme Value-Based Regression Models for Extremes.” Environmetrics 33 (6). https://doi.org/10.1002/env.2742.
Clauset, Aaron, Cosma Rohilla Shalizi, and M E J Newman. 2009. Power-Law Distributions in Empirical Data.” SIAM Review 51 (4): 661–703. https://doi.org/10.1137/070710111.
Dente, E., O. Katz, O. Crouvi, and A. Mushkin. 2023. The Geomorphic Effectiveness of Landslides.” Journal of Geophysical Research: Earth Surface 128 (12): e2023JF007191. https://doi.org/10.1029/2023JF007191.
Domènech, Guillem, Fan Yang, Xiaojun Guo, Xuanmei Fan, Gianvito Scaringi, Lanxin Dai, Chaoyang He, Qiang Xu, and Runqiu Huang. 2018. Two multi-temporal datasets to track the enhanced landsliding after the 2008 Wenchuan earthquake.” https://doi.org/10.5281/zenodo.1484667.
Emberson, R., D. B. Kirschbaum, P. Amatya, H. Tanyas, and O. Marc. 2022. “Insights from the Topographic Characteristics of a Large Global Catalog of Rainfall-Induced Landslide Event Inventories.” Natural Hazards and Earth System Sciences 22 (3): 1129–49. https://doi.org/10.5194/nhess-22-1129-2022.
Fan, Xuanmei, Gianvito Scaringi, Oliver Korup, A. Joshua West, Cees J. van Westen, Hakan Tanyas, Niels Hovius, et al. 2019. Earthquake-Induced Chains of Geologic Hazards: Patterns, Mechanisms, and Impacts.” Reviews of Geophysics 57 (2): 421–503. https://doi.org/10.1029/2018RG000626.
Franceschini, Rachele, Ascanio Rosi, Filippo Catani, and Nicola Casagli. 2022. Exploring a landslide inventory created by automated web data mining: the case of Italy.” Landslides 19 (4): 841–53. https://doi.org/10.1007/s10346-021-01799-y.
Frattini, Paolo, and Giovanni B. Crosta. 2013. “The Role of Material Properties and Landscape Morphology on Landslide Size Distributions.” Earth and Planetary Science Letters 361: 310–19. https://doi.org/https://doi.org/10.1016/j.epsl.2012.10.029.
Fusco, Francesco, Rita Tufano, Pantaleone De Vita, Diego Di Martire, Mariano Di Napoli, Luigi Guerriero, Florindo Antonio Mileti, Fabio Terribile, and Domenico Calcaterra. 2023. A revised landslide inventory of the Campania region (Italy).” Scientific Data 10 (1): 355. https://doi.org/10.1038/s41597-023-02155-6.
Gilham, Jamie, John Barlow, and Roger Moore. 2018. Marine control over negative power law scaling of mass wasting events in chalk sea cliffs with implications for future recession under the UKCP09 medium emission scenario.” Earth Surface Processes and Landforms 43 (10): 2136–46. https://doi.org/10.1002/esp.4379.
Gorum, Tolga, Oliver Korup, Cees J. van Westen, Mark van der Meijde, Chong Xu, and Freek D. van der Meer. 2014. Why so few? Landslides triggered by the 2002 Denali earthquake, Alaska.” Quaternary Science Reviews 95: 80–94. https://doi.org/10.1016/j.quascirev.2014.04.032.
Guzzetti, Fausto, Alessandro Cesare Mondini, Mauro Cardinali, Federica Fiorucci, Michele Santangelo, and Kang-Tsung Chang. 2012. Landslide inventory maps: New tools for an old problem.” Earth-Science Reviews 112 (1): 42–66. https://doi.org/10.1016/j.earscirev.2012.02.001.
Hao, L., Rajaneesh A., C. van Westen, Sajinkumar K. S., T. R. Martha, P. Jaiswal, and B. G. McAdoo. 2020. Constructing a complete landslide inventory dataset for the 2018 monsoon disaster in Kerala, India, for land use change analysis.” Earth System Science Data 12 (4): 2899–2918. https://doi.org/10.5194/essd-12-2899-2020.
Harp, Edwin L., Raymond C. Wilson, and Gerald F. Wieczorek. 1981. Landslides from the February 4, 1976, Guatemala earthquake.” Professional Paper 1204-A. Professional Paper. U.S. Geological Survey. https://doi.org/10.3133/pp1204A.
Hergarten, S. 2023. The concept of event-size-dependent exhaustion and its application to paraglacial rockslides.” Natural Hazards and Earth System Sciences 23 (9): 3051–63. https://doi.org/10.5194/nhess-23-3051-2023.
Hergarten, S., and H. J. Neugebauer. 1998. “Self-Organized Criticality in a Landslide Model.” Geophysical Research Letters 25 (6): 801–4. https://doi.org/10.1029/98GL50419.
Hibert, C, D Michéa, F Provost, J-P Malet, and M Geertsema. 2019. Exploration of continuous seismic recordings with a machine learning approach to document 20 yr of landslide activity in Alaska.” Geophysical Journal International 219 (2): 1138–47. https://doi.org/10.1093/gji/ggz354.
Hu, Kaiheng, Xiaopeng Zhang, Yong You, Xudong Hu, Weiming Liu, and Yong Li. 2019. Landslides and dammed lakes triggered by the 2017 Ms6.9 Milin earthquake in the Tsangpo gorge.” Landslides 16 (5): 993–1001. https://doi.org/10.1007/s10346-019-01168-w.
Jain, Saloni, Rakesh Khosa, and A. K. Gosain. 2022. Impact of landslide size and settings on landslide scaling relationship: a study from the Himalayan regions of India.” Landslides 19 (2): 373–85. https://doi.org/10.1007/s10346-021-01794-3.
Jones, J. N., S. J. Boulton, G. L. Bennett, M. Stokes, and M. R. Z. Whitworth. 2021. Temporal Variations in Landslide Distributions Following Extreme Events: Implications for Landslide Susceptibility Modeling.” Journal of Geophysical Research: Earth Surface 126 (7): e2021JF006067. https://doi.org/10.1029/2021JF006067.
Karakas, Gizem, Hakan A. Nefeslioglu, Sultan Kocaman, Mehmet Buyukdemircioglu, Tekin Yurur, and Candan Gokceoglu. 2021. Derivation of earthquake-induced landslide distribution using aerial photogrammetry: the January 24, 2020, Elazig (Turkey) earthquake.” Landslides 18 (6): 2193–2209. https://doi.org/10.1007/s10346-021-01660-2.
Katz, Richard W, Marc B Parlange, and Philippe Naveau. 2002. “Statistics of Extremes in Hydrology.” Advances in Water Resources 25 (8): 1287–1304. https://doi.org/10.1016/S0309-1708(02)00056-8.
Kim, Jinwoo, Jeffrey A. Coe, Zhong Lu, Nikita N. Avdievitch, and Chad P. Hults. 2022. Spaceborne InSAR mapping of landslides and subsidence in rapidly deglaciating terrain, Glacier Bay National Park and Preserve and vicinity, Alaska and British Columbia.” Remote Sensing of Environment 281: 113231. https://doi.org/10.1016/j.rse.2022.113231.
Korup, Oliver, John J. Clague, Reginald L. Hermanns, Kenneth Hewitt, Alexander L. Strom, and Johannes T. Weidinger. 2007. “Giant Landslides, Topography, and Erosion.” Earth and Planetary Science Letters 261 (3): 578–89. https://doi.org/10.1016/j.epsl.2007.07.025.
Korup, Oliver, Tolga Görüm, and Yuichi Hayakawa. 2012. “Without Power? Landslide Inventories in the Face of Climate Change.” Earth Surface Processes and Landforms 37 (1): 92–99. https://doi.org/10.1002/esp.2248.
Lacroix, Pascal, Alexander L. Handwerger, and Grégory Bièvre. 2020. “Life and Death of Slow-Moving Landslides.” Nature Reviews Earth & Environment 1 (8): 404–19. https://doi.org/10.1038/s43017-020-0072-8.
LaHusen, Sean R., Alison R. Duvall, Adam M. Booth, and David R. Montgomery. 2016. “Surface Roughness Dating of Long-Runout Landslides Near Oso, Washington (USA), Reveals Persistent Postglacial Hillslope Instability.” Geology 44 (2): 111–14. https://doi.org/10.1130/G37267.1.
Larsen, Isaac J., David R. Montgomery, and Oliver Korup. 2010. “Landslide Erosion Controlled by Hillslope Material.” Nature Geoscience 3 (4): 247–51. https://doi.org/10.1038/ngeo776.
Li, Gen K., and Seulgi Moon. 2021. “Topographic Stress Control on Bedrock Landslide Size.” Nature Geoscience 14 (5): 307–13. https://doi.org/10.1038/s41561-021-00739-8.
Li, Gen, A. Joshua West, Alexander L. Densmore, Zhangdong Jin, Robert N. Parker, and Robert G. Hilton. 2014. Seismic mountain building: Landslides associated with the 2008 Wenchuan earthquake in the context of a generalized model for earthquake volume balance.” Geochemistry, Geophysics, Geosystems 15 (4): 833–44. https://doi.org/10.1002/2013GC005067.
Liu, Jia, Yuming Wu, and Xing Gao. 2021. Increase in occurrence of large glacier-related landslides in the high mountains of Asia.” Scientific Reports 11 (1): 1635. https://doi.org/10.1038/s41598-021-81212-9.
Lombardo, Luigi, Haakon Bakka, Hakan Tanyas, Cees van Westen, P. Martin Mai, and Raphaël Huser. 2019. Geostatistical Modeling to Capture Seismic-Shaking Patterns From Earthquake-Induced Landslides.” Journal of Geophysical Research: Earth Surface 124 (7): 1958–80. https://doi.org/10.1029/2019JF005056.
Luetzenburg, G., K. Svennevig, A. A. Bjørk, M. Keiding, and A. Kroon. 2022. A national landslide inventory for Denmark.” Earth System Science Data 14 (7): 3157–65. https://doi.org/10.5194/essd-14-3157-2022.
Luna, L. V., and O. Korup. 2022. Seasonal Landslide Activity Lags Annual Precipitation Pattern in the Pacific Northwest.” Geophysical Research Letters 49 (18): e2022GL098506. https://doi.org/10.1029/2022GL098506.
Malamud, Bruce D., Donald L. Turcotte, Fausto Guzzetti, and Paola Reichenbach. 2004. Landslide inventories and their statistical properties.” Earth Surface Processes and Landforms 29 (6): 687–711. https://doi.org/10.1002/esp.1064.
Marc, O., R. Behling, C. Andermann, J. M. Turowski, L. Illien, S. Roessner, and N. Hovius. 2019. Long-term erosion of the Nepal Himalayas by bedrock landsliding: the role of monsoons, earthquakes and giant landslides.” Earth Surface Dynamics 7 (1): 107–28. https://doi.org/10.5194/esurf-7-107-2019.
Marc, O., and N. Hovius. 2015. “Amalgamation in Landslide Maps: Effects and Automatic Detection.” Natural Hazards and Earth System Sciences 15 (4): 723–33. https://doi.org/10.5194/nhess-15-723-2015.
McColl, S. T., and S. J. Cook. 2024. “A Universal Size Classification System for Landslides.” Landslides 21 (1): 111–20. https://doi.org/10.1007/s10346-023-02131-6.
Medwedeff, William G., Marin K. Clark, Dimitrios Zekkos, and A. Joshua West. 2020. “Characteristic Landslide Distributions: An Investigation of Landscape Controls on Landslide Size.” Earth and Planetary Science Letters 539: 116203. https://doi.org/10.1016/j.epsl.2020.116203.
Meunier, Patrick, Taro Uchida, and Niels Hovius. 2013. “Landslide Patterns Reveal the Sources of Large Earthquakes.” Earth and Planetary Science Letters 363: 27–33. https://doi.org/10.1016/j.epsl.2012.12.018.
Milledge, David G., Dino G. Bellugi, Jack Watt, and Alexander L. Densmore. 2022. “Automated Determination of Landslide Locations After Large Trigger Events: Advantages and Disadvantages Compared to Manual Mapping.” Natural Hazards and Earth System Sciences 22 (2): 481–508. https://doi.org/10.5194/nhess-22-481-2022.
Muñoz-Torrero, Alberto. 2020. Multi-temporal Landslide Inventory for the Far- Western region of Nepal.” Zenodo. https://doi.org/10.5281/zenodo.4290100.
Pánek, Tomáš, Oliver Korup, Jozef Minár, and Jan Hradecký. 2016. Giant landslides and highstands of the Caspian Sea.” Geology 44 (11): 939–42. https://doi.org/10.1130/G38259.1.
Roback, Kevin, Marin K. Clark, A. Joshua West, Dimitrios Zekkos, Gen Li, Sean F. Gallen, Deepak Chamlagain, and Jonathan W. Godt. 2018. The size, distribution, and mobility of landslides caused by the 2015 Mw7.8 Gorkha earthquake, Nepal.” Geomorphology 301: 121–38. https://doi.org/10.1016/j.geomorph.2017.01.030.
Safran, EB, SW Anderson, M Mills-Novoa, PK House, and L Ely. 2011. Controls on large landslide distribution and implications for the geomorphic evolution of the southern interior Columbia River basin.” Geological Society of America Bulletin 123 (9-10): 1851–62. https://doi.org/10.1130/B30061.1.
Saito, H., O. Korup, T. Uchida, S. Hayashi, and T. Oguchi. 2014. Rainfall conditions, typhoon frequency, and contemporary landslide erosion in Japan.” Geology 42 (11): 999–1002. https://doi.org/10.1130/G35680.1.
Santangelo, M., O. Althuwaynee, M. Alvioli, F. Ardizzone, C. Bianchi, T. Bornaetxea, M. T. Brunetti, et al. 2023. Inventory of landslides triggered by an extreme rainfall event in Marche-Umbria, Italy, on 15 September 2022.” Scientific Data 10 (1): 427. https://doi.org/10.1038/s41597-023-02336-3.
Schönfeldt, Elisabeth, Diego Winocur, Tomáš Pánek, and Oliver Korup. 2022. Deep learning reveals one of Earth’s largest landslide terrain in Patagonia.” Earth and Planetary Science Letters 593: 117642. https://doi.org/10.1016/j.epsl.2022.117642.
Sepúlveda, Sergio A., Alejandra Serey, Marisol Lara, Andrés Pavez, and Sofı́a Rebolledo. 2010. Landslides induced by the April 2007 Aysén Fjord earthquake, Chilean Patagonia.” Landslides 7 (4): 483–92. https://doi.org/10.1007/s10346-010-0203-2.
Smith, William D., Stuart A. Dunning, Neil Ross, Jon Telling, Erin K. Jensen, Dan H. Shugar, Jeffrey A. Coe, and Marten Geertsema. 2023. Revising supraglacial rock avalanche magnitudes and frequencies in Glacier Bay National Park, Alaska.” Geomorphology 425: 108591. https://doi.org/10.1016/j.geomorph.2023.108591.
Song, Chuang, Chen Yu, Zhenhong Li, Stefano Utili, Paolo Frattini, Giovanni Crosta, and Jianbing Peng. 2022. Triggering and recovery of earthquake accelerated landslides in Central Italy revealed by satellite radar observations.” Nature Communications 13 (1): 7278. https://doi.org/10.1038/s41467-022-35035-5.
Tanyaş, Hakan, Tolga Görüm, Islam Fadel, Cengiz Yıldırım, and Luigi Lombardo. 2022. An open dataset for landslides triggered by the 2016 Mw 7.8 Kaikōura earthquake, New Zealand.” Landslides 19 (6): 1405–20. https://doi.org/10.1007/s10346-022-01869-9.
Tanyaş, Hakan, Kevin Hill, Luke Mahoney, Islam Fadel, and Luigi Lombardo. 2022. The world’s second-largest, recorded landslide event: Lessons learnt from the landslides triggered during and after the 2018 Mw 7.5 Papua New Guinea earthquake.” Engineering Geology 297: 106504. https://doi.org/10.1016/j.enggeo.2021.106504.
Tanyaş, Hakan, Cees J. van Westen, Kate E. Allstadt, M. Anna Nowicki Jessee, Tolga Görüm, Randall W. Jibson, Jonathan W. Godt, et al. 2017. “Presentation and Analysis of a Worldwide Database of Earthquake-Induced Landslide Inventories.” Journal of Geophysical Research: Earth Surface 122 (10): 1991–2015. https://doi.org/10.1002/2017JF004236.
Tanyaş, Hakan, Cees J. van Westen, Kate E. Allstadt, and Randall W. Jibson. 2019. Factors controlling landslide frequency–area distributions.” Earth Surface Processes and Landforms 44 (4): 900–917. https://doi.org/10.1002/esp.4543.
Tebbens, S. F. 2020. Landslide Scaling: A Review.” Earth and Space Science 7 (1): e2019EA000662. https://doi.org/https://doi.org/10.1029/2019EA000662.
Valagussa, A., P. Frattini, E. Valbuzzi, and G. B. Crosta. 2021. Role of landslides on the volume balance of the Nepal 2015 earthquake sequence.” Scientific Reports 11 (1): 3434. https://doi.org/10.1038/s41598-021-83037-y.
Valagussa, A., O. Marc, P. Frattini, and G. B. Crosta. 2019. “Seismic and Geological Controls on Earthquake-Induced Landslide Size.” Earth and Planetary Science Letters 506: 268–81. https://doi.org/10.1016/j.epsl.2018.11.005.
Van Den Eeckhaut, M., J. Poesen, G. Verstraeten, V. Vanacker, J. Moeyersons, J. Nyssen, and L. P. H. van Beek. 2005. “The Effectiveness of Hillshade Maps and Expert Knowledge in Mapping Old Deep-Seated Landslides.” Geomorphology 67 (3): 351–63. https://doi.org/10.1016/j.geomorph.2004.11.001.
Xu, Chong, Xiwei Xu, Xin Yao, and Fuchu Dai. 2014. Three (nearly) complete inventories of landslides triggered by the May 12, 2008 Wenchuan Mw 7.9 earthquake of China and their spatial distribution statistical analysis.” Landslides 11 (3): 441–61. https://doi.org/10.1007/s10346-013-0404-6.
Xu, Yueren, Mark B. Allen, Weiheng Zhang, Wenqiao Li, and Honglin He. 2020. Landslide characteristics in the Loess Plateau, northern China.” Geomorphology 359: 107150. https://doi.org/10.1016/j.geomorph.2020.107150.
Zhao, Bo, Weile Li, Yunsheng Wang, Jiayan Lu, and Xiang Li. 2019. Landslides triggered by the Ms 6.9 Nyingchi earthquake, China (18 November 2017): analysis of the spatial distribution and occurrence factors.” Landslides 16 (4): 765–76. https://doi.org/10.1007/s10346-019-01146-2.

  1. Institute of Environmental Sciences and Geography, Institute of Geosciences, University of Potsdam↩︎

  2. Institute of Environmental Sciences and Geography, University of Potsdam; Potsdam Institute for Climate Impact Research PIK↩︎

  3. Institute of Environmental Sciences and Geography, University of Potsdam↩︎

---
title: "Size scaling of large landslides from incomplete inventories"
author: 
  - Oliver Korup^[Institute of Environmental Sciences and Geography, Institute of Geosciences, University of Potsdam]
  - Joaquin Ferrer^[Institute of Environmental Sciences and Geography, University of Potsdam; Potsdam Institute for Climate Impact Research PIK]
  - Lisa Luna^[Institute of Environmental Sciences and Geography, University of Potsdam]
date: "20 September 2024"
output:
  html_notebook:
    theme: cosmo
    toc: true
    toc_float: true
    fig_caption: true
    number_sections: true
  html_document:
    theme: cosmo
    toc: true
    toc_float: true
  word_document:
    toc: true
  pdf_document:
    keep_tex: true
    latex_engine: xelatex
bibliography: gpareto_multilevel_inventories_references.bib
#csl: "geophysical-research-letters.csl"
nocite: |
  @Antinao:2009aa, @essd-15-753-2023, @Basharat:2014aa, @Behling:2016aa, @domenech, @Harp:1981aa, @Hu:2019aa, @Jones:2021aa, @Karakas:2021aa, @Kim:2022aa, @Liu:2021aa, @esurf-7-107-2019, @Roback:2018aa, @Sepulveda:2010aa, @Tanyas:2022aa, @Tanyas:2022ab, @usgs_wlc, @Valagussa:2021aa, @Zhao:2019aa
fontsize: 12pt
---

Note: This file contains supplementary information that highlights the overall data analysis in __R__ code. Although the accompanying text here is mostly the same as in the accompanying paper for illustrative purposes, you should refer to the final, peer-reviewed paper for citations. You can click on the __Code__ button in the upper right of this file to download its *.Rmd version.

__Landslide inventories have become cornerstones for estimating the relationship between the frequency and size of slope failures, thus informing appraisals of hillslope stability, erosion, and commensurate hazard. Numerous studies have reported how larger landslides are systematically rarer than smaller ones, drawing on probability distributions fitted to mapped landslide areas or volumes. In these models, much uncertainty concerns the larger landslides (defined here as affecting >0.1 km^2^) that are rarely sampled, and often projected by extrapolating beyond the observed size range in a given study area. Relying instead on size-scaling estimates from other inventories is problematic because landslide detection and mapping, data quality, resolution, sample size, model choice, and fitting method can vary. To overcome these constraints, we employ a Bayesian multi-level model with a Generalised Pareto likelihood that provides a single, objective, and consistent comparison grounded in extreme-value theory. We explore whether and how scaling parameters vary between 37 inventories that, although incomplete, bring together 8627 large landslides. Despite the broad range of mapping protocols and lengths of record, and differing topographic, geological, and climatic settings, the posterior power-law scaling exponents remain indistinguishable between most inventories. Likewise, the scaling statistics fail to separate known earthquake from rainfall triggers, and event-based from multi-temporal catalogues. Instead, our model identifies several inventories with outlier scaling statistics that reflect intentional censoring during mapping. Our results thus caution against a universal or solely mechanistic interpretation of the scaling parameters, at least in the context of large landslides.__


# Introduction

Keeping records of the size and frequency of landslides is key to estimate rates of erosion, geomorphic work, and hillslope evolution [@Dente:2023aa;@Saito:2014aa;@esurf-7-107-2019]; infer material strength and weathering of hillslopes [@Li:2021aa; @Alberti:2022aa]; and inform hazard appraisals of slope instability [@Guzzetti:2012aa], particularly in response to contemporary climate change [@SMITH2023108591]. A popular way to characterise the relative frequency of landslide-affected areas or volumes is to fit probability distributions to size data compiled in inventories [@Malamud:2004aa]. These catalogues contain locations and geometries of individual footprint areas mapped largely from air photos or satellite images. The choice of probability distribution (or "scaling laws") has favoured the inverse power-law or Pareto, the inverse gamma, or the lognormal distributions [@https://doi.org/10.1029/2019EA000662], or combinations thereof [@Jain:2022aa]. All these distributions are skewed, often heavy-tailed, and capture the widespread observation that larger landslides are systematically rarer than smaller ones.

Reported values of the parameters that define these distributions have seemingly narrow numerical ranges [@https://doi.org/10.1029/2019EA000662]. This similarity among model fits has led to a lively discussion about whether these parameters reflect generic geometric or mechanistic properties of landslides or the hillslopes that they occur on [@Bellugi2021ControlsOT; @esurf-9-1013-2021]. For example, physical interpretations of the "roll-over" that marks the lower bound of inverse power-law distributions include that of a hillslope length scale that is susceptible to failure, or the cohesive strength of failure planes [@https://doi.org/10.1029/2019EA000662]. While landslide size distributions may reflect the nature and spatial intensity of a common trigger such as a strong earthquake [@VALAGUSSA2019268], the average landslide size may contain information about both cohesive strength of slope material and hillslope relief [@Medwedeff:2020aa]. Some of these physical interpretations are backed by, or derived from, numerical simulations of slope instability [@FRATTINI2013310]. Yet, a different line of arguments proposed that the roll-over is a statistical artefact of landslide detection and mapping, approximately marking the smallest discernible landslide in a given study area [@https://doi.org/10.1029/2019EA000662]. Either way, this discussion has questioned whether these scaling laws are universally applicable to landslides irrespective of environmental setting, mapping methods, and trigger mechanisms [@Malamud:2004aa; @Tanyas:2019aa]. Many of these interpretations have relied on the direct comparison of reported parameter values, and scrutiny concerning possible effects of data sources and quality, mapping method, and statistical errors of the fitted models has increased in more recent work [@Bellugi2021ControlsOT].

Still, most uncertainty remains about the large landslides that are rarely sampled naturally. Hence, the bulk of studies on landslide size has disclosed little about these large landslides, let alone their prediction as first-time failures [@https://doi.org/10.1029/2018RG000626]. One reason for this knowledge gap is that large landslides are often elusive in catalogues compiled shortly after a landslide-triggering earthquake or rainstorm [@essd-12-2899-2020; @nhess-21-1531-2021; @Santangelo:2023aa]. Sample sizes often involve only a handful to several dozen large landslides, and thus often remain too small for robust statistics in a given study area. Hence, inference is mostly based on the simple extrapolation of model fits beyond the observed size range. Yet, large landslides may often re-shape hillslope geometry and dominate erosion [@Korup:2007aa;@esurf-7-107-2019], but may involve phases of creep motion, and respond differently to triggering conditions than smaller failures because of a longer and more complex slope history of accumulated stress and strain [@Lacroix:2020aa].

In general, the statistics of landslides are derived from a sequence of detection, mapping, and statistical inference (Fig. 1). Uncertainties that propagate throughout each step can affect the outcome in terms of landslide scaling statistics.

![Landslide scaling statistics rely on accurate detection, mapping, and statistical inference. These three (colour-coded) steps are prone to a number of uncertainties that may propagate as errors into the derived landslide statistics. Boxes with black outline are most likely directly tied to physical processes of landsliding.](ls_stats_schematic.png){height=350px}

\

At the level of the input data, both landslide detection and mapping face several constraints. The mapping objective can dictate whether to focus on landslides attributed to a single trigger such as a strong earthquake [@Meunier:2013aa; @Gorum:2014aa;@tanyas_presentation_2017;@Lombardo:2019aa], a rainstorm [@nhess-21-1531-2021; @essd-12-2899-2020;@emberson_insights_2022;@Santangelo:2023aa], or instead to compile landslide traces that have accumulated over years to millennia [@lahusen_surface_2016;@essd-14-3157-2022; @Fusco:2023aa; @10.1130/G38259.1]. Some of the most comprehensive catalogues today feature thousands to hundreds of thousands of slope failures across entire nations or beyond [@essd-14-3157-2022; @Fusco:2023aa]. The methods to detect, map, and compile landslides have become more diverse and elaborate beyond traditional mapping from air photos, optical satellite data or historical records [@Xu:2020aa;@Casagli:2023aa]. Newer catalogues derive from laser scanning [@esurf-9-1013-2021], radar imagery [@Song:2022aa], object-based image analysis [@milledge_automated_2022], deep neural networks [@Schonfeldt:2022aa; @Bhuyan:2023aa], text mining [@Franceschini:2022aa], and seismology [@Hibert:2019aa]. Most methods require specific mapping protocols adjusted to the effective resolution of imagery that may be compromised by vegetation, land cover, cloud, and shadow [@Brardinoni:2003aa;@nhess-22-2637-2022]. Varying image quality, resolution, and coverage also affect landslide size estimates, as does the experience of mapping operators [@Van-Den-Eeckhaut:2005aa]. The mapping outcome may depend on whether a single person or a team is at work, and the attention to detail in delineating source, deposit, or total affected areas. Overlapping landslide source areas or bodies can obscure the dimensions of slope failure [@nhess-15-723-2015]. Debris- flows and snow-avalanche tracks, moraines, and wind-throw gaps in forests can be mistaken for landslide evidence. Experience and training aid detection and mapping, but also introduce bias, for example favouring fresh landslides, certain types of failure, or a size range that is easiest to recognise. Hence, the size range of a given inventory is bounded by the smallest mappable, and the largest recognisable, landslide [@Barlow:2012aa]. Thus, without any standard mapping protocol in place, landslide researchers have to deal with catalogues of varying coverage, detail, and quality for the same task of obtaining traits of landslide size.

At the model level, estimates of size scaling hinge on the choice of probability distribution to characterise the mapped landslides [@Brink:2009aa]. These estimates also depend on sample size, data pre-processing, fitting method, residuals, and cross-validation. Numerical experiments show that small sample sizes yield volatile estimates of scaling parameters for inverse power-law distributions in particular [@https://doi.org/10.1002/esp.2248]. Most estimation methods involve either regression of log-binned - and thus smoothed - landslide frequencies versus size [@https://doi.org/10.1002/esp.4379], or maximum likelihood estimates [@Clauset:2009iy]; various biases apply to both methods. However, reports of confidence intervals or goodness of fit, and hence ways to assess overfitting remain rare. Still, the basis for statistical inference of landslide size distributions varies at the level of the individual inventory, each of which embodies the methods of detection and mapping used, and the biases they may induce. In the light of these constraints, a direct comparison of landslide scaling estimates between different inventories may be misleading.

We propose a compact solution to compare more fairly the landslide-size distributions from diverse inventories by estimating scaling parameters with a single, probabilistically consistent model. We apply this model to large landslides that affect a total area of \>0.1 km^2^, and address the problem of small sample size by using Bayesian inference in a multi-level model that uses data from multiple inventories to estimate the variance of scaling parameters within and across these catalogues  [@https://doi.org/10.1029/2022GL098506]. The multi-level approach acknowledges structure in landslide size data in a consistent way. One natural grouping of data is by inventory, and reflects the diversity in data input quality reviewed above. Our focus on large landslides makes the Generalised Pareto Distribution (GPD) a natural model choice, because extreme-value theory predicts that data above a high threshold are approximately GP distributed [@Castro_Camilo_2022]. Another advantage of this distribution is that its parameters can be translated directly into those used most widely in studies of landslide size scaling.

```{r message = FALSE, warning = FALSE, include = FALSE}
# Clean
graphics.off()
rm(list = ls(all = TRUE))

# Set working directory as global document option
knitr::opts_chunk$set(root.dir = "~/INVI", echo = TRUE)

setwd("~/INVI")

# Load libraries
require(brms)
require(rstan)
require(readxl)
require(ggplot2)
require(scales)
require(magrittr)
require(dplyr)
require(reshape2)
require(AER)
require(RColorBrewer)
require(ggExtra)
require(ggpubr)
require(cowplot)
require(ggridges)
require(tidybayes)
require(forcats)
require(sf)
require(readr)
require(stringr)
require(ggrepel)
require(rticles)

# Select multiple cores as suggested by STAN manual
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# Load HDI script 
# (courtesy of J. Kruschke, Doing Bayesian Data Analysis 2nd ed.)
# (https://jkkweb.sitehost.iu.edu/DoingBayesianDataAnalysis/)
source("HDIofMCMC.r")
```

# Data and Methods

We consider data on total landslide-affected areas from several dozen published landslide inventories with open access. We excluded many other detailed catalogues that had no records of landslides meeting our size threshold of 0.1 km^2^. Besides information about their size, many databases have landslide types and triggers reported, and many data were recorded following recent (i.e. post-1900) major earthquakes and rainstorms with the intention to characterise the impact of these events. We also included catalogues spanning time intervals of several years to millennia, featuring mostly undated large landslides with unknown triggers to test whether these cumulative inventories have size distributions that differ from those of event-based inventories.


```{r, message = FALSE}
# Read data file on landslide inventories
zz <- read_csv("lsinventory_scaling_master.csv")

# Order by target variable
zz <- zz %>% arrange(xx)

# Create any subsets if needed
zz <- zz %>% filter(!is.na(Inventory))

# Add synthetic noise to break ties in ranks for exceedance probabilities
zz$xx <- zz$xx + runif(length(zz$xx))
```

The choice of probability distribution to model landslide area often rests on implicit assumptions. For example, the inverse power law draws on considerations of physical sand-pile models and the concept of self-organised criticality [@https://doi.org/10.1029/98GL50419], whereas the lognormal distribution arises naturally from multiplicative effects of random variables [@Brink:2009aa]. Here we model the reported sizes of large landslides with the Generalized Pareto Distribution (GPD). The GPD is rooted in extreme-value theory and approximates the distribution of a continuous random variable $x$ above a high threshold (or location parameter) $\mu$. The GPD thus captures what we would expect theoretically from a sample consisting of observations filtered above a minimum value [@Katz:2002aa]. Any physical interpretation of the GPD parameters may need to account, or correct, for this statistical expectation first. The probability density function of the GPD is:

$$
\mathrm{GPD}(x|\mu,\sigma,k) = \frac{1}{\sigma} 
\left(  1 + \frac{k \left( x - \mu \right)}{\sigma} \right) ^ {-1/k - 1},
$$ 

where $x >= \mu$, $\sigma > 0$ is a scale parameter, and $k \geq 0$ is a shape parameter. The scale parameter $\sigma$ is somewhat comparable to the "roll-over" in studies using an inverse power-law model for estimating landslide size scaling. This roll-over marks the lowest landslide size above which power-law scaling is assumed. The GPD shape parameter is the inverse of the "scaling exponent" of the power-law tail $\alpha$, such that $k = 1 / \alpha$.

Here, the location parameter $\mu$ sets the minimum landslide size for data to be admitted to the GPD. This is known as a peak-over-threshold approach in extreme-value statistics [@Katz:2002aa]. For large landslides, we let $\mu$ = 0.1 km^2^. Empirical relationships between landslide volume and total affected area across a wide range of environmental settings show that an area of 0.1 km^2^ corresponds to an average volume of roughly 10^6^ m^3^ [@Larsen:2010aa], which is the suggested lower threshold for large landslides [@McColl:2024aa]. This particular choice of $\mu$ is a compromise, because fewer samples and landslide inventories are available for higher values of $\mu$, whereas the GPD becomes a less and less valid approximation to the data for lower values of $\mu$.

```{r}
# Consider only POT data, i.e. those xx >= min_size
zz$xx <- zz$xx / 1e6 # convert to [km ^ 2]

# Record minimum landslide area in full inventory
zz <- zz %>% 
  group_by(Inventory) %>% 
  mutate(minArea = min(xx))

# Set minimum size mu (location parameter)
min_size <- 0.1 # [km ^ 2]

# Record number of landslides below mu
zz <- zz %>% 
  group_by(Inventory) %>% 
  mutate(n_below_mu = sum(xx < min_size))

# Record total landslide area
tot_A <- sum(zz$xx)

# Filter for POT data
zz <- zz %>% filter(xx >= min_size)
```

Fitting the GPD to data can involve maximum likelihood estimates or, in case of few samples ($n$ < 30), probability-weighted (L-)moments to avoid volatile parameter estimates [@Katz:2002aa]. We use Bayesian inference to learn the GPD parameters from the data, acknowledging explicitly that these are derived from different inventories that reflect different environmental conditions across study areas, and landslides likely detected at varying resolution and mapped with different techniques. A Bayesian treatment of this fitting problem seeks a compromise between a likelihood function and a probability distribution of prior knowledge about the model parameters. This approach also obviates the need for binned landslide-size data to use frequency density [e.g. @Malamud:2004aa]. Instead, we work with the joint probability that is the numerator of Bayes' Rule:

$$ p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta) p(\theta)}{p(\mathcal{D})}, $$ 

where $\theta$ is the vector of model parameters that we wish to update from both the landslide data $\mathcal{D}$ and prior knowledge. We use the GPD as the likelihood function $p(\mathcal{D}|\theta)$ and choose (hyper-)prior distributions $p(\theta)$ to approximate what we know about landslide size distributions so far and irrespective of the data $\mathcal{D}$ studied here.

Our model uses a multi-level setup, in which $i \in \{1,\dots,n\}$ indexes each landslide observation $x_i$ from a sample of size $n$, and $j \in \{1,\dots,J\}$ indexes each of $J$ different landslide inventories. The idea of the multi-level model is that the size distribution in each landslide inventory $j$ is characterised by an individual set of GPD parameters $\sigma_j$ and $k_j$. We further assume that the values of each of these inventory-specific parameter pairs are drawn from the same two probability distributions:

$$ x_i \sim \mathrm{GPD}(\mu, \sigma_{j[i]}, k_{j[i]}) $$

$$ \sigma_j \sim \mathrm{Gamma}(\alpha_\sigma, \beta_{\sigma}) $$
$$ k_j \sim \mathrm{Gamma}(\alpha_k, \beta_{k}) $$ 

Here we choose independent Gamma distributions for both $\sigma_j$ and $k_j$ to ensure that the parameters are positive and uncorrelated. The multi-level model thus learns the parameters for each catalogue informed by both its data, the overarching Gamma distributions, and prior knowledge. While the model allows $\sigma_j$ and $k_j$ to vary between landslide inventories, it also draws on information from the full data set via this multi-level structure.

Bayesian reasoning requires that we specify our prior knowledge explicitly. We do this by choosing the hyper-parameters of the two Gamma distributions of $\sigma_j$ and $k_j$. These hyper-parameters describe the distribution of landslide scaling parameters across all inventories and offer a global summary from all data. We draw on the growing literature of landslide scaling. Recent reviews have summarised that the power-law scaling exponent $\alpha$ for landslide inventories is most often reported in the range of 1 \< $\alpha$ \< 3 [@https://doi.org/10.1029/2019EA000662]. Recalling that the GPD shape parameter $k = 1 / \alpha$, we can use this information to constrain our (hyper-)prior distributions accordingly. Hence we choose these hyper-parameter values such that they contain findings from landslide scaling studies based on data other than the ones used here. The exact shape of these distributions may matter little in the light of the large sample size that informs our likelihood function. We disregard any correlation between the hyper-parameters and simplistically assume independent distributions:

$$ \alpha_\sigma \sim \mathcal{N}(1, 0.25) $$
$$ \beta_{\sigma} \sim \mathcal{N}(5, 5) $$
$$ \alpha_k \sim \mathcal{N}(6, 1) $$
$$ \beta_{k} \sim \mathcal{N}(9, 1) $$

where $\alpha_\sigma$ and $\alpha_k$ are the corresponding shape parameters, and $\beta_{\sigma}$ and $\beta_{k}$ are the inverse scale (or rate) parameters. We assume independent Gaussian prior distributions for these hyperparameters and choose the prior means and standard deviations informed by previous research on landslide scaling properties [@https://doi.org/10.1029/2019EA000662].

```{r}
# Set minimum samples per inventory
min_grp_size <- 25
```

To avoid having too many inventories with only a handful of large landslides, we consider only those data collections with at least `r min_grp_size` landslides that exceed the threshold size $\mu$.

```{r}
# Filter for minimum samples per group
min_grp <- zz %>% 
  group_by(Inventory) %>% 
  summarise(count = n()) %>% 
  filter(count <= min_grp_size)
zz <- zz %>% filter(!(Inventory %in% min_grp$Inventory))
```

```{r}
# Compute empirical exceedance probabilities per group
zz <- zz %>% 
  group_by(Inventory) %>% 
  arrange(xx, .by_group = TRUE) %>% 
  mutate(p = seq(n(), 1, -1) / n())
```


The data that we need for a numerical approximation of the posterior distribution of the parameters of the GPD consist of the individual landslide areas and labels of the inventories they belong to.

```{r}
# Prepare data input for STAN (custom)
stan_d <- list(
  ymin = min_size,
  N = length(zz$xx),
  y = zz$xx,
  L = length(unique(zz$Inventory)),
  ll = as.numeric(factor(zz$Inventory)),
  ymax = aggregate(zz$xx, by = list(zz$Inventory), max)$x,
  #yt = 10 ^ seq(2.9, 5, 0.01),   # testing data
  #Nt = 211
  yt = zz$xx,   # testing data
  Nt = length(zz$xx),
  lltt = as.numeric(factor(zz$Inventory))
)
```

```{r Summary_Table, fig.cap = 'Summary of landslide inventories listing study area, sample size of large landslides, the smallest landslide detected, number of landslides below size threshold, and trigger. *Derived from deep neural networks.'}
# Data overview
(mydat <- zz %>%
  group_by(Inventory) %>%
  summarise(Samples = n(),
            `Max. area (km^2)` = max(xx) %>% round(1),
            `Min. area (km^2)` = min(minArea) %>% round(4),
            `n < threshold` = max(n_below_mu),
            Trigger = first(Trigger),
            Reference = first(group)))
```

The size threshold $\mu$ means that we had to discard many published landslide inventories that only contain smaller slope failures. Our data thus consists of `r nrow(zz)` large landslides filtered from `r length(unique(zz$Inventory))` different inventories. Together, these large slope failures affected an area of `r round(sum(zz$xx))` km^2^, or `r round(sum(zz$xx) / tot_A, 2) * 100`% of the total landslide-affected area recorded in these catalogues. The largest landslide is unnamed and extends over `r round(max(zz$xx))` km^2^ in the Caspian Sea basin [@10.1130/G38259.1]. Our data thus span more than three orders of magnitude in landslide area; the largest mapped landslide areas per inventory differ by up two orders of magnitude.

We note that `r sum(mydat$Trigger == "Earthquake")` (or `r round(sum(mydat$Trigger == "Earthquake") * 100 / nrow(mydat))`%) of our selected landslide inventories were compiled following an earthquake, including several versions that were mapped by different research teams. Only `r sum(mydat$Trigger == "Rainfall")` catalogues (`r round(sum(mydat$Trigger == "Rainfall") * 100 / nrow(mydat))`%) are attributed to a rainfall trigger, while `r sum(mydat$Trigger == "Various")` catalogues (`r round(sum(mydat$Trigger == "Various") * 100 / nrow(mydat))`%) contain information about landslides that accumulated over many years and thus likely reflect various triggers.

```{r include=FALSE}
# Fit with STAN--------------
mfit <- stan("gpareto_fit_multilevel-inventories.stan",
             warmup = 500,
             iter = 2000,
             data = stan_d,
             chains = 4,
             cores = 4)
```


We use the probabilistic programming language STAN [@Carpenter:2017aa] to code our model and call it via the statistical programming environment **R**. We ran four independent Hamiltonian Monte Carlo chains to explore the model parameter space with the No U-Turn (NUTS) sampler coded in STAN and verified that the numerical solutions converged. Unless stated otherwise, we use medians and 95% highest density intervals (HDIs) to summarise all posterior distributions. A 95% HDI means that there is a 95% probability for a given parameter to be in the specified interval.

```{r include=FALSE}
## Model diagnostics
check_divergences(mfit)
check_energy(mfit)
check_hmc_diagnostics(mfit)
check_treedepth(mfit)
loo(mfit)
```

```{r}
# We extract the posterior into a more amenable format:
posterior <- rstan::extract(mfit)
rm(mfit)

# Obtain posterior predictions
mypred <- data.frame(A = stan_d$yt,
                          lo = apply(posterior$predccdf, 2,
                                     HDIofMCMC)[1, ],
                          md = apply(posterior$predccdf, 2,
                                     median),
                          hi = apply(posterior$predccdf, 2,
                                     HDIofMCMC)[2, ])

# Combine with data
zz <- cbind(zz, mypred)

# Collect hyperparameters
post_hyper <- as.data.frame(cbind(posterior$alpha_k, 
                                  posterior$beta_k,
                                  posterior$a_sigma,
                                  posterior$b_sigma,
                                  posterior$lp__))
names(post_hyper) <- c("alpha_k", "beta_k", "a_sigma", "b_sigma", "logposterior")
```

# Results

## Model fits and residuals

We express the size distributions of large landslides in cumulative form using the exceedance probability $p$ for a given landslide area.

```{r GPD-modelfit, fig.height = 10, fig.width = 10, fig.cap = 'Size and frequency of large landslides from inventories that reported at least 25 slope failures affecting >0.1 km^2^ each. Circles are observed data ranked by their empirical exceedance probabilities, and lines are posterior medians of a fitted multi-level Generalised Pareto distribution (GPD) with shaded 95% highest density intervals (HDIs).'}
ggplot(zz, aes(x = xx,
               y = md,
               col = factor(Inventory))) +
  geom_line() +
  geom_ribbon(aes(ymin = hi, ymax = lo, fill = factor(Inventory)), 
              alpha = 0.35, linewidth = 0) +
  scale_color_viridis_d(begin = 1, end = 0) +
  geom_point(data = zz, aes(x = xx, y = p), alpha = 2/3) +
  labs(x = expression(paste("Landslide area (", km^2, ")")), 
       y = "Empirical exceedance probability") +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10 ^ x, n = 3),
                labels = trans_format("log10", math_format(10 ^ .x))) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10 ^ x, n = 3),
                labels = trans_format("log10", math_format(10 ^ .x))) +
  theme_bw() +
  theme(aspect.ratio = 0.6, 
        legend.position = "none") +
  facet_wrap( ~ Inventory, ncol = 5)
```

To express how well the GPD model fits the data, we compute the residuals in terms of the log-odds ratios between the empirical exceedance probabilities ($p$) and the predicted averages ($\hat p$) for each inventory. The log-odds ratio is $\log \frac{\hat p(1 - p)}{p (1-\hat p)}$, conditioned on each observed landslide. A positive (negative) log-odds ratio means that the model overestimates (underestimates) the empirical exceedance probability of a given landslide area.

```{r GPD-residuals, warning = FALSE, fig.height = 10, fig.width = 10, fig.cap = 'Residuals of multi-level GPD fit to large landslide size distributions; residuals are expressed as an log-odds ratio of observed versus predicted values. Dashed horizontal line marks perfect fit; positive (negative) ratios indicate over-(under-) estimated exceedance probabilities. Shaded areas are point-wise 95% HDIs estimated at each landslide observation.'}
# Plot residuals in exceedance probability
ggplot(zz %>% filter(p < 1), 
       aes(x = xx,
           y = -log((p / (1 - p)) / (md / (1 - md))),
           col = factor(Inventory))) +
  geom_ribbon(aes(ymin = -log((p / (1 - p)) / (hi / (1 - hi))), 
                  ymax = -log((p / (1 - p)) / (lo / (1 - lo))), 
                  fill = factor(Inventory)), 
              alpha = 0.35, lwd = 0) +
  scale_color_viridis_d() +
  geom_abline(slope = 0, intercept = 0, lty = 2) +
  geom_point(alpha = 2/3) +
  labs(x = expression(paste("Landslide area (", km^2, ")")), 
       y = "Log-odds ratio") +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10 ^ x, n = 3),
                labels = trans_format("log10", math_format(10 ^ .x))) +
  ylim(-2, 2) +
  theme_bw() +
  theme(aspect.ratio = 0.6, legend.position = "none") +
  facet_wrap( ~ Inventory, ncol = 5)

# Save posterior log-odds ratios
post_logodds <- zz %>% 
  group_by(Inventory) %>% 
  summarise(lo_logodds = HDIofMCMC(log((p / (1 - p)) / (md / (1 - md))))[1],
            hi_logodds = HDIofMCMC(log((p / (1 - p)) / (md / (1 - md))))[2],
            HDI_logodds = hi_logodds - lo_logodds)
```
We find that the log-odds ratios reveal most mismatches at either extreme end of the size range, though without any consistency across the inventories. For example, the model underestimates the execeedance probabilities of landslide areas <0.8 km^2^ in catalogues `r zz %>% filter(p < 1) %>% mutate(lo = ((p / (1 - p)) / (md / (1 - md)))) %>% group_by(Inventory) %>% summarise(qlo = quantile(lo, 0.75)) %>% arrange(qlo) %>% last() %>% select(Inventory) %>% as.character()` [@tanyas_presentation_2017] and Owyhee USA [@Safran:2011aa], while overestimating the execeedance probabilities of landslide areas >1.7 km^2^ in catalogue Dauna Apennines ITA [@essd-15-753-2023].


## Effects of different landslide inventories

```{r Posterior_k, fig.height = 8, fig.width = 6, warning = FALSE, fig.cap = 'Posterior shape parameter $k$ of the GPD; this parameter is the inverse of the scaling exponent in power-law distributions. White circles are the medians per inventory, and black horizontal lines are 95% HDIs; vertical grey line is the posterior median of the pooled model, and dashed lines delimit the 95% HDIs.'}
# Extract and plot estimates of shape parameter k
post_k <- posterior$k
colnames(post_k) <- unique(zz$Inventory)
post_k <- melt(post_k, value.name = "k")
colnames(post_k) <- c("iterations", "Inventory", "k")

post_k %>% 
  group_by(Inventory) %>% 
  mutate(md_grp_k = median(k)) %>% 
  ggplot(aes(x = k, 
             y = fct_reorder(Inventory, md_grp_k), 
             fill = fct_reorder(Inventory, md_grp_k))) +
  geom_vline(xintercept = median(post_hyper$alpha_k / post_hyper$beta_k), 
             color = "darkgrey") +
  geom_vline(xintercept = HDIofMCMC(post_hyper$alpha_k / post_hyper$beta_k), 
             linetype = "dashed", color = "darkgrey") +
  stat_halfeye(
               interval_size = 0.5, 
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               show.legend = FALSE
               ) +
  scale_fill_viridis_d() +
  xlim(0, 1.25) +
  labs(x = expression(k[j]), y = NULL) +
  theme_bw() +
  theme(aspect.ratio = 4)
```

```{r}
# Group summary of posterior k
post_k_grp <- post_k %>%
  group_by(Inventory) %>%
  summarise(md_grp_k = median(k),
            hdi_lo_grp_k = HDIofMCMC(k)[1],
            hdi_hi_grp_k = HDIofMCMC(k)[2],
            whdi_grp_k = HDIofMCMC(k)[2] - HDIofMCMC(k)[1])

# Group summary of posterior alpha = 1 / k
post_alpha_grp <- post_k %>%
  group_by(Inventory) %>%
  summarise(md_grp_alpha = median(1 / k),
            hdi_lo_grp_alpha = HDIofMCMC(1 / k)[1],
            hdi_hi_grp_alpha = HDIofMCMC(1 / k)[2],
            whdi_grp_alpha = HDIofMCMC(1 / k)[2] - HDIofMCMC(1 / k)[1])
```

Our model estimates shape parameters $k_j$ that vary across the landslide inventories with posterior medians ranging from $\tilde{k}_j$ = `r post_k_grp %>% arrange(md_grp_k) %>% first() %>% dplyr::select(md_grp_k) %>% as.numeric() %>% round(2)` in catalogue `r post_k_grp %>% arrange(md_grp_k) %>% first() %>% dplyr::select(Inventory) %>% unlist() %>% as.character()` [@Xu:2020aa] to $\tilde{k}_j$ = `r post_k_grp %>% arrange(md_grp_k) %>% last() %>% dplyr::select(md_grp_k) %>% as.numeric() %>% round(2)` in catalogue `r post_k_grp %>% arrange(md_grp_k) %>% last() %>% dplyr::select(Inventory) %>% unlist() %>% as.character()` [@Gorum:2014aa]. Narrower posterior distributions mean less uncertainty, mainly owing to more large landslides that inform the model in the relevant catalogue. For example, the `r zz %>% group_by(Inventory) %>% summarise(n = n()) %>% arrange(desc(n)) %>% first() %>% dplyr::select(Inventory) %>% as.character()` inventory [@Fusco:2023aa] contains most, i.e. `r zz %>% group_by(Inventory) %>% summarise(n = n()) %>% arrange(desc(n)) %>% first() %>% dplyr::select(n) %>% as.numeric()`, large landslides, and its 95% HDI is narrowest (`r post_k_grp %>% arrange(whdi_grp_k) %>% first() %>% dplyr::select(hdi_lo_grp_k) %>% as.numeric() %>% round(2)` \< $k_j$ \< `r post_k_grp %>% arrange(whdi_grp_k) %>% first() %>% dplyr::select(hdi_hi_grp_k) %>% as.numeric() %>% round(2)`). In contrast, the `r zz %>% group_by(Inventory) %>% summarise(n = n()) %>% arrange(n) %>% first() %>% dplyr::select(Inventory) %>% as.character()` [@Gorum:2014aa] catalogue has the fewest, i.e. `r zz %>% group_by(Inventory) %>% summarise(n = n()) %>% arrange(n) %>% first() %>% dplyr::select(n) %>% as.numeric()`, large landslides; its broad posterior distribution is thus informed more by the pooled estimate from all inventories together.

The mean of the Gamma prior distribution is $\bar{k} = \alpha_k / \beta_{k}$ by definition, and we derive the power-law exponent $\alpha$ from the identity $\alpha = 1 / k$. Similarly, we obtain the mean pooled posterior $\bar{\sigma} = \alpha_\sigma / \beta_{\sigma}$ from the sampled hyperparameters. We find that most of the 95% credible intervals of $k_j$ overlap with that of the mean $\bar{k}$ learned from the pooled model (grey vertical line, flanked by dashed lines marking its 95% HDI). Only two inventories, i.e. one on historic rock avalanches in the St. Elias mountains of Alaska, United States [St. Elias USA, @10.3389/feart.2020.00293], and one on landslides triggered by the 1920 Haiyuan earthquake, China [@Xu:2020aa], stand out with a $k_j$ that is credibly below that of the population average. 

Estimates of $k_j$ can differ credibly between inventories in the same geographic region such as western Canada and Alaska, for example when comparing St. Elias USA [@10.3389/feart.2020.00293]; Kluane, CAN-USA (W. Smith, pers. comm.); and M7.9 Alaska USA [@Gorum:2014aa]. On the contrary, inventories covering very different geographic regions and time spans can have largely overlapping, and thus statistically indifferent, posterior distributions of $k_j$. This is the case, for example, for a catalogue of landslides triggered by the M7.6 Kashmir earthquake in 2005 [M7.6 PAK 3, @Basharat:2016aa], and one on mostly Quaternary landslides in the Caspian Sea basin [Caspian Sea KAZ, @10.1130/G38259.1]. Similarly, the inventory of rainfall-triggered landslides in far western Nepal covering 79 time steps between 2002 and 2018 [Far Western NPL, @alberto_munoz_torrero_manchado_2020_4290100], and the one on landslides following the 2018 M7.5 Porgera earthquake in Papua New Guinea, a database fully compiled by a deep learning algorithm [Porgera* PNG, @Bhuyan:2023aa], have indistinguishable posterior distributions of $k_j$.

```{r Posterior_sig, fig.height = 8, fig.width = 6, warning = FALSE, fig.cap = "Posterior roll-over in terms of the GPD scale parameter. White circles are the medians per inventory, and black horizontal lines are 95% HDIs; vertical grey line is the posterior median of the pooled model, and dashed lines delimit the 95% HDIs. Red vertical dashed line is the landslide size threshold of 0.1 km$^2$."}
# Plot estimates of scale parameter sigma 
post_sig <- posterior$sigma
colnames(post_sig) <- unique(zz$Inventory)
post_sig <- melt(post_sig, value.name = "sig")
colnames(post_sig) <- c("iterations", "Inventory", "sig")

post_sig %>% 
  group_by(Inventory) %>% 
  mutate(md_grp_sig = median(sig)) %>% 
  ggplot(aes(x = sig, 
             y = fct_reorder(Inventory, md_grp_sig), 
             fill = fct_reorder(Inventory, md_grp_sig))) +
  geom_vline(xintercept =  median(post_hyper$a_sigma / post_hyper$b_sigma), 
             color = "darkgrey") +
  geom_vline(xintercept = HDIofMCMC(post_hyper$a_sigma / post_hyper$b_sigma),
             linetype = "dashed", color = "darkgrey") +
  geom_vline(xintercept = min_size, 
             linetype = "dashed", color = "red") +
  stat_halfeye(
    interval_size = 0.5, 
    shape = 21,
    point_color = "red",
    point_fill = "white",
    point_size = 1.5,
    show.legend = FALSE
  ) +
  scale_fill_viridis_d() +
  scale_x_log10(expression(paste(sigma[j], " (", km^2, ")")),
        breaks = trans_breaks("log10", function(x) 10 ^ x),
        labels = trans_format("log10", math_format(10 ^ .x))) +
  labs(y = NULL) +
  theme_bw() +
  theme(aspect.ratio = 4)
```

```{r}
# Group summary of posterior sigma
post_sig_grp <- post_sig %>%
  group_by(Inventory) %>%
  summarise(md_grp_sig = median(sig),
            hdi_lo_grp_sig = HDIofMCMC(sig)[1],
            hdi_hi_grp_sig = HDIofMCMC(sig)[2],
            whdi_grp_sig = HDIofMCMC(sig)[2] - HDIofMCMC(sig)[1])
```

The spread of the posterior scale parameter $\sigma_j$ is more pronounced across the inventories such that the pooled estimate overlaps with those of seven inventories only. Inventory-specific medians range over two orders of magnitude from $\tilde{\sigma}_j$ = `r post_sig_grp %>% arrange(md_grp_sig) %>% first() %>% dplyr::select(md_grp_sig) %>% as.numeric() %>% round(2)` km^2^ in the `r post_sig_grp %>% arrange(md_grp_sig) %>% first() %>% dplyr::select(Inventory) %>% unlist() %>% as.character()` catalogue [@Xu:2014aa] to $\tilde{\sigma}_j$ = `r post_sig_grp %>% arrange(md_grp_sig) %>% last() %>% dplyr::select(md_grp_sig) %>% as.numeric() %>% round(2)` km^2^ in the `r post_sig_grp %>% arrange(md_grp_sig) %>% last() %>% dplyr::select(Inventory) %>% unlist() %>% as.character()` catalogue [@10.1130/G38259.1]. Again, inventories with different environmental settings and landslide triggers have very similar posterior distributions of $\sigma_j$, such as the one on landslides triggered during Typhoon Morakot, Taiwan, in 2009 [TC Morakot TWN, @emberson_insights_2022], and the one on landslides triggered by the M7.6 Kashmir earthquake in 2005 [M7.6 PAK 3, @Basharat:2016aa]. This also holds for different inventories addressing the same triggering event, such as the 2008 Wenchuan earthquake, for example M7.9 Sichuan CHN 1 [@Xu:2014aa] and M7.9 Sichuan CHN 2 [@https://doi.org/10.1002/2013GC005067]. Higher values of $\tilde{\sigma}_j$ especially identify inventories with a less pronounced, or more limited range of, power-law scaling for the large landslides recorded. Examples include many of the inventories containing Quaternary landslides such as those of the Caspian Sea [Caspian Sea KAZ, @10.1130/G38259.1] or the Columbia River basins [Owyhee USA, @Safran:2011aa], but also the above-mentioned inventories of rock avalanches that happened in the past few decades (St. Elias USA and Kluane CAN-USA).

## Pooled estimates of landslide scaling

The pooled estimates in our multi-level model express the variance of the learned parameters across all inventories. The sampled hyperparameters of $k_j$ that describe the shape $\alpha_k$ and rate (or inverse scale) $\beta_{k}$ of the Gamma-distributed parameter $k_j$ are positively correlated; the same applies for the hyperparameters $\alpha_{\sigma}$ and $\beta_{\sigma}$.

```{r Hyper_k, fig.height = 2, fig.width = 3.5, fig.cap = 'a. Posterior distribution of the hyperparameters of the Gamma distribution from which the inventory-specific posterior $k_j$ are drawn. b. Posterior distribution of the hyperparameters of the Gamma distribution from which the inventory-specific posterior of the roll-over are drawn.'}
# Plot hyperparameter estimates of k
p_hyp_k <- ggplot(post_hyper, aes(x = alpha_k, y = beta_k)) + 
  scale_fill_gradient(low = "purple", high = "orange") +
  labs(fill = expression(n[s]),
       x = expression(alpha[k]),
       y = expression(beta[k])) +
  geom_hex(alpha = 0.8) +
  theme_bw() +
  theme(aspect.ratio = 1,
        legend.key.size = unit(0.1, "cm"),
        legend.key.height = unit(0.1, "cm"),
        legend.key.width = unit(0.4, "cm"),
        legend.position = "top")

# Plot hyperparameter estimates of sigma
p_hyp_sig <- ggplot(post_hyper, aes(x = a_sigma, y = b_sigma)) + 
  scale_fill_gradient(low = "purple", high = "orange") +
  labs(fill = expression(n[s]),
       x = expression(alpha[sigma]),
       y = expression(beta[sigma])) +
  geom_hex(alpha = 0.8) +
  theme_bw() +
  theme(aspect.ratio = 1,
        legend.key.size = unit(0.1, "cm"),
        legend.key.height = unit(0.1, "cm"),
        legend.key.width = unit(0.4, "cm"),
        legend.position = "top")

plot_grid(p_hyp_k, p_hyp_sig, align = "h",
          labels = c("a", "b"))
```

```{r Pooled_k, warning = FALSE, fig.cap = 'a. Prior and posterior distributions of power-law scaling exponent for the pooled inventory data. White circle is pooled posterior median and black horizontal line is 95% HDI. b. Prior and posterior distributions of roll-over for the pooled inventory data. White circle is pooled posterior median and black horizontal line is 95% HDI.'}
# Posterior mean of k
p_pool_k <- ggplot(post_hyper %>% 
    mutate(
      a_k_prior = abs(rnorm(nrow(post_hyper), 6, 1)), # Gaussian hyperprior
      b_k_prior = abs(rnorm(nrow(post_hyper), 9, 1)), # Gaussian hyperprior
      a_prior = 1 / (a_k_prior / b_k_prior),
      a_post = 1 / (alpha_k / beta_k))
    ) +
    # prior
    stat_slab(aes(x = a_prior),
               slab_fill = "grey",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
    # posterior
    stat_halfeye(aes(x = a_post),
               interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "orchid",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
  xlim(0.75, 3) +
  ylim(0, 3.25) +
  annotate("text", x = 1, y = 0.9, label = "Prior", color = "grey") +
  annotate("text", x = 2, y = 1.5, label = "Posterior", color = "orchid") +
  labs(x = expression(paste("Mean pooled scaling exponent ", 
                            alpha)), 
       y = expression(paste("p(", alpha, ")"))) +
  theme_bw() +
  theme(aspect.ratio = 1/4)

# Posterior mean of sigma
p_pool_sig <- ggplot(post_hyper %>% 
    mutate(
      a_sig_prior = abs(rnorm(nrow(post_hyper), 1, 0.25)), # Gaussian hyperprior
      b_sig_prior = abs(rnorm(nrow(post_hyper), 5, 5)),  # Gaussian hyperprior
      sig_prior = a_sig_prior / b_sig_prior,
      sig_post = a_sigma / b_sigma)
    ) +
    # prior
    stat_slab(aes(x = sig_prior),
               slab_fill = "grey",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
    # posterior
    stat_halfeye(aes(x = sig_post),
               interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "darkorange",
               slab_alpha = 0.75,
               show.legend = FALSE,
               normalize = "none"
               ) +
  xlim(0, 0.75) +
  ylim(0, 10) +
  annotate("text", x = 0.1, y = 6, label = "Prior", color = "grey") +
  annotate("text", x = 0.275, y = 9, label = "Posterior", color = "darkorange") +
  labs(x = expression(paste("Mean pooled scale parameter ", 
                            sigma, " (", km^2, ")")),
       y = expression(paste("p (", sigma, ") (", km^-2, ")"))) +
  theme_bw() +
  theme(aspect.ratio = 1/4)

plot_grid(p_pool_k, p_pool_sig, align = "v", ncol = 1,
          labels = c("a", "b"))
```

We find that the numerical approximation of the joint posterior distribution has a distinct maximum. We obtain a mean power-law exponent of `r round(HDIofMCMC(1 / (post_hyper$alpha_k / post_hyper$beta_k))[1], 2)` \< $\bar{\alpha}$ \< `r round(HDIofMCMC(1 / (post_hyper$alpha_k / post_hyper$beta_k))[2], 2)` across all inventories with 95% probability; the posterior median of $\bar{\alpha}$ is `r round(median(1 / (post_hyper$alpha_k / post_hyper$beta_k)), 2)`. Compared to the prior distribution based on published values of the scaling parameter, our model has gained more certainty from the data considered in this study, yielding a much narrower posterior. The mean scaling parameter is `r round(HDIofMCMC(post_hyper$a_sigma / post_hyper$b_sigma)[1], 2)` \< $\bar{\sigma}$ \< `r round(HDIofMCMC(post_hyper$a_sigma / post_hyper$b_sigma)[2], 2)` km^2^ across all inventories with 95% probability; the posterior median of $\bar{\sigma}$ is `r round(median(post_hyper$a_sigma / post_hyper$b_sigma), 2)` km^2^. This posterior shifted up from the prior distribution that we centered on our arbitrary size threshold for large landslides. Our model has learned much from the inventory data compared to the priors, especially concerning the high variance of $\sigma_j$ across the individual landslide catalogues.

## Comparison with maximum likelihood estimate

To assess the sensitivity of our model results to our choice of inference, we compare our results to maximum likelihood estimates (MLE) of the exponent $\alpha_j$ of the inverse power-law distribution, based on the Hill estimator [@Clauset:2009iy]. By definition, the MLE standard error for each inventory decays with the inverse square root of sample size, whereas the Bayesian estimates are informed by all data via the multi-level model structure. Hence we do not expect a 1:1 correspondence from this comparison. Instead, it underlines how variable and uncertain landslide scaling estimates can be for different inventories, regardless of method.

```{r Bayes_MLE, warning = FALSE, fig.cap = 'Comparison between posterior GPD estimates and the maximum likelihood estimator (MLE) of the scaling exponent for each inventory. Dashed 1:1 line aids visual comparison. Color scale coded to the fraction of (discarded) landslides below the size threshold in each inventory; bubbles are scaled to sample size of large landslides used for parameter estimation. Grey vertical bars span two standard deviations about the mean; grey horizontal bars encompass the 95% HDI. Axes are scaled equally.'}
# MLE estimate for power-law alpha 
# Equation 3.1 in Clauset et al. (2009)
MLE_alpha <- zz %>% 
  group_by(Inventory) %>% 
  summarise(n = n(),
            MLE = 1 + n / sum(log(xx / min_size)),
            MLE_sd = (MLE - 1) / sqrt(n)) # Appendix B in Clauset et al.

# Posterior medians of k
posterior_k <- post_k %>% 
  group_by(Inventory) %>% 
  summarise(md_grp_alpha = median(1 / k),
            lo_grp_alpha = HDIofMCMC(1 / k)[1],
            hi_grp_alpha = HDIofMCMC(1 / k)[2])

# Posterior medians of sigma
posterior_sig <- post_sig %>% 
  group_by(Inventory) %>% 
  summarise(md_grp_sig = median(sig),
            lo_grp_sig = HDIofMCMC(sig)[1],
            hi_grp_sig = HDIofMCMC(sig)[2])

modcomp <- merge(mydat, MLE_alpha)
modcomp <- merge(modcomp, posterior_k)
modcomp <- merge(modcomp, posterior_sig)
modcomp <- merge(modcomp, post_logodds)

ggplot(modcomp, aes(y = MLE - 1, # Offset due to definition in Clauset et al.
                    x = md_grp_alpha, 
                    size = n,
                    col = `n < threshold` / (n + `n < threshold`))) +
  geom_errorbar(aes(ymax = MLE - 1 + 2 * MLE_sd, 
                     ymin = MLE - 1 - 2 * MLE_sd),
                 col = "grey", linewidth = 0.5) + 
  geom_errorbarh(aes(xmax = hi_grp_alpha, 
                    xmin = lo_grp_alpha),
                col = "grey", linewidth = 0.5) + 
  geom_point() +
  geom_point(pch = 21, color = "black") +
  scale_color_viridis_c(begin = 0.95, end = 0.15) +
  geom_abline(intercept =  0, slope = 1, linetype = "dashed") +
  geom_text_repel(aes(label = ifelse(md_grp_alpha > 3, 
                               Inventory, ""),
                      bg.color = "white", bg.r = 0.2), 
            hjust = -0.1, vjust = -0.5, col = "black") +
  labs(y = expression(paste("MLE of ", alpha)), 
       x = expression(paste("Posterior ", alpha[j])),
       size = "Large\nlandslides\nin inventory",
       col = expression(paste("Fraction of\nlandslides < ", mu))) +
  # xlim(0, 5) +
  # ylim(0, 5) +
  scale_x_continuous(breaks = seq(0, 8, 1)) +
  coord_fixed() +
  theme_bw() +
  theme(legend.position = "top")
```

We obtain inventory-specific MLEs of `r round(min(modcomp$MLE - 1), 2)` \< $\hat\alpha_j$ \< `r round(max(modcomp$MLE - 1), 2)`. This spread encompasses most reported values in the literature [@https://doi.org/10.1029/2019EA000662]. In contrast, the posterior medians of $\alpha_j = 1/k_j$ occupy a seemingly broader range (`r round(min(modcomp$md_grp_alpha), 2)` \< $\tilde\alpha$ \< `r round(max(modcomp$md_grp_alpha), 2)`), though nominally similar to that of the MLE method for most inventories within the respective errors. However, the coefficient of variation is narrower for the Bayesian median estimates. Two inventories stand out with very high scaling exponents, e.g. `r post_k %>% group_by(Inventory) %>% summarise(md_grp_k = median(k)) %>% arrange(md_grp_k) %>% select(Inventory) %>% first() %>% pull() %>% as.character()` and `r post_k %>% group_by(Inventory) %>% summarise(md_grp_k = median(k)) %>% arrange(md_grp_k) %>% select(Inventory) %>% nth(2) %>% pull() %>% as.character()`. Both inventories have only few landslides that we censored because they were below the size threshold $\mu$ = 0.1 km^2^; in other words, these catalogues contained mostly large landslides originally.


# Discussion

## Implications

Our results show that posterior shape parameters $k_j = 1/\alpha_j$ of landslide-size distributions may vary despite being informed by thousands of data points and previous research. The narrowest 95% HDI of $\alpha_j$, and thus the best we can constrain this parameter, is that of the `r post_alpha_grp %>% arrange(whdi_grp_alpha) %>% first() %>% select(Inventory) %>% unlist() %>% as.character()` catalogue with `r post_alpha_grp %>% arrange(whdi_grp_alpha) %>% first() %>% select(hdi_lo_grp_alpha) %>% as.numeric() %>% round(2)` \< $\alpha_j$ \< `r post_alpha_grp %>% arrange(whdi_grp_alpha) %>% first() %>% select(hdi_hi_grp_alpha) %>% as.numeric() %>% round(2)`. This numerical range has much overlap with that of previously reported scaling exponents that were obtained for mostly smaller landslides, however [@https://doi.org/10.1029/2019EA000662]. Still, the nearly triangular posterior distribution has much of its probability mass near its peak, and the same goes for the pooled estimate. Except for a few inventories, the corresponding posterior distributions for other landslide inventories also remain largely indistinguishable from the pooled estimate. This low variance of $\alpha_j$ across inventories is striking if we consider the diverse mapping techniques, levels of data quality, coverage, environmental setting, and landslide triggers. The inventories we selected cover several climatic zones with different vegetation and land cover characteristics that likely affect the detection and mapping or large landslides. Some inventories were generated from deep learning algorithms [@Bhuyan:2023aa], whereas most others were mapped manually. While many posterior $\sigma_j$ have values close to our arbitrary size threshold for large landslides of 0.1 km^2^, the variance in this parameter is high compared to the pooled estimates. Overall, the median estimates of both $\alpha_j$ and $\sigma_j$ are unaffected by the number of large landslides reported in a given inventory.

Our model highlights several inventories with scaling statistics that stand out. The `r modcomp %>% filter(md_grp_alpha > 3) %>% select(Inventory) %>% .[1, ]` and `r modcomp %>% filter(md_grp_alpha > 3) %>% select(Inventory) %>% .[2, ]` catalogues have high estimates of $\alpha_j$, whereas `r modcomp %>% filter(md_grp_sig > 1) %>% select(Inventory) %>% .[1, ]`, `r modcomp %>% filter(md_grp_sig > 1) %>% select(Inventory) %>% .[2, ]`, and `r modcomp %>% filter(md_grp_sig > 1) %>% select(Inventory) %>% .[3, ]` have high estimates of $\sigma_j$. These inventories consist almost exclusively of large landslides and feature very few other data below the size threshold $\mu$. In contrast, we observe that inventories with most landslides falling below $\mu$ have low values of $\sigma_j$ consistently. We infer that landslide catalogues that were focused on compiling information about larger landslides tend to have elevated values of $\sigma_j$. In this context, our selection of inventories is nearly balanced: twenty of them have less than 10% large landslides, whereas seventeen of them have more than 50%. 

```{r Inventory_type, warning = FALSE, fig.cap = 'Median posterior parameter estimates as a function of the number of large landslides per inventory and the fraction of (discarded) landslides below the size threshold.'}
ggplot(modcomp, 
       aes(x = md_grp_alpha,
           y = md_grp_sig, 
           size = n,
           col = `n < threshold` / (n + `n < threshold`),
           #col = `Min. size (sqkm)`,
           label = Inventory)) +
  geom_errorbarh(aes(xmax = hi_grp_alpha, 
                    xmin = lo_grp_alpha),
                col = "grey", linewidth = 0.5) +
  geom_errorbar(aes(ymax = hi_grp_sig, 
                    ymin = lo_grp_sig),
                col = "grey", linewidth = 0.5) +
  scale_color_viridis_c(begin = 0.95, end = 0.15) +
  labs(x = expression(paste("Posterior ", alpha[j])),
       size = "Large\nlandslides",
       col = expression(paste("Fraction of\nlandslides < ", mu))) +
  geom_point() +
  geom_point(pch = 21, color = "black") +
  geom_text_repel(aes(label = ifelse(md_grp_alpha > 3 | md_grp_sig > 1, 
                               Inventory, ""),
                      bg.color = "white", bg.r = 0.2), 
            hjust = -0.1, vjust = -0.5, col = "black") +
  scale_x_continuous(breaks = seq(0, 8, 1)) +
  scale_y_log10(expression(paste("Posterior ", 
                                 sigma[j], " (", km^2, ")")),
        breaks = trans_breaks("log10", function(y) 10 ^ y),
        labels = trans_format("log10", math_format(10 ^ .x))) +
  theme_bw() +
  theme(aspect.ratio = 1)
```

One possible explanation for these outlying landslide-size statistics is that the GPD is a poor fit to inventories that mainly feature large landslides, at least for the chosen threshold $\mu$. The residuals for most of these inventories show pronounced underestimates for the smallest landslide range, except for the South Central CHL and Caspian Sea KAZ catalogues. Yet, other inventories with similar residuals (e.g. Glacier Bay N.P. CAN) hardly stand out compared to the pooled estimates. Clearly, extrapolating the model across the full size range, for example to infer the number of seemingly missing, underreported, or overlooked landslides [@Tanyas:2019aa] can be misleading in these cases. Another explanation for the high estimates of $k_j$ and $\sigma_j$ is that the original mapping was focused on landslides well beyond the size threshold we chose, so that slope failures close to this cutoff might be undersampled. Some of these inventories also contain partly overlapping slope-failure deposits of multiple ages, marking several phases of reactivation. Such overlaps may cause more landslides to surpass the size threshold. Hence, the mapping objective would partly bias estimates of $\sigma_j$ for a size threshold that is too low.

```{r Trigger_type, fig.cap = 'Median posterior parameter estimates as a function of inventory size and type of trigger.'}
ggplot(modcomp, 
       aes(x = md_grp_alpha,
           y = md_grp_sig, 
           size = n,
           fill = Trigger,
           label = Inventory)) +
  geom_errorbarh(aes(xmax = hi_grp_alpha, 
                    xmin = lo_grp_alpha),
                col = "grey", linewidth = 0.5) +
  geom_errorbar(aes(ymax = hi_grp_sig, 
                    ymin = lo_grp_sig),
                col = "grey", linewidth = 0.5) +
  scale_fill_brewer(palette = "PuOr") +
  labs(x = expression(paste("Posterior ", alpha[j])),
       size = "Large\nlandslides",
       bg = "Landslide\ntrigger") +
  geom_point() +
  geom_point(pch = 21, color = "black") +
  geom_text_repel(aes(label = ifelse(md_grp_alpha > 3 | md_grp_sig > 1, 
                               Inventory, ""),
                bg.color = "white", bg.r = 0.2), 
            hjust = -0.1, vjust = -0.5) +
  scale_x_continuous(breaks = seq(0, 8, 1)) +
  scale_y_log10(expression(paste("Posterior ", 
                                 sigma[j], " (", km^2, ")")),
        breaks = trans_breaks("log10", function(y) 10 ^ y),
        labels = trans_format("log10", math_format(10 ^ .x))) +
  theme_bw() +
  theme(aspect.ratio = 1)
```

We also find that the 95% credible intervals of both GDP parameters overlap for landslide inventories regardless of whether they are attributed to recent earthquakes and rainstorms, or whether they integrate landslide observations, and thus likely various triggers, over many years. We infer that the scaling statistics disclose very little about the type of landslide trigger. In this context, our findings caution against a mechanistic interpretation of the scaling parameters. 


## Role of size threshold and sample size

The heavy-tailed distribution of landslides means that we discarded many samples for small increases in $\mu$. To test the sensitivity of our results to the choice of size threshold $\mu$, we replicated our analyses, and recorded the variation of the pooled estimates $\bar{k}$ and $\bar{\sigma}$ as a function of both $\mu$ and the minimum number of large landslides that an inventory needs to have to be considered in our analysis.

```{r include=FALSE}
# Import posterior hyperparameter estimates
hyparam <- read_csv("lsinv_hyparam.csv")
hyparam <- hyparam %>% 
  group_by(mu, grpsize) %>%
  mutate(model_id = cur_group_id())
```

```{r, warning = FALSE, fig.cap = 'Effect of arbitrarily fixed size threshold and minimum sample size $n$ on posterior estimates of the scaling exponent and the roll-over.'}
mingrp_labeller <- as_labeller(
  c(`5` = "n > 5",
    `10` = "n > 10",
    `25` = "n > 25",
    `50` = "n > 50",
    `100` = "n > 100")
)

p_sens1 <- ggplot(hyparam, aes(x = 1 / a_post, y = as.factor(mu))) +
  labs(x = expression(paste("Pooled ", alpha)),
       y = expression(paste(mu, " (", km^2, ")"))) +
  stat_halfeye(interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "orchid",
               slab_alpha = 0.75,
               normalize = "panels",
               show.legend = FALSE) + 
  xlim(1, 2.25) +
  scale_y_discrete(breaks = seq(0.1, 0.3, 0.1)) +
  theme_bw() +
  theme(aspect.ratio = 1, legend.position = "none") +
  facet_wrap( ~ grpsize, ncol = 5, labeller = mingrp_labeller)

p_sens2 <- ggplot(hyparam, aes(x = sigma_post, y = as.factor(mu))) +
  labs(x = expression(paste("Pooled ", sigma, " (", km^2, ")")),
       y = expression(paste(mu, " (", km^2, ")"))) +
  stat_halfeye(interval_size = 0.5,
               shape = 21,
               point_color = "red",
               point_fill = "white",
               point_size = 1.5,
               slab_fill = "darkorange",
               slab_alpha = 0.75,
               normalize = "panels",
               show.legend = FALSE) + 
  xlim(0.1, 1.5) +
  scale_y_discrete(breaks = seq(0.1, 0.3, 0.1)) +
  theme_bw() +
  theme(aspect.ratio = 1, legend.position = "none") +
  facet_wrap( ~ grpsize, ncol = 5, labeller = mingrp_labeller)

plot_grid(p_sens1, p_sens2, align = "v", ncol = 1,
          labels = c("a", "b"))
```

We find that varying the size threshold such that 0.075 km^2^ < $\mu$ < 0.3 km^2^ returns posterior pooled values of $\alpha$ that decrease slightly with increasing $\mu$ and the minimum number of samples per inventory, though with much overlap. Overall, `r round(HDIofMCMC(1 / hyparam$a_post)[1], 2)` $<\bar{\alpha}<$ `r round(HDIofMCMC(1 / hyparam$a_post)[2], 2)` with 95% probability regardless of the threshold or sample size that we pick. Estimates of $\bar{\sigma}$ increase sightly with $\mu$, but also with some overlap. Preferring larger inventories for a given size threshold reduces the total sample size such that the pooled posterior distributions of $\sigma$ get broader. 

We infer that the choice of $\mu$, and hence the definition of "large" landslides [@McColl:2024aa], has limited influence on the scaling statistics, and especially $\bar{\alpha}$. Values of $\mu$ below the range that we tested violate the assumption of a high threshold such that a GPD would be inappropriate, whereas values above this range suffer from too small sample sizes. The pooled scale parameter $\bar{\sigma}$ shows less variation with $\mu$ and has largely overlapping posteriors. Hence, the choice of the minimum number of large landslides per inventory ($n_{\mathrm{min}}$ = `r min_grp_size`) limits both the number of levels in our model and the overall sample size. Admitting more inventories that contain fewer large landslides changes the posterior $\bar{k}$ and $\bar{\sigma}$ only slightly, but does narrow the uncertainties, especially for higher thresholds $\mu$.


## Advantages

Our Bayesian multi-level approach expands on previous, though largely separate, efforts of comparing landslide size statistics across different inventories. We offer here a single, consistent model that has several advantages:

First, the Bayesian setup can handle the small sample problem of large landslides. Scaling parameters for large landslides from a single landslide inventory are commonly estimated from extrapolating model fits that largely draw on more numerous, smaller landslides. Yet, even simulated power-law distributed data have natural scatter in the largest of observations [@Clauset:2009iy], making it difficult to validate extrapolations and leading to over-confident parameter estimates. Using data from other landslide inventories to validate these estimates often tacitly assumes that the scaling parameters agree or differ within error, but offers no way of determining of whether this assumption is at all valid. The multi-level model instead draws on the larger sample size from all inventories, and explicitly refines this shared knowledge in dedicated posterior distributions for each catalogue. In general, group-level parameter estimates tend to be closer to the pooled mean than those derived for separate models using fewer data from each group alone. This effect is known as parameter shrinkage and guards against overfitting, especially for inventories with fewer data.

Second, the Bayesian treatment documents explicitly all parameter uncertainties, and especially those that capture previous knowledge about reported parameters that characterise landslide size distributions. We can thus quantify how much we have learned from the data by comparing the posterior and prior distributions. By design, a Bayesian model seeks a compromise between these previous findings and the data considered here in a probabilistically consistent way. To this end, we made sure to include mostly recently published landslide inventories or those that had not been considered in scaling studies before. Posterior parameter estimates for catalogues with few large landslides are broader compared to those with many large landslides.

Third, the multi-level model structure enables direct comparison of parameter estimates across and between landslide inventories. Any differences in the underlying workflows of detecting and mapping landslides and the commensurate sample sizes are being accounted for by separate posterior distributions and their deviation from the pooled estimates. Our model measures objectively how similar landslide inventories are in terms of the scaling parameters that jointly, instead of separately, define the size distributions of large landslides.

Fourth, the peak-over-threshold approach that defines the GPD is rooted in extreme-value theory and thus expresses naturally what we expect statistically in the size distribution of large landslides. The parameters of the GPD contain information about the "roll-over" location and power-law scaling, and translate readily into parameters of other distributions used to characterise the size scaling of landslides.


## Further applications

We can flexibly modify our multi-level model in several ways. One option is to group the data in other ways than by inventories. For example, we can specify the group levels such that they represent the type of inventory (event-based versus integrating over a period of time); triggering mechanism (earthquake, rainstorm, snowmelt, etc.); soil or rock type; or any other characteristic that may have been collected during the process of compiling the landslide inventory. We recall that the GPD is by definition "blind" to data below the threshold $\mu$ in that it truncates all observations below the threshold. One alternative to improve the parameter estimates would be to use a censored GPD model that acknowledges the presence of smaller landslides, though without knowing their reported sizes.


# Conclusions

- We propose a multi-level model as common ground for consistently estimating and comparing size distributions of large slope failures across different inventories. In choosing a peak-over-threshold approach the Generalised Pareto distribution (GPD) reflects what we would expect statistically from a given landslide size distribution. The multi-level setup remediates the problem of low sample size by making use of all available data for estimating scaling parameters while acknowledging inherent differences across inventories.

- Our model results based on `r stan_d$L` inventories with `r stan_d$N` large landslides ($>$ 0.1 km^2^) show that the power-law exponent for each inventory $k_j = 1 / \alpha_j$ discloses little about the different underlying landslide trigger(s), geographic region, or time span concerning a given inventory, i.e. whether it is event-based or compiles landslides of many different ages. Inventories of mostly undated, prehistoric landslides have scaling exponents $\alpha_j$ that hardly differ from those of historic, event-based catalogues.

- Despite thousands of large landslides to learn from, the uncertainty about $\alpha_j$ spans several decimal points. If taking all inventories together, the pooled $\alpha$ captures most of this variance. The location parameter $\sigma$ has more spread across inventories and is affected especially by those with with few or no landslides below our size threshold. We infer that $\sigma$ is sensitive to the desired landslide size range and likely reflects the influence of mapping choices. 
- While several numerical modelling studies have attributed a physical meaning to scaling statistics of landslide size, we argue that some of this meaning might get diluted or even lost in empirical data that may combine confounding controls. We surmise that landslide inventories record these physical processes, though in a mixed way that could admit, for example, different failure types, rock and soil types, and groundwater conditions. We suspect that most landslide inventories may have mixed size distributions. For example, mixing data from inventories with differing size thresholds could add variance to $\sigma$. Hence, the resulting scaling statistics would at best reflect bulk physical characteristics instead of clearly representing parameters of a single, well-constrained physical model of slope stability.

- Our model measures objectively by how much scaling statistics differ across inventories within estimation error. Such differences can be vital if using scaling statistics for predicting future landslide hazard in terms of size and frequency [@nhess-23-3051-2023].

# Acknowledgements

We used the statistical programming environment __R__ (<https://cran.r-project.org>) with _RStudio_ (<https://posit.co/>) as a frontend for all data analysis, and coded the Bayesian GPD in the probabilistic language STAN (<https://mc-stan.org/>); all are freely available. The landslide inventories studied here have either been published as indicated or are available directly from the original contributors. Will Smith and Stuart Dunning kindly shared data on landslides in Kluane National Park. Part of this research was funded by the German Research Foundation via the Research Training Group _NatRiskChange_ (DFG GRK 2043, <https://www.uni-potsdam.de/en/natriskchange/>).

# References