Natural and human-induced landslides in a tropical mountainous region: the Rift flank west of Lake Kivu (DR Congo)

Tropical mountainous regions are often identified as landslide hotspots with particularly vulnerable populations. Anthropogenic factors are assumed to play a role in the occurrence of landslides in these populated regions, yet the relative importance of these human-induced factors remains poorly documented. In this work, we aim to explore the impact of forest cover dynamics, roads and mining activities on the occurrence of landslides in the Rift flank west of Lake Kivu in the DR Congo. To do so, we compile an inventory of 2730 landslides using © Google Earth imagery, high resolution topographic data, historical aerial photographs from the 1950s and extensive field surveys. We identify old and recent (post-1950s) landslides, making a distinction between deep-seated and shallow landslides, road landslides and mining landslides. We find that susceptibility patterns and area distributions are different between old and recent deep-seated landslides, which shows that natural factors contributing to their occurrence were either different or changed over time. Observed shallow landslides are recent processes that all occurred in the past two decades. The analysis of their susceptibility indicates that forest dynamics and the presence of roads play a key role in their regional distribution pattern. Under similar topographic conditions, shallow landslides are more frequent, but of smaller size, in areas where deforestation has occurred since the 1950s as compared to shallow landslides in forest areas, i.e. in natural environments. We attribute this size reduction to the decrease of regolith cohesion due to forest loss, which allows for a smaller minimum critical area for landsliding. In areas that were already deforested in the 1950s, shallow landslides are less frequent, larger, and occur on less steep slopes. This suggests a combined role of regolith availability and soil management practices that influence erosion and water infiltration.
Mining activities increase the odds of landsliding. Mining and road landslides are larger than shallow landslides but smaller than the recent deep-seated instabilities. The susceptibility models calibrated for shallow and deep-seated landslides do not predict them well, highlighting that they are controlled by environmental factors that are not present under natural conditions. Our analysis demonstrates the role of human activities in the occurrence of landslides in the Lake Kivu region. Overall, it highlights the need to consider this context when studying hillslope instability characteristics and distribution patterns in regions under anthropogenic pressure. Our work also highlights the importance of considering the timing of landslides over a multi-decadal period of observation.


Introduction
Tropical mountainous regions are often identified as landslide hotspots with particularly vulnerable populations (Broeckx et al., 2018; Froude and Petley, 2018; Emberson et al., 2020). Nevertheless, the current knowledge on landslide processes in these regions remains limited as it is mostly derived from susceptibility models made at continental or global levels (Stanley and Kirschbaum, 2017; Broeckx et al., 2018). Because they are not based on [...]
The growing demographic pressure and widespread land use and land cover (LULC) changes are expected to increase the frequency and impacts of landslides in tropical mountainous regions, especially in rural environments (DeFries et al., 2010; Mugagga et al., 2012; Guns and Vanacker, 2014; Froude and Petley, 2018; Depicker et al., 2021a; Muñoz-Torrero Manchado et al., 2021). Deforestation and the associated loss of tree roots lower the slope stability by decreasing regolith cohesion and altering drainage patterns (Sidle and Bogaard, 2016). Mining, quarrying and road construction alter the environment through undercutting of hillslopes, overloading, landfills and inadequate drainage systems. This increases the landslide activity, particularly in the first years following the alteration of the landscape (e.g. Brenning et al., 2015; Arca et al., 2018; McAdoo et al., 2018; Vuillez et al., 2018; Muñoz-Torrero Manchado et al., 2021). However, the exact impact of these anthropogenic factors on landslide processes depends on their timing and other environmental conditions such as slope angle and lithology (Depicker et al., 2021b). It is therefore important to further develop our understanding of landslides and their natural and human-induced drivers.
To achieve this, a detailed multi-temporal regional landslide inventory spanning several decades is essential (Guzzetti et al., 2012). In this regard, a distinction between deep-seated landslides and shallow landslides is important; the latter type is much more sensitive to LULC and its changes (Sidle and Bogaard, 2016). However, sufficiently long and precise multi-decadal records of LULC and landslide activity are rare, especially in tropical regions (e.g. Glade, 2003; Guns and Vanacker, 2014; Monsieurs et al., 2018; Shu et al., 2019). This important data gap is not easy to fill: global and regional LULC assessments derived from the first satellite data from the 70s and 80s offer a spatial resolution that is often too coarse for this purpose, and very high resolution satellite data became available only at the end of the 90s at best (Belward and Skøien, 2015; Joshi et al., 2016).
Historical aerial photographs offer the best opportunity at the regional level to work across several decades, both to compile a landslide inventory and to reconstruct LULC changes (Glade, 2003; Guns and Vanacker, 2014; Shu et al., 2019). They are complementary to very high spatial resolution satellite images such as those available on © Google Earth, which are widely used in the identification of landslides in many environments (e.g. Broeckx et al., 2018; Depicker et al., 2020). Fieldwork is also essential in order to validate observations made from the different image sources, to discriminate between deep-seated and shallow processes, or to confirm depth estimates (Dewitte et al., 2021). Field surveys also help to understand the role of human activities in slope dynamics (Dewitte et al., 2021).

The aim of this work is to explore the role played by natural and anthropogenic factors in the occurrence of landslides in a rural tropical mountainous region under high anthropogenic pressure. More specifically, we are interested in the Rift flank west of Lake Kivu, a region in the DR Congo where recent studies based on the sole and partial analysis of © Google Earth images have shown that landslides are frequent and that recent deforestation has impacted the occurrence of shallow landslides (Maki Mateso and Dewitte, 2014; Depicker et al., 2020; Depicker et al., 2021b). We aim to: (1) further develop the existing landslide dataset and compile a comprehensive detailed multi-temporal regional landslide inventory spanning several decades; (2) describe the general characteristics of the landslides; and (3) analyze landslide distributions and regional susceptibility according to different controlling factors, with special attention to multi-decadal forest cover dynamics. Historical aerial photographs and careful field surveys are key elements in this study.

Environmental settings and current knowledge of the landslide processes
The study is conducted in the Rift flank west of Lake Kivu in the DR Congo (Fig. 1a). It is one of the most seismic regions of the African continent, crossed by active faults and composed of six main rock types of varying age (Fig. 1b) (Delvaux et al., 2017; Laghmouch et al., 2018). The presence of mineral resources (gold and 3T minerals: tin, tantalum and tungsten) favours the proliferation of often illegal artisanal and small-scale mining and quarrying (Van Acker, 2005; Bashwira et al., 2014). Industrial mining is not present in the region and there is no new road construction associated with it (Bashwira et al., 2014).
The region has a tropical savannah/monsoon climate tempered by its altitude (Peel et al., 2007). The natural vegetation is mainly montane forest, still preserved in the Kahuzi-Biega National Park (Imani et al., 2017). However, between the 17th and 18th century, the region began to suffer the first strong effects of human influence through deforestation (Nzabandora and Roche, 2015). There has been significant deforestation and forest loss in recent decades as well (Basnet and Vodacek, 2015; Depicker et al., 2021a). Selective cutting is done for energy needs, house construction, furniture production and dugout canoes. Clearcutting, mostly small-scale, is associated with agriculture, mining and quarrying activities and road construction (Musumba Teso et al., 2019; Drake et al., 2019). After deforestation, the land is often permanently converted to agricultural land (cropland, grassland) or tree plantations (Depicker et al., 2021a). In some places, however, natural regeneration of the forest takes place (Masumbuko et al., 2012).
The study area (~5,700 km²) is one of the most densely populated regions of the DR Congo with more than 200 inhabitants/km² living mainly from agriculture, mining and quarrying activities (Linard et al., 2012; Michellier et al., 2016; Trefon, 2016). This region plays a key role in the supply of food and charcoal to the smaller rural centers and to the cities of Goma and Bukavu. Over the last decades, the population in both cities increased from a few tens of thousands to more than one million inhabitants (Michellier et al., 2016). The population growth in the study area was partly caused by the influx of Rwandan refugees in 1994-1995, as well as the growing artisanal mining industry that offers job opportunities (Bashwira et al., 2014; Van Acker, 2005; Butsic et al., 2015).
https://doi.org/10.5194/nhess-2021-336 Preprint. Discussion started: 17 November 2021 © Author(s) 2021. CC BY 4.0 License.
Recent studies have highlighted the presence of many landslides in the region. Compiled from a limited number of very high spatial resolution © Google Earth images partially covering the study area, a first preliminary inventory of a few hundred landslides showed that the landslide processes are diverse (deep-seated, shallow, recent, old, active, inactive) and that their impacts can be high and associated with fatalities and serious damage to infrastructure (Maki Mateso and Dewitte, 2014). The inventory over the North Tanganyika-Kivu Rift region (hereafter called NTK Rift), of which our study area is a subregion, was further expanded by Depicker et al. (2020) through the use of © Google Earth imagery with a search time limited per image. This inventory consisted of shallow and deep-seated landslides but did not make a distinction between these two processes in the susceptibility analysis. Depicker et al. (2020) showed that, in addition to slope angle, land cover is a key landslide predictor in the NTK Rift region. A more detailed investigation of the annual evolution of the forest cover over the last 20 years showed that deforestation increases landslide erosion 2-8 times during a period of approximately 15 years before it eventually falls back to a level similar to forest conditions (Depicker et al., 2021b). A catalogue of > 150 accurately dated landslide events over the last two decades was compiled for the NTK Rift, which made it possible to demonstrate the role of rainfall seasonality in the annual distribution of new landslides (Monsieurs et al., 2018; Dewitte et al., 2021). Among those landslide events, some are very large and consist of clusters of several hundreds to thousands of slope failures. The spatial extent of such a cluster can easily be larger than 10 km². A few events like these occur during each wet season (Depicker et al., 2020; Dewitte et al., 2021). They are commonly associated with particularly intense convective rainfall (Monsieurs et al., 2018b). Most of these landslides are shallow and small and, due to quick vegetation regeneration and/or land reclamation, their scars can disappear after a few years (Depicker et al., 2020; Dewitte et al., 2021). None of the dated landslide events were triggered by earthquakes (Dewitte et al., 2021). This does not rule out the role of earthquakes in triggering landslides in the region; rather, it reminds us that the return period of earthquakes with a magnitude large enough to trigger slope instabilities can be much longer than a few decades (Delvaux et al., 2017). Their potential impact, rather localized compared to that of climatic drivers, can be absent during a narrow time window of observation (Delvaux et al., 2017; Dewitte et al., 2021; Depicker et al., 2021).
Landslides can also occur due to rock weathering and regolith formation (Dille et al., 2019). In other words, the long-term evolution of these preconditioning drivers alone can explain why a slope fails without any apparent trigger. This implies that the many landslides that occur in isolation from other events must be interpreted with care in terms of origin. For these features, it is not clear from a visual analysis of the satellite images whether they can be linked to a specific trigger. In addition, many landslides occur in isolation along roads (Dewitte et al., 2021). Some of the larger, historical landslides (i.e. landslides that do not appear active in our oldest source of information) clearly occurred more than 10,000 years ago (Dewitte et al., 2021), i.e. over a period of time that underwent changes in environmental conditions (Felton et al., 2007; Wassmer et al., 2013; Ross et al., 2014; Smets et al., 2016).

The landslide inventory is an update of the inventory compiled by Depicker et al. (2020). Moreover, we differentiated between the processes and timing of landsliding. We relied heavily on three image products:
• A careful and detailed 3D (elevation exaggeration of 1) visual interpretation of © Google Earth images, which provide a complete coverage of the region at a very high spatial resolution (~0.5 m), often multi-temporal (Depicker et al., 2021b);
• The interpretation of two hillshade images derived from a TanDEM-X digital elevation model (DEM) provided at 5 m resolution and covering most of the region (Albino et al., 2015; Dewitte et al., 2021). The hillshade images were produced with a sun elevation angle of 30° and sun azimuth angles of 315° and 45°;
• The stereoscopic analysis of one single cover of historical panchromatic photographs acquired during the 1955-1958 period at the scale ~1/50,000 (i.e. about 1 m spatial resolution on the ground); the photographs are conserved at the Royal Museum for Central Africa (RMCA, Belgium).
The historical aerial photographs allowed us to differentiate between old deep-seated landslides (i.e. landslides with an unknown time of origin that can be identified on the photographs) and recent deep-seated landslides that have occurred during the last 60 years (i.e. after the acquisition of the photographs). The aerial photographs were not used for mapping shallow landslides since this inventory would be biased. Indeed, the spatial resolution of the photographs is twice as coarse as that of the images in © Google Earth. Furthermore, the photographs provide a single temporal cover, whereas the multi-temporal © Google Earth images cover an imagery range of up to 13 years, i.e. the age difference between the oldest and youngest image (e.g. Minova, Kalehe, Matanda in Fig. 1; Depicker et al., 2021b).
For the recent landslides, the distinction between deep-seated and shallow landslides was made by visually estimating the relative landslide depth from © Google Earth and TanDEM-X hillshade images (Depicker et al., 2020; Dewitte et al., 2021). In the literature, a landslide is usually defined as shallow when the depth of its surface of rupture ranges between 2 and 5 m (Keefer, 1984; Bennett et al., 2016; Sidle and Bogaard, 2016). Here, landslides with a depth < 5 m were considered as shallow. The landslides occurring in mining and quarrying sites were all classified as mining landslides, regardless of their depth. Specific attention was also given to the landslides occurring along roads.
Six field surveys were conducted over the period 2016 to 2019 to validate the inventory and get extra information on the landslide timing and their causes and triggers. The work was carried out by selecting representative areas with various landslide and landscape characteristics, while taking into account accessibility and safety issues. We also used information from media and grey literature (student theses, field reports from local research and academic institutions, and the civil protection).
The frequency distributions of landslide surface area were analyzed to check the completeness of the inventory and to enable comparison with other inventories in different environments; if the area frequency density can be properly fitted to an inverse Γ distribution, the inventory is considered representative of the study area (Malamud et al., 2004). A bad fit could suggest that the inventory is biased and/or incomplete. Indeed, the use of several data sources in the inventory could bias the distribution of landslides, especially bearing in mind the limitations related to the interpretation of satellite images (Guzzetti et al., 2012). We performed this analysis separately for different subsets of the inventory: all landslides, old and recent deep-seated landslides, shallow landslides, mining landslides (which also include landslides associated with quarrying) and road landslides. The analysis of the frequency area distributions for the different shallow landslide populations defined according to the LULC and its dynamics was also used to infer differences in environmental characteristics and slope failure mechanisms (Malamud et al., 2004; Van Den Eeckhaut et al., 2007; Guns and Vanacker, 2014; Tanyaş et al., 2018). Box plots complemented the shallow landslide area analysis. The Wilcoxon rank sum comparison test was used to assess the independence between the landslide populations.
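As an illustration of the completeness check, the sketch below evaluates the three-parameter inverse Γ density in the form given by Malamud et al. (2004). The parameter names rho, a and s follow that formulation; the function names are hypothetical and the fitting procedure itself is not shown, so this is a minimal sketch rather than the workflow used for this inventory.

```python
import math

def inverse_gamma_pdf(area, rho, a, s):
    """Three-parameter inverse gamma density of landslide area.

    rho controls the power-law decay for large areas, a sets the
    position of the maximum, and s shifts the distribution
    (symbols as in Malamud et al., 2004).
    """
    x = area - s
    if x <= 0:
        return 0.0  # density is zero at or below the shift s
    return (1.0 / (a * math.gamma(rho))) * (a / x) ** (rho + 1) * math.exp(-a / x)

def rollover_area(rho, a, s):
    """Area at which the density peaks (the so-called rollover)."""
    return a / (rho + 1) + s
```

An inventory whose empirical frequency density departs strongly from this curve, for example with too few small landslides below the rollover, would be a candidate for the bias or incompleteness discussed above.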
Since the extent of the study area is relatively small when considering regional climatic characteristics and given that the time window of the shallow landslide inventory is limited to a few years, the location and properties (extent, number of occurrences) of shallow landslide clusters depend strongly on the stochastic nature (location, extent and magnitude) of the triggering rainfall and less on local terrain conditions. The consideration of all landslides of such a cluster could bias the analysis by giving an excessive weight to the local terrain conditions (Depicker et al., 2020). Thus, for the susceptibility analysis (see Section 3.2), we retained a maximum of 30 landslides per cluster, randomly sampled.
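The per-cluster cap can be sketched as follows. The record layout (a dictionary with a "cluster" key, set to None for isolated landslides), the function name and the fixed random seed are assumptions made for illustration; only the limit of 30 randomly sampled landslides per cluster comes from the text.

```python
import random
from collections import defaultdict

def cap_per_cluster(landslides, max_per_cluster=30, seed=0):
    """Keep every isolated landslide and at most max_per_cluster
    randomly sampled landslides from each rainfall-triggered cluster."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_cluster = defaultdict(list)
    kept = []
    for landslide in landslides:
        if landslide["cluster"] is None:
            kept.append(landslide)  # isolated landslides are never dropped
        else:
            by_cluster[landslide["cluster"]].append(landslide)
    for members in by_cluster.values():
        if len(members) > max_per_cluster:
            members = rng.sample(members, max_per_cluster)
        kept.extend(members)
    return kept
```

Sampling without replacement keeps each retained landslide unique, so a very large cluster contributes no more points to the susceptibility model than a moderately sized one.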

Multi-decadal forest dynamics
LULC and its dynamics can influence the prevalence and characteristics of shallow landslides (Sidle and Ochiai, 2006; Sidle and Bogaard, 2016). In the study area, the agricultural land use is complex (multiple cropping, multilayer farming) and dynamic due to crop rotations and associations, shifting cultivation, and the bimodal annual rainfall pattern (Heri-Kazi and Bielders, 2021). A detailed regional land use mapping serving as input in our analysis is therefore not feasible (e.g. Jacobs et al., 2018). However, the dynamics of the forest can be better constrained. Here, to complement the analysis conducted by Depicker et al. (2021b; see Section 2.1) that focused on the impact of deforestation on shallow landslides over the last 20 years, we reconstructed the forest dynamics over the last ~60 years. We used 1 m resolution orthomosaics generated from the RMCA's aerial photographs of the years 1955-1958 according to the photogrammetric processing described in Depicker et al. (2021a) and Smets et al. (to be submitted). The forest areas were delineated visually. The 2016 forest cover was extracted from the continental ESA CCI land cover model, which is available at a 20 m resolution (ESA, 2016) and has an accuracy of roughly 86 % in the region (Depicker et al., 2021b).

Landslide susceptibility and distribution analysis
We applied the logistic regression (Hosmer and Lemeshow, 2000) and the frequency ratio (Lee and Pradhan, 2007) models to analyze the susceptibility and distribution of the landslides, respectively, in order to better understand how they are distributed across different landscapes and how natural and human environmental factors contribute to their occurrence. The analysis was carried out with a distinction between shallow landslides and old deep-seated landslides. The analysis was done at the scale of one point (pixel) per landslide to avoid spatial autocorrelation (e.g. Jacobs et al., 2018; Kubwimana et al., 2021). The point is manually positioned in the center of the landslide's trigger area. For deep-seated landslides, a point outside the trigger area where topography does not appear to have been disturbed by the instability is considered for the calculation of the slope associated with the landslide origin.
Table 1 presents the 10 predictor variables used for the susceptibility and frequency ratio analyses and the ancillary data from which they are derived. We used eight predictors that can be considered as natural factors that influence landslide occurrence: elevation, slope angle, planar curvature, profile curvature, topographic wetness index (TWI), aspect, lithology, and distance to faults. Although these predictors are commonly used (Reichenbach et al., 2018), it is worth specifying that, here, elevation is used as a proxy for climatic conditions, namely orographic rainfall and the probability of thunderstorms, as the resolution of regional-climate derived products is too low (at least 2.8 km) to accurately capture the effect of elevation on rainfall (Monsieurs et al., 2018a; Van de Walle et al., 2020; Monsieurs, 2020; Depicker et al., 2021b).
Distance to faults is used to determine the possible contribution of seismic activity to the occurrence of deep-seated landslides not only as a triggering factor (e.g. Keefer, 1984), but also as a rock weathering factor (Vanmaercke et al., 2017). Using the fault pattern is the most appropriate option to tackle the seismic zonation context since the most detailed seismic hazard assessment for this part of the continent is at a spatial resolution of 2.2 km, i.e. at a resolution that is too coarse for our study (Delvaux et al., 2017).
The forest loss class represents the forest areas of 1955-58 that have disappeared in 2016. Since it is impossible to identify for each portion of the landscape the exact cause of forest loss, this class contains a mix of various forest management practices and other causes of forest cut/removal. The forest gain class represents the new forest that has appeared since 1955-58. Similarly, the causes associated with the occurrence of new forest are not exactly known, afforestation and natural forest regeneration certainly being drivers at play. Permanent anthropogenic environment (e.g. cropland, grassland, built-up lands) means that the landscape was not forested at either date, and it is assumed that it remained so during that period.
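The forest dynamics classes follow from comparing forest presence at the two dates. The sketch below assumes that each pixel carries two boolean flags (forest in 1955-58, forest in 2016) and that pixels forested at both dates form the permanent forest class; the function names and the list-based layout are hypothetical.

```python
from collections import Counter

def forest_dynamics_class(forest_1955_58, forest_2016):
    """Map forest presence at the two dates to a forest dynamics class."""
    if forest_1955_58 and forest_2016:
        return "permanent forest"
    if forest_1955_58:
        return "forest loss"   # forested in 1955-58, no longer in 2016
    if forest_2016:
        return "forest gain"   # new forest since 1955-58
    return "permanent anthropogenic environment"

def class_shares(cover_1955_58, cover_2016):
    """Fraction of pixels falling in each forest dynamics class."""
    classes = [forest_dynamics_class(a, b)
               for a, b in zip(cover_1955_58, cover_2016)]
    counts = Counter(classes)
    return {name: n / len(classes) for name, n in counts.items()}
```

A two-date comparison of this kind cannot see what happened between the dates, which is why the last class relies on the stated assumption that non-forested land remained so throughout the period.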

Predictor variables
OpenStreetMap was used to retrieve the main roads in the study area. Using the historical photographs, we observe that the main roads date back to colonial times and that no major changes in the network have occurred over the last 60 years. The few recent landslides that are observed in the field along these roads confirm the assumption that the direct impact of the main roads on the occurrence of recent landslides is limited. These landslides are clearly linked to the road cut topography, i.e. topographic conditions that cannot be constrained at the resolution of the SRTM elevation data (1" or roughly 30 m). They are often of very limited size, i.e. too small to be identified consistently in © Google Earth. For our study, the distance to roads is taken as a proxy for human settlement, trail density, and intensity and diversity of agricultural practices. Since motorized means of transportation are very limited in the region, population growth, the expansion of villages and agricultural activities are indeed highly associated with the main road networks.
Prior to analysis, the predictor variables were resampled at the resolution of the SRTM elevation data, a resolution that provided the best results in similar regions (Jacobs et al., 2018). The association between the dependent variable and each predictor variable was tested using the Pearson χ² test at a 95 % level of confidence (Van Den Eeckhaut et al., 2006; Dewitte et al., 2010). The predictors were tested for multicollinearity, variables with a variance inflation factor (VIF) > 2 being excluded from the analysis (Van Den Eeckhaut et al., 2006; Dewitte et al., 2010). The flat areas (slope angle < 1°) that are spread across the region were not excluded from the analysis since their total extent is limited and their impact on the inflation of susceptibility model performance would be minor (Brenning, 2012; Depicker et al., 2020).
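For continuous predictors, the multicollinearity screening can be illustrated with a small VIF calculation. The sketch uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictor correlation matrix; the function names are hypothetical, and the study's own computation may have been done differently.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(x * y for x, y in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

With the threshold stated above, any predictor whose returned value exceeds 2 would be dropped before fitting the susceptibility model.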
For the analysis of deep-seated landslides, the predictor variables associated with anthropogenic activities were excluded. For the shallow landslides, the 'distance to faults' variable was also excluded. As explained earlier, the shallow landslide inventory represents a narrow time window of observation. As such, the spatial distribution of the shallow landslides could be biased by the stochastic pattern of the recent heavy rainfall events and anthropogenic disturbances rather than reflecting the longer-term impact of weathering conditions associated with seismicity.

Logistic regression
Logistic regression is used to describe the relationship between a binary dependent variable (the presence or absence of landslides) and one or more independent predictor variables (Hosmer and Lemeshow, 2000). Hence, the logistic regression does not only require landslide data, but also non-landslide data. We sampled these non-landslide data by generating a number of random points that is equal to the number of landslides in the inventory in order to avoid a prevalence bias (Hosmer and Lemeshow, 2000). Non-landslide points were randomly generated outside a 40 m buffer zone around landslide areas. The basic equation for logistic regression is:
$$\log\left(\frac{P}{1-P}\right) = \alpha + \sum_{i=1}^{n} \beta_i X_i \qquad (1)$$
where $P$ is the likelihood of landslide occurrence and takes values between 0 and 1, $\alpha$ is the intercept of the model, $X_i$ represents the i-th of $n$ predictors, and $\beta_i$ is the accompanying coefficient that has to be fitted to the data.
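To make the link between the linear predictor and the probability explicit, the sketch below assumes the standard logit form, in which the log-odds of landsliding equal the intercept plus the weighted sum of the predictors. The function name and the coefficient values used in the check are illustrative only and are not taken from the fitted models.

```python
import math

def landslide_probability(alpha, betas, predictors):
    """Probability of landslide occurrence from a logistic model.

    alpha is the intercept, betas the fitted coefficients, and
    predictors the predictor values at one location.
    """
    z = alpha + sum(b * x for b, x in zip(betas, predictors))
    return 1.0 / (1.0 + math.exp(-z))
```

Inverting the logit in this way guarantees a value strictly between 0 and 1, whatever the sign or size of the coefficients.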
Calculations were performed in RStudio version 1.4.1717 with the LAND-SE software (Rossi and Reichenbach, 2016). In order to be considered in the final logistic regression equation, continuous variable coefficients needed to be significant at the 95 % level of confidence (e.g. Jacobs et al., 2018). For categorical variables, as soon as one dummy variable was significant, all other dummy variables were included in the model (e.g. Depicker et al., 2020). The quality of the models was judged by (i) the prediction rate (e.g. Depicker et al., 2020), (ii) a visual inspection of the susceptibility maps after reclassifying each map into four classes of increasing susceptibility that cover 40 %, 30 %, 20 %, and 10 % of the study area, and (iii) the area under the receiver operating characteristic curve (AUC). An AUC = 0.5 shows that the model performance is equivalent to random classification, while an AUC = 1 indicates a perfect classification (Hosmer and Lemeshow, 2000). Training and validation datasets were taken in the proportions of 70 % and 30 %, respectively (Broeckx et al., 2018; Fang et al., 2020).
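The meaning of the AUC values quoted above can be checked with a direct pairwise computation: the AUC equals the probability that a randomly chosen landslide point receives a higher susceptibility score than a randomly chosen non-landslide point, with ties counted as one half. The function name and the example scores are hypothetical; the study itself relied on LAND-SE.

```python
def auc(scores_landslide, scores_non_landslide):
    """Area under the ROC curve from pairwise score comparisons."""
    wins = 0.0
    for p in scores_landslide:
        for q in scores_non_landslide:
            if p > q:
                wins += 1.0   # landslide point ranked above non-landslide point
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(scores_landslide) * len(scores_non_landslide))
```

The pairwise form is quadratic in the number of points, which is acceptable for a validation set of a few hundred locations; rank-based formulas give the same value faster on larger samples.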

We assessed the importance of each individual predictor for the logistic regression in two ways. First, we calculated the AUC for landslide susceptibility models that only relied on the considered predictor, to assess the extent to which this predictor can be used to differentiate between landslide and non-landslide locations. This allowed us to quantify the contribution of each variable to the susceptibility model (Depicker et al., 2020). Note that only predictors with AUC values between 0.5 and 1.0 were retained for the logistic regression models. A second way to determine the impact of the predictors was the analysis of the odds ratio (OR). The OR of a predictor expresses how a change of a predictor value translates into an increase/decrease in the odds of landsliding, whereby the odds of landsliding is calculated as $P/(1-P)$ (see Eq. (1)). The $OR_i$ of predictor $i$ is calculated as:
$$OR_i = \exp(\beta_i \, \delta_i) \qquad (2)$$
whereby $\beta_i$ is the coefficient of predictor $i$, and $\delta_i$ is the increase in predictor $X_i$. For continuous variables an arbitrary but realistic value for $\delta_i$ is chosen. For the dummy variables, $\delta_i$ equals 1. For the categorical variables, the OR for each dummy reflects an increase or decrease relative to the reference variable (Kleinbaum and Klein, 2010).
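The interpretation of the odds ratio can be verified numerically. The sketch assumes a single-predictor logistic model and the usual definition of the odds ratio as the exponential of the coefficient times the chosen increase; the coefficient, intercept and increment in the check are made-up values, not results from this study.

```python
import math

def logistic(z):
    """Probability corresponding to a log-odds value z."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of landsliding for a probability p."""
    return p / (1.0 - p)

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds when the predictor grows by delta."""
    return math.exp(beta * delta)
```

For a hypothetical slope coefficient of 0.08 per degree, a 5° increase would multiply the odds of landsliding by exp(0.4), roughly 1.5, regardless of the starting slope.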

Frequency ratio
The frequency ratio model considers each landslide predictor variable individually and classifies its values into a set of bins (Lee and Pradhan, 2007; Kirschbaum et al., 2012). The value of a frequency ratio indicates for each bin of the predictor variable the probability of occurrence of a landslide. The frequency ratio is calculated as:
$$Fr_{cb} = \frac{L_{cb} / L}{A_{cb} / A} \qquad (3)$$
where $Fr_{cb}$ is the frequency ratio value for a bin $b = (1, 2, \ldots, n)$ of a predictor variable $c = (1, 2, \ldots, m)$, $L_{cb}$ is the cumulative landslide area within bin $b$ of predictor $c$, $L$ is the cumulative landslide area in the entire study area, $A_{cb}$ is the area attributed to bin $b$ of predictor $c$, and $A$ is the total extent of the study area.
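A frequency ratio of this kind is the share of the landslide area found in a bin divided by the share of the study area occupied by that bin. The sketch below assumes that the bins of one predictor partition the whole study area, so that the totals are the sums over bins; the function name and the numbers in the check are illustrative.

```python
def frequency_ratio(landslide_area_per_bin, area_per_bin):
    """Frequency ratio of each bin of one predictor variable.

    landslide_area_per_bin: cumulative landslide area in each bin
    area_per_bin: area of the study region attributed to each bin
    """
    total_landslide_area = sum(landslide_area_per_bin)
    total_area = sum(area_per_bin)
    return [
        (landslide / total_landslide_area) / (area / total_area)
        for landslide, area in zip(landslide_area_per_bin, area_per_bin)
    ]
```

Values above 1 mark bins where landslides are over-represented relative to the extent of the bin, and values below 1 mark bins where they are under-represented.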

Landslide inventory
Overall, 2730 landslides were mapped from the image analysis over the study area (Fig. 2a; Table 2), which is an extension of 326 % compared to the inventory of Depicker et al. (2020). The landslides are diverse in terms of size, age and type (Fig. 3). The inventoried landslides cover ~3 % of the study area. The largest landslide is old and deep-seated (426.4 ha), while the smallest detected landslide is shallow (16 m²). The landslides are grouped into five categories (Fig. 2a; Table 2):
• Old deep-seated landslides represent 45.5 % of the inventoried landslides and cover 93 % of the total landslide affected area;
• Shallow landslides represent 40.4 % of the inventoried landslides, but only 2.7 % of the total affected area. These landslides are all recent;
• Recent deep-seated landslides represent a small percentage of landslides (5.8 %) but cover an area (2.9 %) similar to shallow landslides;
• Mining landslides (that also include quarrying landslides) represent 5.6 % of the inventoried landslides and cover 1.2 % of the total landslide affected area. Mining landslides are considered as one type, regardless of their depth;
• Road landslides [...] construction and altered rainwater drainage.
The old deep-seated landslides located close to roads were retained in the old deep-seated landslide group because their timing is likely to precede road construction. Several clusters of shallow landslides related to heavy convective rainfall events have occurred in recent years.

One of the clusters is related to the Kalehe rainstorm of October 2014 (Fig. 2a: event 2; Fig. 3a) reported by Maki Mateso and Dewitte (2014). This event triggered 634 shallow landslides, 346 of them being connected to talwegs and providing materials to 17 debris flows. Ten debris flows were particularly destructive and deadly when they reached villages on the shores of Lake Kivu (Maki Mateso and Dewitte, 2014). [...] populations indicated that the shallow landslides that are not associated with these clusters are also rainfall-triggered.
Landslide mapping was largely done using © Google Earth, the TanDEM-X hillshades being useful to confirm the identification of about one fifth of the old deep-seated landslides (Table 2). Fieldwork carried out to validate 786 landslides showed that they were identified with a precision of more than 96 % (Table 3). Old deep-seated landslides and shallow landslides were mapped with the highest accuracy. Mining landslides were mapped with a lower accuracy due to the difficulty of differentiating between landslide processes and anthropogenic soil disturbance in © Google Earth imagery. The field validation also allowed us to map an extra 126 landslides (Fig. 2b) that could only be identified in the field (Table 3). For the old deep-seated landslides, this represents an extra 25 % of observations. Nevertheless, landslides identified only in the field were not considered in the analysis to avoid biases due to overrepresentation.

Each debris flow is connected to up to hundreds of shallow landslides that act as source areas. A clear distinction was made between these source areas and the debris flow path and deposition areas (Fig. 3a). Out of the 184 debris flows identified from the images, 90 with a length-to-width ratio > 50 were excluded from the analysis since they show greater similarities to debris-rich floods than to the other landslides present in the region (Malamud et al., 2004). Nevertheless, the shallow landslides acting as source areas were kept in the analysis. Also, 22 very large, old, deep-seated landslides were excluded from the analysis because they have complex main scarps where it is difficult to determine the pixels that best represent the natural conditions of occurrence. Overall, of the 2730 landslides identified from the images, 2618 landslides were used for the subsequent analysis.
The inverse Γ distribution fits the area distributions of all the subsets of the inventory well, except for recent deep-seated and mining landslides (Fig. 4a,c). There is also a good fit with this inventory, which supports its use for further susceptibility analysis. The Wilcoxon rank comparison test confirms statistically significant differences (p-value < 0.05) among the area distributions (Fig. 4b).
In 1955-58, 42 % of the territory was already deforested (Fig. 5a). From 1955-58 to 2016, the loss of forest continued, with forest cover decreasing from 58 % to 24 % of the study area. The area affected by forest loss over the last 60 years is larger than the remaining permanent forest (Fig. 5b).
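The counts and percentages quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are ours, and treating the drop in forest cover as a net loss (ignoring any forest gain) is a simplifying assumption.

```python
# Inventory bookkeeping, using only numbers quoted in the text.
mapped_from_images = 2730      # landslides identified from the images
excluded_debris_flows = 90     # debris flows with length-to-width ratio > 50
excluded_large_old = 22        # very large old deep-seated landslides

used_in_analysis = mapped_from_images - excluded_debris_flows - excluded_large_old
print(used_in_analysis)  # 2618

# Forest cover budget, in percent of the study area.
forest_1950s = 58
forest_2016 = 24
# Assumption: net change only; any forest gain is ignored here.
net_forest_loss = forest_1950s - forest_2016
deforested_1950s = 100 - forest_1950s
print(net_forest_loss, deforested_1950s)  # 34 42
print(net_forest_loss > forest_2016)      # True
```

Under this simplification, the two exclusions account exactly for the difference between 2730 and 2618, and the net forest loss (34 % of the study area) exceeds the 2016 forest cover (24 %).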

72 % of the shallow landslides are found in areas of forest loss (Fig. 6). The landslides in the permanent anthropogenic environment have the largest mean area, followed by the landslides in permanent forest, and the landslides in areas of forest loss. In forest gain zones, landslides are on average the smallest. The Wilcoxon rank comparison test confirms statistically significant differences (p-value < 0.05) among the landslide area distributions. The same differences are also confirmed for the landslide slope distribution (Fig. 6b). In permanent forest areas, shallow landslides occur on steeper slopes compared to shallow landslides in anthropogenic environments (Fig. 6b). The analysis of the completeness of the inventory (Fig. 6b,d) shows that an acceptable distribution emerges for each category of shallow landslides except for the landslide inventory in permanent forest minus event (Fig. 6b).

Landslide susceptibility and distribution analysis
The Pearson χ² tests confirm the association between the dependent variable and each predictor variable at a 95 % level of confidence. There was no multicollinearity between the predictors retained for this study (VIF < 2). Depicker et al. (2020) assessed the impact of the size of the landslide training dataset on the calibration of a landslide susceptibility model. They showed that the quality of a susceptibility assessment is questionable if the number of landslides is too small. In view of the low number of recent deep-seated, mining, and road landslides in the present study (Table 3), we did not calibrate susceptibility models from these three types of landslides. Instead, we tested these inventories against the two susceptibility models computed from the shallow and/or old deep-seated landslide datasets (Fig. 7).
The two susceptibility models of shallow and old deep-seated landslides show similar AUC and prediction rates (Fig. 7). The spatial patterns of the susceptibility values of the two models are quite different, reflecting the differences in the importance of the predictors included in the assessment (Table 4, Table 5). Table 4 shows that for shallow landslides, anthropogenic variables (forest loss, distance to roads and permanent anthropogenic environment) have a great influence on their occurrence. In contrast, continuous topographic variables and distance to faults are the most important for deep-seated landslides. For both landslide types, slope angle and elevation also have a great influence on their occurrence. Variables related to slope aspect and lithology have lower importance for both types of landslides.
Coefficient included in the logistic regression model: * p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001

The odds ratios of the significant predictors for the two susceptibility models allow for an assessment of their relative importance (Table 5). Forest loss has a large influence on the occurrence of shallow landslides, as deforestation increases the odds of landsliding by a factor of 2.5 (Tables 4 & 5). However, anthropogenic environments (which were deforested before 1955-1958) appear to be less landslide-prone than permanent forest.
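To make the meaning of an odds ratio of 2.5 concrete, the following minimal sketch converts it into the equivalent logistic regression coefficient and shows how it shifts a landslide probability. The baseline probabilities are hypothetical values chosen for illustration, and the function name `apply_odds_ratio` is ours; none of these are taken from the fitted model.

```python
import math

def apply_odds_ratio(p_baseline, odds_ratio):
    """Return the probability obtained after multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

OR_FOREST_LOSS = 2.5              # odds ratio quoted in the text
beta = math.log(OR_FOREST_LOSS)   # coefficient of a binary predictor with this odds ratio
print(round(beta, 3))             # 0.916

# Hypothetical baseline probabilities (not values from this study).
for p0 in (0.01, 0.10, 0.50):
    print(p0, round(apply_odds_ratio(p0, OR_FOREST_LOSS), 3))
```

An odds ratio multiplies the odds, not the probability: for rare events the probability also increases by a factor close to 2.5, whereas for a baseline probability of 0.5 it rises to about 0.71.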

Slope is similarly important for the prediction of both types of landslides (Table 4) but has a slightly larger impact on the odds of deep-seated landsliding than on the odds of shallow landsliding (Table 5). Slope aspect has a greater impact on the occurrence of shallow landslides than on that of old deep-seated landslides. It appears that the plan curvature reduces the occurrence of shallow landslides while it affects the occurrence of old deep-seated landslides. The effect of lithology is also different for shallow and deep-seated landslides. For shallow landslides, the gneiss and micaschists are most landslide-prone and the lowest susceptibility is associated with black shales, tillite and old basalts. For deep-seated landslides, black shales, tillite and old basalts favour landslides while gneiss and micaschists do not. The variables 'distance to roads' and 'distance to faults' have a significant but rather limited impact on shallow and old deep-seated landslides, respectively.
The susceptibility model for shallow landslides was tested against the mining and road landslide datasets, while the old deep-seated landslide model was tested against the mining, road and recent deep-seated landslide datasets. Results indicate that mining and road landslides are poorly predicted using the shallow landslide model (Fig. 7c). Recent deep-seated landslides are reasonably well predicted using the old deep-seated landslide model, whereas prediction of road and mining landslides using the same model is also poor, although less problematic for the mining landslides (Fig. 7d).
The frequency ratio analysis shows that slope angle is an important driver for shallow landslides as well as for old deep-seated landslides (Fig. 8a,b). Figure 8c shows a trend in the landscape of increasing slopes and forest loss and decreasing forest cover with increasing elevation. The decrease in forest cover at high altitudes is also associated with a natural change of the vegetation: bamboo vegetation is found at 2300-2600 m asl and subalpine vegetation such as ferns occurs at 2400-3300 m asl (Mokoso et al., 2013; Cirimwami et al., 2019). We observe that at higher elevations (> 2000 m), shallow landslides occur more frequently, and this can be explained by the drastic forest loss and steeper slopes associated with these elevations (Fig. 8c). Deep-seated landslides are also favoured by steeper slopes and higher elevations. Regarding the dynamics of forest cover (Fig. 8e), the occurrence of shallow landslides is favoured in the deforested areas.
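As an illustration of how a frequency ratio is read, the sketch below uses the definition commonly adopted in susceptibility studies: the share of landslide cells falling in a class divided by the share of the study area occupied by that class. We assume, without confirmation from the text, that the analysis follows this definition. The cell counts are hypothetical and the function name `frequency_ratio` is ours.

```python
def frequency_ratio(landslide_cells, total_cells):
    """Frequency ratio per class: share of landslide cells divided by share of all cells."""
    n_ls = sum(landslide_cells.values())
    n_all = sum(total_cells.values())
    return {
        cls: (landslide_cells[cls] / n_ls) / (total_cells[cls] / n_all)
        for cls in total_cells
    }

# Hypothetical cell counts per forest-dynamics class (NOT values from this study).
total_cells = {"permanent forest": 3000, "forest loss": 3000, "anthropogenic": 4000}
landslide_cells = {"permanent forest": 20, "forest loss": 70, "anthropogenic": 10}

fr = frequency_ratio(landslide_cells, total_cells)
for cls, value in fr.items():
    print(f"{cls}: FR = {value:.2f}")
```

A value above 1 means that landslides are over-represented in the class relative to its extent, and a value below 1 means that they are under-represented.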

Landslide types and completeness of the inventory
We used a combination of © Google Earth imagery, TanDEM-X hillshades, historical aerial photographs, and field work to compile an extensive and comprehensive inventory of recent and historical landslides. Field validation of more than 25 % of the inventory showed that more than 96 % of the inventoried landslides were classified with precision into five types (old deep-seated, recent deep-seated, shallow, road-related, or mining-related). Nevertheless, despite this high performance, we are aware that the inventory is still incomplete. This is particularly the case for the shallow landslides because their inventory covers a maximum period of 13 years, which corresponds to the imagery range available in © Google Earth. Furthermore, their scars can quickly be altered by natural vegetation regrowth, land reclamation and erosion (Malamud et al., 2004; Van Den Eeckhaut et al., 2007; Kubwimana et al., 2021). In addition, small landslides frequently go unnoticed at the resolution of the satellite images (Guzzetti et al., 2012). Finally, field validation showed that a significant proportion of old deep-seated landslides can be missed from image analysis. This is because identifying the exact limits of the failed mass may not be easy for old deep-seated landslides, particularly in forest areas (Malamud et al., 2004). While building the inventory, we remained conservative and mapped only the features for which we had high confidence. As the protocol for landslide identification over the whole region was uniform and the number of identified landslides relatively large, we trust that the inventory is reliable and representative enough for the analysis.
The frequency-area distributions of all landslide types (Fig. 4a,c), with the exception of recent deep-seated and mining landslides, are similar to what has been observed in other parts of the world (e.g., Malamud et al., 2004; Guns & Vanacker, 2014; Jacobs et al., 2017; Depicker et al., 2020). For the recent deep-seated landslides, the smallest landslides are overrepresented and the rollover is absent. Since the spectral signature of these landslides is pronounced, we cannot invoke here a problem of subjectivity in the mapping. Additionally, we have high confidence in the completeness of the inventory, as field validation showed that almost no landslides were missed (Table 3). Therefore, we posit that this divergence in size is related to a lower influence of successive slope failures in the increase of landslide area through time; in other words, recent landslides did not have time to grow (Tanyaş et al., 2018). This process of successive failures has been well documented for the Ikoma landslide, south of Bukavu (Fig. 1b; Dille et al., 2019). The distribution of the mining landslides is irregular and different from what is typically observed, with a rollover that is flattened and a sudden increase in the frequency of the smallest slope failures. As for the inventory of the recent deep-seated landslides, the completeness and the reliability of the mapped features can hardly be questioned. We suggest that this unusual area distribution is the result of the human-induced alteration of the environmental conditions (see Section 4.4). To our knowledge, no similar studies have been carried out on artificial mining slopes. Further investigations on other cases would be needed to verify our hypothesis.
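The notion of a rollover can be made explicit with the three-parameter inverse Γ density of Malamud et al. (2004), which we assume here to be the form fitted to the inventory. The sketch below evaluates that density and the area at which it peaks. The parameter values are illustrative, of the order of the general values reported by Malamud et al. (2004), and are not fitted to the present inventory; the function names are ours.

```python
import math

def inverse_gamma_pdf(area, rho, a, s):
    """Three-parameter inverse-gamma density p(A; rho, a, s), after Malamud et al. (2004)."""
    x = area - s
    if x <= 0:
        return 0.0
    return (1.0 / (a * math.gamma(rho))) * (a / x) ** (rho + 1.0) * math.exp(-a / x)

def rollover_area(rho, a, s):
    """Area at which the density is maximal, i.e. the rollover: a / (rho + 1) + s."""
    return a / (rho + 1.0) + s

# Illustrative parameters, areas in km^2 (assumption: NOT fitted to this study's inventory).
RHO, A_PARAM, S_PARAM = 1.4, 1.28e-3, -1.32e-4

peak = rollover_area(RHO, A_PARAM, S_PARAM)
print(f"rollover at about {peak * 1e6:.0f} m^2")
```

For areas well above the rollover, the density decays as a power law with exponent -(rho + 1); below it, the density drops off exponentially. An inventory without a rollover, as described for the recent deep-seated landslides, therefore departs from this form at the small-area end.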

The presence of a rollover in the frequency-area distribution of the shallow landslides in the anthropogenic environment (Fig. 6b,d) is in opposition to what we could have expected considering the study by Van Den Eeckhaut et al. (2007). Although that study was also conducted in a populated rural environment, the authors did not find a positive power-law relation for the smaller landslides, separated from the larger landslides by a rollover. This difference probably lies in the fact that our study area is much more landslide-prone. The research by Van Den Eeckhaut et al. (2007) was indeed carried out in a hilly region of Belgium where the temperate climate is much less favourable to the yearly occurrence of shallow landslides, especially when they occur in clusters associated with intense convective rainfall. When the weight of the landslide clusters is removed from our inventory, i.e. when a maximum of 30 landslides per cluster is considered (see Section 2.1), the rollover of this distribution is also present (Fig. 6d). Furthermore, this divergence in the frequency-area distribution of shallow landslides may also be explained by the fact that our inventory covers a shorter time period than that of Van Den Eeckhaut et al. (2007), that our region is not altered by mechanized farming, and that human activities such as building and road construction and drainage works, which are highlighted as causes of landslides in Belgium, are much less present.
Under permanent forest, we do not observe a rollover point in the shallow landslide distribution (Fig. 6b). We hypothesize that the smallest landslides may be hidden under the canopy and therefore less visible on satellite images. A second explanation is that the presence of trees and their roots increases slope stability and therefore the minimal critical area for landsliding (Milledge et al., 2014).

The old deep-seated landslide susceptibility model is the first model proposed for the region that focuses only on deep-seated processes. The model shows a good quantitative prediction performance, both in terms of AUC and prediction rate. The model shows that the hillslope curvature (planar or profile), elevation, distance to faults, slope angle, and TWI are the most important predictive factors. In other words, terrain morphology and seismic activity seem to play a dominant role in deep-seated landslide activity in the study area. The frequency ratio analysis (Fig. 8b,d) further supports this as it highlights the association of landslides with steep slopes (> 25°) and higher elevations, i.e. in topographic contexts nearer to the ridge crests that are known to amplify seismic shaking (Meunier et al., 2008). The role of elevation as a driver of more humid conditions should, however, not be ignored as rainfall is also known to trigger deep-seated landslides (LaHusen et al., 2020). Also, the role of the long-term weathering of the landscape and the occurrence of non-triggered landslides should not be underestimated (Dille et al., 2019). Lithology is of lesser importance in our study area, which is in agreement with the findings of Depicker et al. (2021b) that show that the various lithologies in the region have similar rock strength properties.
https://doi.org/10.5194/nhess-2021-336 Preprint. Discussion started: 17 November 2021 © Author(s) 2021. CC BY 4.0 License.
The lower prediction rate of the recent deep-seated landslides using the old deep-seated landslide model could be related to the fact that the observations cover a period that is too short to capture the full range of environmental conditions that led to old deep-seated landslides. For example, no earthquake-induced recent deep-seated landslides were observed (Dewitte et al., 2021), whereas seismicity is an important component of the old deep-seated landslide model. In addition, the climatic and seismic conditions have evolved over the past tens of thousands of years (Felton et al., 2007; Wassmer et al., 2013; Ross et al., 2014; Smets et al., 2016). For example, the region experienced an abrupt shift from drier conditions to more humid conditions around 13,000 BP (Felton et al., 2007; Wassmer et al., 2013). In addition, about 10,000 BP, Lake Kivu water highstands were ~100 m above the current level, which could have triggered a few large landslides (Ross et al., 2014; Dewitte et al., 2021). This change in the lake level was not only due to a shift in the climatic conditions but also to the formation of the Virunga Volcano Province that created a dam on the upstream part of the Rift basin that used to drain northwards (Fig. 1b; Haberyan and Hecky, 1987). During that period of volcano formation, the regional geodynamics and the seismicity pattern were different (Smets et al., 2016). Hence a large part of the old deep-seated landslides may have been triggered under different conditions (Dewitte et al., 2021).
Old and recent deep-seated landslides also differ in terms of size (Fig. 4). There have not been any major events during the past 60 years that caused large landslides comparable to the largest old deep-seated landslides (of area 10⁶ m²). We identify five possible factors to explain this difference. First, our window of observation is too narrow to capture the impact of high-magnitude forcing events such as large earthquakes (Marc et al., 2019). Second, the past environmental conditions may have been more favourable to large slope failures. Third, larger landslides are less frequent but leave a longer-lived morphological legacy; therefore, smaller old deep-seated landslides may no longer be visible. Fourth, the size of old landslides is the legacy of a history of phases of slope deformation, and not of one single slope failure (Tanyaş et al., 2018), as evidenced in the analysis of the nearby Ikoma landslide (Fig. 1b; Dille et al., 2019). Fifth, amalgamation must not be excluded (Marc and Hovius, 2015), especially for the oldest features. Overall, our current knowledge does not allow us to give more credit to one factor in particular. The most reasonable assumption is that the difference in landslide size reflects a combination of factors.

Drivers of shallow landslides
Rainfall is the trigger of the shallow landslides that we have identified in this study, which is in agreement with the other studies in the region (Dewitte et al., 2021; Kubwimana et al., 2021). The spatial distribution of shallow landslides differs strongly from the distribution of deep-seated landslides. This is mainly due to anthropogenic factors such as deforestation that influence shallow processes (Table 4). The regional susceptibility model also indicates that deforestation is the most important factor in their occurrence (Table 5). Similarly, the analysis of frequency ratios shows that landslides disproportionately occur within areas that were deforested in the past 60 years, demonstrating the role of the forest in slope stabilization (Grima et al., 2020). Shallow landslides in forest loss areas (Fig. 6a,b) have, on average, a smaller size compared to landslides in forest. This observation is in line with the findings of Depicker et al. (2021b) and is attributed to the decrease of regolith cohesion due to forest loss, which allows for a smaller minimum critical area for landsliding (Milledge et al., 2014). In short, human-induced land cover change is associated with an increase in the number of landslides and a shift of the frequency-area distribution towards smaller landslides (Guns and Vanacker, 2014).
In permanent anthropogenic environments (Fig. 6a,c), shallow landslides are less frequent, larger, and occur on less steep slopes as compared to shallow landslides in forest. We see three possible explanations. First, the steepest slopes in the anthropogenic environments were subject to increased landslide erosion in the first few years after the original forest cover was removed (prior to 1955-1958) (Depicker et al., 2021b). As a result, we can assume that steep slopes in anthropogenic environments have less regolith available for landsliding compared to steep slopes in permanent forest areas. This process of regolith depletion is further exacerbated in cropland. Wilken et al. (2021) have measured in the region that erosion in cropland sites can reach up to about 40 cm in 55 years. Similarly, Heri-Kazi and Bielders (2021a) measured mean erosion rates of the order of 11 mm/year on cropland. Regolith erosion therefore reduces the spatial extent of areas where landslides can occur. A second process that may explain the landslide pattern in the anthropogenic environments is that regolith erosion is accompanied by sedimentation and the formation of colluvium (Wilken et al., 2021), which results in local accumulation of material. The material forms a loose sedimentary deposit, usually in places with lower slope angles. This could be extra material available for the formation of landslides. Hence, fewer areas are available for landslides, but the susceptible places are more concentrated. A third explanation is probably related to soil management practices that influence erosion and water infiltration. In the region, usually on the less steep terrain, drainage ditches that favour water infiltration and hence an increase in pore-water pressure are widely used by farmers (Heri-Kazi and Bielders, 2021b).
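The two cropland erosion figures quoted above are expressed in different units. A short conversion, sketched below under the assumption of a constant erosion rate over the whole period, puts them on a common footing; the variable names are ours.

```python
# Figures quoted in the text.
wilken_total_cm = 40            # "up to about 40 cm" of erosion (Wilken et al., 2021)
period_years = 55
heri_kazi_rate_mm_per_yr = 11   # order of magnitude (Heri-Kazi and Bielders, 2021a)

# Assumption: erosion proceeds at a constant rate over the whole period.
wilken_rate_mm_per_yr = wilken_total_cm * 10 / period_years
heri_kazi_total_cm = heri_kazi_rate_mm_per_yr * period_years / 10

print(f"{wilken_rate_mm_per_yr:.1f} mm/yr")                      # 7.3 mm/yr
print(f"{heri_kazi_total_cm:.1f} cm over {period_years} years")  # 60.5 cm over 55 years
```

Both estimates are therefore of the same order of magnitude, roughly 7 to 11 mm per year, or several decimetres of regolith removed since the 1950's.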

Drivers of mining landslides and road landslides
Mining and road landslides are poorly predicted by the shallow and old deep-seated landslide susceptibility models (Fig. 7), showing that they respond to different environmental factors. Road construction and mining activities are commonly associated with the presence of slope cuts and an increase of slope angle. These altered local topographic conditions cannot be constrained in the covariates derived from the SRTM or similar available products. In addition, the disturbances induced by roads and mining activities are not limited to the sole change of slope angle conditions. For example, they also imply changes in water runoff and infiltration, debuttressing, presence of fills and possible overloading, and excess stress from engine/digging, i.e., conditions that can influence the size and frequency characteristics of landslides (Brenning et al., 2015; Arca et al., 2018; Froude and Petley, 2018; McAdoo et al., 2018; Vuillez et al., 2018).
Road landslides are mostly shallow. While it is obvious that roads create favourable conditions for the initiation of landslides, as observed in other studies in the region (Dewitte et al., 2021; Kubwimana et al., 2021), an accurate spatio-temporal regional pattern of these human-induced slope failures cannot be assessed here. A substantial proportion of road landslides can only be observed in the field (Table 3). In addition, landslides along roads can easily disappear due to maintenance works. Furthermore, many of the main roads were already present in the 1950's, their current impact therefore being altered.
Overall, mining conditions seem to lead to landslides whose smallest features are more frequent than what would occur under natural conditions, as attested in the frequency-area distribution (see Section 4.1). The area of mining landslides is significantly larger than that of road landslides and their regional distribution is slightly more in agreement with the characteristics of deep-seated landslides (Fig. 7d), which is logical as mining activities are related to the lithological characteristics of the landscape.
Considering the recent development of the mining activities in the region (Butsic et al., 2015; Tyukavina et al., 2018; Musumba Teso et al., 2019), we can assume with confidence that the associated landslides represent slope instabilities that have occurred over a period of about 20 years, whereas the recent deep-seated landslides represent slope failures that have occurred over the last 60 years. The distribution of the mining landslides is also restricted spatially to some lithologies. With these specificities in mind and the fact that the number of inventoried mining and recent deep-seated landslides is relatively similar, respectively 152 and 159 (Table 2), this study confirms that mining activities increase the odds of landsliding. This has implications not only in terms of hazard assessment but also in assessing the population at risk, knowing that mined sites are populated. This is to be put in parallel with the findings of Depicker et al. (2021a) that show that the risk of shallow landslides has increased significantly in the region during the last decades in the places where mining activities are found due, notably, to an increase in population.
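The comparison between mining and recent deep-seated landslides rests on similar counts accumulated over different time windows. The sketch below turns the figures quoted above into mean annual rates; it assumes the approximate window lengths given in the text (about 20 and 60 years) and does not normalize by the area over which each type can occur. The variable names are ours.

```python
# Counts from the text (Table 2) and approximate observation windows in years.
mining_count, mining_years = 152, 20
recent_deep_count, recent_deep_years = 159, 60

mining_rate = mining_count / mining_years
recent_deep_rate = recent_deep_count / recent_deep_years

print(f"{mining_rate:.2f} vs {recent_deep_rate:.2f} landslides per year")  # 7.60 vs 2.65
print(f"ratio = {mining_rate / recent_deep_rate:.1f}")                     # ratio = 2.9
```

Under these assumptions, mining landslides occur about three times as often per year as recent deep-seated landslides. Because mining landslides are restricted to some lithologies, a rate per unit area would presumably show an even larger contrast.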

Conclusions
Our study improves the understanding of landslide processes and the human impact thereon in tropical rural mountainous environments. The use of several sources of data allowed us to build a very detailed landslide inventory in time and space for the region. This inventory enabled the grouping of landslides into five types: old and recent deep-seated landslides, (recent) shallow landslides, mining landslides and road landslides. Among deep-seated landslides, historical aerial photographs from the 1950's were an added value in the sense that they were used for differentiating between old and recent slope processes. We find differences in the driving factors and area distribution of old and recent deep-seated landslides, suggesting that factors of landslide occurrence are either different or change over time depending on geodynamic and/or climatic conditions. The role of anthropogenic factors has been established in the occurrence of shallow landslides. Deforestation initially increases landsliding, but in the long term, when forest is permanently converted into agricultural land, landslide frequency appears to be lower compared to permanent forest lands. However, the exact impact of forest and forest cover changes depends on topographic conditions. The factors of occurrence of mining landslides significantly increase landsliding in areas that, under natural conditions, would be less prone to slope failures. The importance of human activities needs to be considered when investigating landslide occurrence in regions under anthropogenic pressure. Our analysis also demonstrates the importance of considering the timing of landslides in susceptibility and distribution assessments.

The authors declare that they have no conflict of interest.