Geologic and geodetic constraints on the seismic hazard of Malawi’s active faults: The Malawi Seismogenic Source Database (MSSD)

Active fault data are commonly used in seismic hazard assessments, but there are challenges in deriving the slip rate, geometry, and frequency of earthquakes along active faults. Herein, we present the open-access geospatial Malawi Seismogenic Source Database (MSSD), which describes the seismogenic properties of faults that have formed during East African rifting in Malawi. We first use empirical observations to geometrically classify active faults into section, fault, and multi-fault seismogenic sources. For sources in the North Basin of Lake Malawi, slip rates can be derived from the vertical offset of a seismic reflector that is estimated to be 75 ka based on dated core. Elsewhere, slip rates are constrained by advancing a ‘systems-based’ approach that partitions geodetically derived rift extension rates in Malawi between seismogenic sources using a priori constraints on regional strain distribution in magma-poor continental rifts. Slip rates are then combined with source geometry and empirical scaling relationships to estimate earthquake magnitudes and recurrence intervals, and their uncertainty is described from the variability of outcomes from a logic tree used in these calculations. We find that for sources in Lake Malawi’s North Basin, where slip rates can be derived from both the geodetic data and the offset seismic reflector, the slip rate estimates are within error of each other, although those from the offset reflector are higher. Sources in the MSSD are 5-200 km long, which implies that large magnitude (MW 7-8) earthquakes may occur in Malawi. Low slip rates (0.05-2 mm/yr), however, mean that the frequency of such events will be low (recurrence intervals ~10^2-10^5 years). The MSSD represents an important resource for investigating Malawi’s increasing seismic risks and provides a framework for incorporating active fault data into seismic hazard assessment in other tectonically active regions. https://doi.org/10.5194/nhess-2021-306 Preprint.
Discussion started: 16 November 2021. © Author(s) 2021. CC BY 4.0 License.


Introduction
Earthquake hazards are most frequently quantified as the probability of exceeding a specific ground motion intensity in a given time period through probabilistic seismic hazard analysis (PSHA; e.g., Cornell, 1968; Gerstenberger et al., 2020; McGuire, 1995). The main components of a PSHA are seismogenic sources, which cumulatively describe the magnitude and frequency of earthquakes within the assessed region, and a ground motion model, which describes the ground motion intensities earthquakes will induce. Typically, seismogenic sources have been developed by considering the historical and instrumental records of earthquakes. However: (1) the relatively short duration of these records implies they are not necessarily representative of a region's long-term seismicity (e.g., Hodge et al., 2015; Stein et al., 2012), and (2) it is often unclear how this record should be used to parameterize a seismogenic source's spatial extent (Helmstetter and Werner, 2012) and maximum expected earthquake magnitude (Poggi et al., 2017). Therefore, it is now common to also incorporate fault-based seismogenic sources into PSHA, which combine geologic, paleoseismic, and/or geodetic information to describe the magnitude and frequency of earthquakes on known active faults (e.g., Gómez-Novell et al., 2020; Morell et al., 2020; Pace et al., 2016; Pagani et al., 2020; Stirling et al., 2012).
To assess earthquake frequency on active faults, an estimate of the earthquake recurrence interval and/or slip rate on each fault is required (e.g., Molnar, 1979; Wallace, 1970; Youngs and Coppersmith, 1985).
Typically, slip rates are derived from: (1) planar or linear geologic features that have been offset by a fault and have a known age (McCalpin, 2009) and/or (2) geodetic measurements of surface interseismic strain accumulation using Global Navigation Satellite Systems, from which fault slip rates are constrained using 1D velocity profiles (Bendick et al., 2000), 2D block models (Wallace et al., 2012; Zeng and Shen, 2014), or by partitioning regional geodetically measured strain across multiple faults (Cox et al., 2012; Williams et al., 2021b). However, whilst geodetic measurements have been made only over the past few decades, offset geologic markers sample the displacement accrued by a fault over timescales of 10^2-10^5 years. This is problematic as earthquakes along a single fault may temporally cluster (Cowie et al., 2012; DuRoss et al., 2020; Wedmore et al., 2017; [...] and the proliferation of seismically vulnerable building stock (Giordano et al., 2021; Goda et al., 2016, 2021; Hodge et al., 2015; Ngoma et al., 2019; Novelli et al., 2019). The geospatial, kinematic, and earthquake source data in the MSSD are freely available, and we suggest that the database will be an important resource for seismic hazard planning in the region.

Tectonic setting of Malawi
A ~900-km-long section of the East African Rift's (EAR) Western Branch passes through Malawi (Fig. 1). Geodetic models imply that this section of the EAR accommodates 0.5-1.5 mm/yr ENE-WSW extension between the San and Rovuma plates (Fig. 1; Wedmore et al., 2021). In central and northern Malawi, the EAR has been flooded by Lake Malawi, whilst in southern Malawi, the rift floor and associated faults are subaerially exposed (Fig. 1b). South of the Rungwe Volcanic Province in southwestern Tanzania, there is no reported surface volcanism and only minor, if any, melts in its lower crust (Accardo et al., 2020; Hopper et al., 2020; Njinju et al., 2019; Wang et al., 2019). The Malawi section of the EAR is therefore considered to be magma-poor.
A multidisciplinary dataset of 113 fault traces was compiled by Williams et al. (2021a, 2021c) in the Malawi Active Fault Database (MAFD). The MAFD includes 90 basement-involved faults that were mapped from geological maps, high resolution digital elevation models, and 2D seismic reflection surveys (Scholz et al., 2020; Shillington et al., 2020) and that show demonstrable evidence of displacement during EAR activity in Malawi. The remaining 23 faults in the MAFD are buried intrarift faults inferred from aeromagnetic (Kolawole et al., 2018a) or gravity data (Chisenga et al., 2019). These faults show no definitive evidence of displacement associated with East African rifting, but are well-oriented for reactivation in the regional stress field (Dawson et al., 2018; Williams et al., 2019, 2021c). The MAFD contains basic geomorphic and mapping attributes following the format of the Global Earthquake Model Global Active Faults Database (Styron and Pagani, 2020). In keeping with practice elsewhere (Faure Walker et al., 2021; Styron et al., 2020), the MSSD contains data that are considered to be more subjective and may be liable to change (e.g., earthquake recurrence intervals).
Using the instrumental record of seismicity, PSHA indicates there is a 10% probability of exceeding (PoE) ~0.15 g in 50 years in Malawi (Poggi et al., 2017). However, when basic geodetic and geologic data were combined to develop seven fault-based seismogenic sources around Lake Malawi, hazard levels around the fault sources were much higher (10% PoE 0.25 g in 50 years), particularly at low probabilities of exceedance and long vibration periods (Hodge et al., 2015). Scenario-based seismic risk assessment indicates that a full MW 7.7 rupture of the Bilila-Mtakataka fault in southern Malawi would result in 160,000-440,000 collapsed buildings (Goda et al., 2021).

The Malawi Seismogenic Source Database
The Malawi Seismogenic Source Database (MSSD) is a geospatial database that documents the geometry, slip rate, and seismogenic properties (i.e., earthquake magnitude and frequency) of active faults in Malawi (Fig. 2, Table 1). Each geospatial feature represents a potential earthquake rupture or 'source' and is classified based on its geometry into one of three types: section, fault, or multi-fault. Source types are mutually exclusive, and so if incorporated into a PSHA, they should be assigned relative weightings. The MSSD is comparable to the Database of Individual Seismogenic Sources in Italy (Basili et al., 2008), the Taiwan Earthquake Model (Shyu et al., 2016), or the New Zealand Community Fault Model (Van Dissen et al., 2021). The MSSD is the first seismogenic source database in central and northern Malawi, and represents an update of the South Malawi Seismogenic Source Database (SMSSD; Williams et al., 2021b) because it incorporates new active fault traces (Williams et al., 2021c), new geodetic data (Wedmore et al., 2021), and a statistical treatment of uncertainty in the logic tree approach (Sect. 3.4).
The MSSD itself consists of two components: (1) a 3D geometrical model of seismogenic sources in Malawi, and (2) the mapped trace of each source, which is associated with the source attributes (Table 1) [...] and on Github (https://github.com/LukeWedmore/malawi_seismogenic_source_database/tree/v1.0). Future iterations will be released on both and so we encourage users to consult these pages for the most up-to-date version. Multi-fault sources will have multiple IDs.

MSSD Source Length
For each fault trace in the MAFD, we first assess whether it may host shorter discrete along-strike section ruptures, participate in multi-fault ruptures, and/or exhibit a branching geometry. 'Section' sources in the MSSD are bounded by displacement minima along fault strike, or a >20° bend in fault strike at a scale >5 km (Fig. 2), as these features may be indicative of barriers to dip-slip lateral rupture propagation (Biasi and Wesnousky, 2017; Wedmore et al., 2020a, 2020b). Geometrical complexities that are <5 km long (e.g., relay zone-breaching structures) are interpreted to be 'hard-linking' sections (Peacock et al., 2016), and their short length means they are not considered as distinct sources in the MSSD.
'Fault' seismogenic sources are those that are bounded by the fault tips mapped in the MAFD (Fig. 2). In their compilation of dip-slip surface ruptures, Biasi and Wesnousky (2017) noted only 10% of earthquakes exhibited branching 'Y' geometries in map view, and the paucity of branching earthquakes is consistent with numerical modelling (Bhat et al., 2007; Geist and Parsons, 2020). Therefore, where we identify fault branches, we consider these as distinct, partially overlapping fault seismogenic sources (Fig. 2).
'Multi-fault' seismogenic sources are identified in the MSSD where the tips of synthetic faults are closely spaced across-strike, as this may indicate that these faults interact through soft linkages via Coulomb stress changes (Biasi and Wesnousky, 2016; Hodge et al., 2018b; Mildon et al., 2016). Evidence for this behaviour in Malawi is indicated by the segmented nature of the 2009 Karonga earthquake sequence (Biggs et al., 2010; Fagereng, 2013; Macheyeki et al., 2015) and the bell-shaped along-strike displacement profiles of en-echelon faults in Lake Malawi (Contreras et al., 2000; Mortimer et al., 2016; Shillington et al., 2016). Empirical observations and Coulomb stress modelling indicate that en-echelon synthetic normal faults interact when the across-strike distance between two faults is <20% of the combined length of the faults, up to a maximum separation of 10 km (Biasi and Wesnousky, 2016; Hodge et al., 2018b), and we use this to determine whether two or more distinct faults in the MSSD could rupture together (Fig. 2). Slip on a fault that is close to an across-strike antithetic fault exerts a negative Coulomb stress change on the antithetic fault (Mildon et al., 2016), and so these cases are not considered as multi-fault sources in the MSSD (Fig. 2d).
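The geometric screening rule for multi-fault sources described above can be sketched in a few lines. This is a minimal illustration only, assuming that fault lengths and the across-strike separation have already been measured in km; the function name `may_rupture_together` and its arguments are hypothetical and are not part of the MSSD.

```python
def may_rupture_together(length_a_km, length_b_km, separation_km,
                         synthetic=True, ratio=0.20, max_separation_km=10.0):
    """Screen two en-echelon normal faults for a possible multi-fault source.

    Rule paraphrased from the text: synthetic faults may interact when their
    across-strike separation is <20% of their combined length, up to a
    maximum separation of 10 km. Antithetic pairs are excluded.
    """
    if not synthetic:
        # Slip on one fault lowers Coulomb stress on an antithetic neighbour.
        return False
    combined_km = length_a_km + length_b_km
    return (separation_km < ratio * combined_km
            and separation_km <= max_separation_km)


# Hypothetical fault pairs (lengths and separations in km).
print(may_rupture_together(20.0, 15.0, 5.0))     # separation below 20% of 35 km
print(may_rupture_together(100.0, 100.0, 12.0))  # exceeds the 10 km cap
```

Whether the 10 km cap is inclusive is not stated in the text; the sketch treats it as inclusive.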
For fault sources, source length (Ls) is the straight-line distance between fault tips (for unsegmented faults), or the cumulative straight-line distance between the individual section boundaries for segmented faults (Table 1; Fig. 2). Multi-fault source length is the sum of the length of each participating fault (Table 1). These estimates imply shorter lengths than a fault's mapped trace in the MAFD. However, the simplified geometries in the MSSD are consistent with other fault-based seismic hazard assessments (Basili et al., 2008; Faure Walker et al., 2021; Stirling et al., 2012), and with the hypothesis that complex surface fault traces in Malawi root onto sub-planar deep-seated (depths >5 km) weaknesses (Hodge et al., 2018a; Wedmore et al., 2020b). Following Christophersen et al. (2015), the minimum length of an MSSD source is 5 km.

MSSD Source Width
We define the MSSD source geometry as 2D planes in 3D space by projecting the fault sources down-dip, and, in the case of faults in Lake Malawi that were mapped from the offset of the synrift basement surface (Scholz et [...] (Gaherty et al., 2019; Kolawole et al., 2018a; Stevens et al., 2021; Wedmore et al., 2020a; Wheeler and Rosendahl, 1994) and these are applied when projecting faults down-dip. The moderately to steeply dipping (40-65°) planar faults indicated by these studies are also used to justify assigning a 53° dip estimate to sources in Malawi where no direct evidence for dip is currently available. The dips and kinematics of linking sections in Malawi have not been directly measured; however, they show distinct dip-slip scarps, and do not coincide with along-strike minima in scarp height or footwall relief (Wedmore et al., 2020b). These linking sections are therefore interpreted as dip-slip planes that dip at the same angle as the adjoining sections, rather than vertically dipping strike-slip sections (Acocella et al., 1999).
Width (W) in the MSSD represents the width of an earthquake a source may host. For relatively short section sources, W will therefore be less than the width of the larger fault or multi-fault structure they are contained within in the MSSD geometrical model (Fig. 3). In practice, this implies that section ruptures can float at a range of depth intervals on a larger fault plane (Pagani et al., 2014), and so do not necessarily propagate to the surface; indeed, the possible blind rupture of a northern section of the Bilila-Mtakataka Fault during the MW 6.3 1989 Salima Earthquake may be an example of such an event (Hodge et al., 2018a; Stevens et al., 2021). In the first instance, W is assigned based on an empirically derived scaling relation between W and Ls (Leonard, 2010), which is self-consistent with earthquake magnitude and average single event displacement estimates (Sect. 3.3).
For dip-slip faults, the Leonard (2010) relations assume that W is unlimited by the thickness of the seismogenic layer. In central and northern Malawi, however, faults and multi-fault systems may reach lengths >140 km, which, assuming fault dips of ~50-60°, would imply ruptures at depths >40 km. This would be deeper than the 30-40 km thick seismogenic layer in Malawi (Ebinger et al., 2019; Stevens et al., 2021) and would imply that ruptures propagate into the upper mantle. Although upper mantle earthquakes have been recorded in Malawi (Yang and Chen, 2010), our preferred interpretation is that ruptures along faults in the MSSD will not exceed depths of 30-40 km since: (1) mechanically, it is easier for dip-slip ruptures to propagate up-dip rather than down-dip (Das and Scholz, 1983) and (2) estimates of fault width in earthquake scaling relationships are derived from aftershock distributions, and for dip-slip faults, these events do not generally nucleate below the portion of the crust that is seismogenic (Henry and Das, 2001). In the MSSD, W is therefore calculated as:
$$W = \min\left(c_1 L_s^{2/3},\ \frac{z}{\sin\delta}\right) \qquad (1)$$
where c1 is an empirically derived parameter (for interplate dip-slip faults >5 km long; Leonard, 2010), δ is fault dip (assigned 53°, unless otherwise measured), and z is the thickness of the seismogenic layer, for which we use an intermediate estimate of 35 km.
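As an illustration of how the width limit operates, the sketch below assumes that width follows a Leonard-type scaling W = c1 Ls^(2/3) (lengths in metres) until it reaches the down-dip extent of the seismogenic layer, z/sin δ. The function name is hypothetical, and the default c1 = 17.5 is an assumed illustrative value that should be checked against Leonard (2010) before any use.

```python
import math


def source_width_km(length_km, dip_deg=53.0, z_km=35.0, c1=17.5):
    """Down-dip width: length scaling capped by seismogenic layer thickness.

    c1 is an assumed illustrative constant (length and width in metres).
    """
    length_m = length_km * 1e3
    w_scaling_km = c1 * length_m ** (2.0 / 3.0) / 1e3
    w_limit_km = z_km / math.sin(math.radians(dip_deg))
    return min(w_scaling_km, w_limit_km)


for length in (10.0, 50.0, 200.0):
    print(f"Ls = {length:5.0f} km -> W = {source_width_km(length):.1f} km")
```

Under these assumptions, shorter sources keep the width given by the length scaling, whereas the longest sources are capped at about 44 km for a 53° dip and z = 35 km.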
Following these first estimates for W, we then test whether the down-dip extent of an MSSD source implies that it will intersect with another source at depth (Fig. A1). In this way, we accommodate observations from Malawi and elsewhere that such dip intersections can pose significant barriers to earthquake rupture and/or one of the intersecting faults is truncated by the intersection (Gaherty et al., 2019; King, 1986; Plesch et al., 2007; Walters et al., 2018). In the case where two 2D planes in the MSSD intersect at depth, we assume that the shorter (and presumably lower displacement) source has been truncated and locked by the longer source (Fig. A1; Scholz and Contreras, 1998). Furthermore, if the across-strike distance at the surface between two intersecting sources is <6 km, which is the maximum across-strike distance that two sources dipping at 53° and with widths <5 km will intersect, we omit the shorter of the two sources in the MSSD. Following these criteria, and the removal of other sources <5 km long (Sect. 3.3.1), 22 faults in the MAFD are not included in the MSSD (Fig. 3, Table S1). This does not imply that these structures cannot host earthquakes but instead that: (1) there are few historical observations of surface ruptures <5 km long (Baize et al., 2019), and this increases the uncertainty in applying earthquake scaling relationships to these faults (Christophersen et al., 2015; Stirling et al., 2013), and (2) there are many hitherto unmapped short (<10 km) faults in Malawi (Williams et al., 2021c), and so during PSHA, it may be more appropriate that moderate magnitude seismicity along them is incorporated using off-fault distributed sources (e.g., Hodge et al., 2015; Stirling et al., 2012).
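The 6 km threshold can be checked with simple trigonometry. The sketch below is one reading of that criterion, assuming two planar sources that dip toward each other at the same angle; the function name is hypothetical.

```python
import math


def max_intersecting_separation_km(width_a_km, width_b_km, dip_deg=53.0):
    """Largest surface separation at which two planes dipping toward each
    other at the same angle can still meet within their down-dip widths."""
    horizontal_reach_per_km = math.cos(math.radians(dip_deg))
    return (width_a_km + width_b_km) * horizontal_reach_per_km


# Two 5 km wide sources dipping at 53 degrees.
print(round(max_intersecting_separation_km(5.0, 5.0), 2))
```

Each 5 km wide plane extends about 3 km horizontally from its surface trace, so two such planes can only meet if their traces are within about 6 km of each other.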

Slip Rates
For the MSSD sources in the North Basin of Lake Malawi, slip rates are derived from estimates that were previously made using the vertical offset of a 75 ka megadrought horizon in seismic reflection data (Scholz et al., 2007; Shillington et al., 2020). The offset-reflector slip rate estimates are preferred in the MSSD instead of the geodetic-based estimates (described below), as: (1) [...] cycles, and so are more representative of a source's long-term behaviour (Cowie and Roberts, 2001; DuRoss et al., 2020).
The uncertainty in using the offset seismic reflector to derive slip rates is discussed in Sect. 3.4.
Slip rates are derived from geodesy using a 'systems-based' approach that partitions the regional geodetic extension rate onto rift faults in a manner consistent with observations and theory of regional strain distribution in narrow magma-poor continental rifts (Williams et al., 2021b). We first group the MSSD sources in central and northern Malawi into the North, Central, and South Basins (Scholz et al., 2020; Shillington et al., 2020), and in southern Malawi into the Makanjira, Zomba, Lengwe (previously referred to as the "Mwanza"), Lower Shire, and Nsanje basins (Fig. 1b; Williams et al., 2021b). We then divide the MSSD sources depending on whether they are part of an intrarift or border fault system. Border faults are classified geometrically in the MSSD as the faults at the edge of the rift (Ebinger, 1989; Muirhead et al., 2019; Williams et al., 2021b). The slip rate for each MSSD source, s, is then estimated through: [...] where θs is the source's slip azimuth, v and φ are the geodetically derived horizontal rift extension rate and azimuth, chwf is a correction factor for hanging-wall flexural extension, α is a weight that depends on whether the source is hosted on a border (αbf) or intrarift (αif) fault system, and it is divided by the number of mapped border (nbf) or intrarift (nif) fault or multi-fault systems in each basin. Uncertainty in these parameters is discussed in Sect. 3.4.
In the MSSD, the rift extension rate (v) and azimuth (φ) are derived from the geodetic model developed by Wedmore et al. (2021), in which southern Africa is divided into two microplates (San and Rovuma) that move independently of the Nubian Plate (Fig. 1).
The Euler Pole for the relative motion between San and Rovuma (as defined by a location and rotation rate) and associated uncertainties are used to calculate the plate motion and its uncertainty at the centre of each basin following the methods of Robertson et al. (2016) (Table 2, Fig. 1). The MSSD sources are assumed to exhibit pure normal dip-slip, which is consistent with fault slickensides and focal mechanisms (Delvaux and Barth, 2010; Hodge et al., 2015; Wedmore et al., 2020a; Williams et al., 2019), and so the slip azimuth (θ) is parallel to the source's dip direction.
[Table caption fragment] [...] Wedmore et al. (2021) and the coordinates from which it was derived. The uncertainties associated with each vector are derived using the methods presented by Robertson et al. (2016). For basins in southern Malawi, the Nubia-Rovuma plate motion vectors obtained from the Saria et al. (2013) geodetic model (S13) and used in the South Malawi Seismogenic Source Database are also reported.
[...] (Shillington et al., 2020; Wedmore et al., 2020a), elsewhere along the EAR (Kolawole et al., 2021b; Muirhead et al., 2016, 2019; Wright et al., 2020), and in analogue and numerical models (Agostini et al., 2011; Gupta et al., 1998). The South Basin is bound onshore by the Metangula Fault (Laõ-Dávila et al., 2015). However, Flannery and Rosendahl (1990) have previously interpreted that the South Basin 5-13 multi-fault system, which lies 5-20 km across strike under Lake Malawi (Fig. 2b), is also a border fault given its relatively large length-scale (>200 km) and high throw (>~2 km, as derived from variations in the thickness of synrift sediments across it; Scholz et al., 2020). We acknowledge this in the MSSD by distributing αbf equally between the South Basin 5-13 multi-fault system and the Metangula Fault.
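The partitioning of regional extension onto individual sources can be sketched as follows. The functional form used here is an assumption that is consistent with the parameter descriptions above: the horizontal extension rate is resolved along the slip azimuth, weighted by α, shared between n systems, scaled by chwf, and converted to slip on a dipping plane by dividing by cos δ. The division by cos δ in particular is assumed rather than taken from the text, and all names and input values are hypothetical.

```python
import math


def systems_based_slip_rate(v_mm_yr, phi_deg, theta_deg, alpha, n_systems,
                            dip_deg=53.0, c_hwf=1.0):
    """Assumed form: c_hwf * (alpha / n) * v * cos(theta - phi) / cos(dip)."""
    # Component of the rift extension rate resolved along the slip azimuth.
    horizontal = v_mm_yr * math.cos(math.radians(theta_deg - phi_deg))
    # Share of extension assigned to this border or intrarift system.
    share = alpha / n_systems
    # Convert horizontal extension to slip on the dipping source plane.
    return c_hwf * share * horizontal / math.cos(math.radians(dip_deg))


# Hypothetical border fault with slip parallel to the extension direction,
# compared with the same fault slipping 30 degrees oblique to it.
parallel = systems_based_slip_rate(1.0, 75.0, 75.0, alpha=0.7, n_systems=1)
oblique = systems_based_slip_rate(1.0, 75.0, 105.0, alpha=0.7, n_systems=1)
print(f"{parallel:.2f} mm/yr (parallel), {oblique:.2f} mm/yr (oblique)")
```

In this form, sources within the same basin and fault-system class differ only through their slip azimuth relative to the extension direction.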

The considerable throw (>5 km) along border fault systems in central and northern Malawi induces a significant amount of downward flexure within the rift floor, which is accommodated by intrarift faults (Muirhead et al., 2016; Olive et al., 2014; [...] extensional strain and local flexural strain must be considered. The latter, however, is not sampled by far-field geodetic measurements (Muirhead et al., 2016; Shillington et al., 2020). In Eq. 2, we therefore apply a correction factor (chwf) to account for the flexural strain that intrarift sources in Malawi are accommodating, and which is not directly incorporated into v. We define chwf as: [...] where Tif-ext is the estimated total cumulative extension across a basin's intrarift sources (Appendix A), and hwfext is the flexural extension across the basin as modelled following a broken-plate model (Figs. 4 and A3; Tables A2 and A3; Billings and Kattenhorn, 2005; Muirhead et al., 2016; Shillington et al., 2020; Turcotte and Schubert, 1982). The calculated profiles across these basins cannot determine which intrarift sources will accommodate disproportionately more or less flexural strain (Fig. 4), and so each intrarift source in a given basin is assigned the same range of chwf values. Hanging-wall flexural modelling in the basins south of Lake Malawi indicates negligible flexural extension due to the much lower throws (<1 km) on the region's border faults (Fig. A3), and so chwf is set to one for these basins.

Earthquake magnitudes and recurrence intervals
We apply empirically derived earthquake scaling relationships to estimate the magnitude and average single event displacement of an earthquake along an MSSD source. For consistency with estimates of a source's area, we use the Leonard (2010) relations to calculate these parameters. Inherent in the Leonard (2010) magnitude scaling relationships for dip-slip faults is that Ls scales with W following Eq. 1; however, this scaling breaks down for MSSD sources whose down-dip extent is limited by an intersecting source or the thickness of the seismogenic layer (Sect. 3.1.2). We therefore adapt the model that Leonard (2010) applied for width-limited strike-slip ruptures, which indicates that seismic moment $M_0 \propto L_s^{1.5}$ and $\bar{D} = c_2\sqrt{A_s}$, where $A_s$ is source area and equals $L_s z/\sin\delta$, $c_2$ is an empirically derived constant, and $\bar{D}$ is average single event displacement. The earthquake magnitude of source s in the MSSD therefore equals: [...] where μ is the shear modulus (33 GPa; Leonard, 2010), and z is 35 km, as used in Eq. 1. Estimates of MW and slip rates are then combined to calculate recurrence intervals (R) through the relationship R = $\bar{D}$/slip rate (Wallace, 1970).
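A minimal sketch of the width-limited calculation is given below. It assumes that seismic moment is the product of shear modulus, source area, and average displacement, and that moment magnitude follows MW = (log10 M0 - 9.05)/1.5 with M0 in N m. Both the magnitude constant and the default c2 = 3.8e-5 are assumed illustrative values and are not taken from the text; the function name is hypothetical.

```python
import math


def width_limited_source(length_km, slip_rate_mm_yr, dip_deg=53.0,
                         z_km=35.0, mu_pa=33e9, c2=3.8e-5):
    """Return (Mw, average displacement in m, recurrence interval in yr)."""
    # Source area A_s = L_s * z / sin(dip), in square metres.
    area_m2 = (length_km * 1e3) * (z_km * 1e3) / math.sin(math.radians(dip_deg))
    d_bar_m = c2 * math.sqrt(area_m2)          # D = c2 * sqrt(A_s)
    moment_nm = mu_pa * area_m2 * d_bar_m      # M0 = mu * A_s * D
    mw = (math.log10(moment_nm) - 9.05) / 1.5  # assumed magnitude constant
    recurrence_yr = d_bar_m / (slip_rate_mm_yr * 1e-3)
    return mw, d_bar_m, recurrence_yr


mw, d_bar, recurrence = width_limited_source(268.0, slip_rate_mm_yr=1.0)
print(f"Mw {mw:.1f}, D = {d_bar:.1f} m, R = {recurrence:.0f} yr")
```

With these assumed constants, a 268 km long width-limited source gives MW of about 8.1 and an average displacement of about 4 m, so a hypothetical slip rate of 1 mm/yr would imply a recurrence interval of roughly 4,000 years. The sketch applies only to sources whose width is limited to z/sin δ.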

Uncertainty in the MSSD
There is considerable uncertainty in the variables used to calculate the slip rate and recurrence interval estimates in the MSSD, which is captured as described below. For the slip rates derived by Shillington et al. (2020) in the North Basin of Lake Malawi from the offsets on the 75 ka megadrought horizon in seismic reflection data, the primary source of uncertainty at these shallow depths is the vertical resolution of the seismic reflection data, which is controlled by the frequency content of the data and the signal-to-noise ratio. The vertical resolution of seismic reflection data is typically estimated to be a quarter of the wavelength (λ/4) of the seismic data (Widess, 1973), though some authors report detecting faults with much smaller offsets in data with low noise (e.g., λ/30; Brown, 2011; Faleide et al., 2021). The [...] Hz, and so ~25-37.5 m. For the purposes of this study, we apply the λ/4 rule, a velocity of 1500 m/s and a frequency of 50 Hz, which gives an uncertainty of 7.5 m; however, we consider this a very conservative estimate since we can identify much smaller fault offsets in some places. In addition, the reflector's age, which was obtained from Optically Stimulated Luminescence (OSL) dating of a drill-core interval that was tied to the reflector (Scholz et al., 2007), has a ±5,290 year uncertainty associated with it, and there is a range of plausible fault dips onto which the vertical offset measurement could be projected (40-65°).
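The quarter-wavelength estimate quoted above follows directly from the velocity and frequency. The short sketch below reproduces that arithmetic and also shows how a vertical offset might be projected onto a dipping plane and divided by the reflector age. The function names and the 75 m offset are hypothetical and are used only for illustration.

```python
import math


def vertical_resolution_m(velocity_m_s, frequency_hz, fraction=4.0):
    """Vertical resolution as a fraction of the dominant wavelength."""
    wavelength_m = velocity_m_s / frequency_hz
    return wavelength_m / fraction


def dip_slip_rate_mm_yr(vertical_offset_m, dip_deg, age_yr):
    """Project a vertical offset onto the fault plane and divide by age."""
    slip_m = vertical_offset_m / math.sin(math.radians(dip_deg))
    return slip_m / age_yr * 1e3  # m/yr to mm/yr


print(vertical_resolution_m(1500.0, 50.0))  # 7.5
print(round(dip_slip_rate_mm_yr(75.0, 53.0, 75_000.0), 2))
```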

To quantify the uncertainties of these slip rate estimates, we follow the probabilistic framework of Zechar and Frankel (2009). Specifically, we treat the OSL drill-core date as a normal distribution, and the slip measurement uncertainty (i.e., the combination of the vertical offset and fault dip uncertainties) as a boxcar function. Where multiple offset measurements of the reflector have been made for the same fault, a single offset probability distribution function (pdf) is derived from normalizing the sum of the individual offset pdfs (Zechar and Frankel, 2009). The resulting slip rate of each fault is then also treated as a normal pdf, albeit with a truncation for slip rates <0 (Zechar and Frankel, 2009). For multi-fault sources whose slip rate is measured from the offset reflector, the slip rate and slip rate uncertainty are derived from the area-weighted average slip rate of the participating fault sources.
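The combination of a normally distributed age with boxcar offset and dip uncertainties can be illustrated by simple sampling. Note that this is a Monte Carlo stand-in rather than the analytical pdf treatment of Zechar and Frankel (2009), and that the 75 m offset, the function name, and the summary by sample mean and standard deviation are assumptions made for illustration.

```python
import math
import random
import statistics


def sample_offset_slip_rates(offset_m, offset_err_m, age_yr, age_sigma_yr,
                             dips_deg=(40.0, 65.0), n=10_000, seed=0):
    """Return (mean, standard deviation) of sampled slip rates in mm/yr."""
    rng = random.Random(seed)
    rates = []
    while len(rates) < n:
        age = rng.gauss(age_yr, age_sigma_yr)            # normal age
        if age <= 0:
            continue
        offset = rng.uniform(offset_m - offset_err_m,    # boxcar offset
                             offset_m + offset_err_m)
        dip = math.radians(rng.uniform(*dips_deg))       # boxcar dip
        rate = offset / math.sin(dip) / age * 1e3        # m/yr to mm/yr
        if rate >= 0:                                    # truncate at zero
            rates.append(rate)
    return statistics.fmean(rates), statistics.stdev(rates)


mean_rate, sigma_rate = sample_offset_slip_rates(75.0, 7.5, 75_000.0, 5_290.0)
print(f"slip rate = {mean_rate:.2f} +/- {sigma_rate:.2f} mm/yr")
```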
Uncertainty in the parameters used to estimate slip rates and earthquake recurrence intervals from the systems-based approach is addressed through a logic tree (Fig. 5). A common interpretation of a logic tree is that all possible branch combinations represent a mutually exclusive and collectively exhaustive (MECE) set of events (Bommer and Scherbaum, 2008). However, it is difficult to interpret the results of logic trees using an MECE approach, as strictly speaking it implies that only one (unknown) outcome is correct, and all other branches provide no other information (Bommer and Scherbaum, 2008; Marzocchi et al., 2015). In the MSSD, we therefore sample epistemic uncertainty by incorporating the "relaxed view" of logic trees (Cramer et al., 1996; Gerstenberger et al., 2020; Marzocchi et al., 2015). In this context, uncertainty is defined nonparametrically by the variability of outcomes from the logic tree itself. Specifically, we calculate a slip rate and recurrence interval for each MSSD source in 10,000 Monte Carlo simulations of the logic tree in Fig. 5. We then fit a normal distribution, truncated at values <0, to the slip rate simulation results (Fig. 6a), and since it is calculated through a log function in Eq. 4, a log normal distribution to the recurrence intervals R (Fig. 6b).
When sampling the MSSD logic tree, we treat parameters that have been described by standard deviations (σ) about a mean value as a continuous normal distribution in the simulations (Fig. 5). Parameters assigned based on a range of observed values in Malawi (e.g., fault dip) are discretized into three equally weighted values based on an expert judgement (Fig. 5).
We note that there are pitfalls with using expert judgements in logic trees; however, for a tree with many branches, the outcomes are generally insensitive to the weightings, and it is the values at each logic tree step that are of importance (Bommer and Scherbaum, 2008).
For simplicity, the slip rate and R reported for each source are the mean values from the distributions fitted to the simulation results, and the upper and lower reported values represent 1σ uncertainty (Fig. 5, Table 1). In this context, the upper and lower values of slip rate and R represent our certainty in these parameters at a 68% confidence level. However, should a user of the MSSD wish to derive the uncertainty in slip rate and R at different confidence levels, they will be able to do so through the reported values.
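The sampling strategy described above can be sketched as follows. All branch values are hypothetical, the slip rate expression is a simplified assumed form (weighted extension rate converted to slip on a dipping plane), and the truncated normal and log normal fits are approximated here by sample statistics of the retained draws rather than by a formal distribution fit.

```python
import math
import random
import statistics


def simulate_logic_tree(n=10_000, seed=1):
    """Sample a toy logic tree and summarize slip rate and recurrence."""
    rng = random.Random(seed)
    slip_rates, log_recurrences = [], []
    while len(slip_rates) < n:
        v = rng.gauss(1.0, 0.3)               # continuous branch, mm/yr
        dip = rng.choice((40.0, 53.0, 65.0))  # three equally weighted dips
        alpha = rng.choice((0.5, 0.7, 0.9))   # hypothetical weights
        d_bar = rng.choice((1.0, 2.0, 3.0))   # hypothetical displacement, m
        rate = alpha * v / math.cos(math.radians(dip))
        if rate <= 0:                         # discard non-physical draws
            continue
        slip_rates.append(rate)
        log_recurrences.append(math.log(d_bar / (rate * 1e-3)))
    mu_log = statistics.fmean(log_recurrences)
    sd_log = statistics.stdev(log_recurrences)
    return {
        "slip_rate_mean": statistics.fmean(slip_rates),
        "slip_rate_sigma": statistics.stdev(slip_rates),
        "recurrence_median_yr": math.exp(mu_log),
        "recurrence_1sigma_yr": (math.exp(mu_log - sd_log),
                                 math.exp(mu_log + sd_log)),
    }


for key, value in simulate_logic_tree().items():
    print(key, value)
```

Because the recurrence interval is summarized in log space, its 1σ bounds are asymmetric about the median, which is the behaviour expected of a log normal distribution.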

Slip rate comparison
There are 11 MSSD fault sources in the North Basin of Lake Malawi in which slip rates can be derived from the offset of a 75 ka seismic reflector (Shillington et al., 2020) and from the systems-based approach. Since in both cases the slip rates are expressed as normal distributions that are truncated for values <0 (Sect. 3.4), we performed the following statistical tests to assess how well these independent estimates of fault slip rates compare: (1) a two-sample t-test for the null hypothesis that 10,000 values randomly drawn from the two slip rate distributions come from distributions with the same mean but, since the offset reflector and systems-based slip rate uncertainties are not necessarily the same, unequal variances, and (2) [...]
[...] (Fig. 7, Table 3). There is an overall increase in slip rates from south to north across Malawi (Fig. 7d-f) due to higher EAR extension rates as distance from the San-Rovuma Euler Pole increases (Fig. 1; Wedmore et al., 2021) and, for intrarift sources, the contribution of hanging-wall flexure to slip (Shillington et al., 2020). There are more multi-fault sources in central and northern Malawi (Fig. 7d-f), although we cannot distinguish whether this reflects how fault tips are mapped in the DEMs and seismic reflection data, or whether previously distinct faults are beginning to interact and coalesce in this more evolved part of the EAR [...]
[Figure caption fragment] [...] cases of the logic tree (SMSSD) and from 10,000 Monte Carlo simulations through the logic tree (Fig. 5) and then fit to a normal distribution truncated at zero (MSSD). For the MSSD, results can also be discretized by the mean value ± 1 standard deviation (σ). For the SMSSD, no weighting was formally assigned to either estimate and so is depicted here as three equal weightings. (b) Equivalent to (a) but for the Zomba Fault recurrence interval (R), which follows a log normal distribution. Comparison of (c) mean slip rate and (d) mean recurrence interval estimates for all faults in the Zomba Graben between the SMSSD and MSSD. Error bars represent extreme values (SMSSD) and 1σ (MSSD).
The mean and range of intermediate earthquake magnitude estimates in the MSSD are MW 6.3 and MW 5.4-7.6 for section sources, MW 6.8 and MW 5.4-7.6 for fault sources, and MW 7.3 and MW 6.7-8.1 for multi-fault sources (Fig. 7, Table 3).
Twenty-eight sources are identified that are capable of hosting MW >7.5 earthquakes, with the largest magnitude source (MW 8.1) being the 268 km long South Basin Fault 5-13 multi-fault system (Fig. 2b). Smaller source lengths imply shorter intermediate recurrence intervals for section sources (~500-30,000 years) than on fault and multi-fault systems (1,000-40,000 years). The standard deviation (1σ) uncertainties for slip rates are 0.05-0.3 mm/yr and for recurrence intervals, 1σ uncertainty is approximately one order of magnitude (Fig. 6).

Slip rate estimate comparisons in Lake Malawi
Of the 11 intrarift fault sources in the North Basin of Lake Malawi whose slip rate estimates could be compared, the mean slip rate from the 75 ka offset reflector is within 2σ of the mean slip rate derived from the systems-based approach for 9 faults (Fig. 8). However, in the case of the t-test, we reject the null hypothesis that the two slip rate estimates are from probability distribution functions with the same mean value at a 5% significance level for all faults (Fig. 8). This reflects that slip rate estimates are higher in 9 out of 11 cases when they are derived from the offset reflector (Fig. 8).
We find that the overlapping coefficient (OVL) between the two slip rate probability distributions is >0.5 for 9 out of 11 faults. Of the two cases where OVL <0.5, one is for a fault interpreted as the northern tip of the Usisya border fault system, and so this result may reflect along-strike reductions in the slip rate of this multi-fault system (Accardo et al., 2018; Contreras et al., 2000). The other case is for Fault 1 of Shillington et al. (2020) (North Basin Fault 14 in the MSSD, Fig. 2c), which, considering its 2.5 km total throw, is a particularly high slip-rate intrarift fault. In both instances, these comparisons indicate that there is more along- and across-strike variation in the slip rate of intrarift faults in Malawi than suggested by the systems-based approach, where the only parameter that results in slip rate variations is the fault slip azimuth with respect to the regional extension direction (Eq. 2).
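The two comparisons described above can be illustrated with a minimal sketch that uses only the Python standard library. The slip rate means and standard deviations below are hypothetical values chosen for illustration, not MSSD results. The overlapping coefficient is assumed here to take its standard form, the integral of the pointwise minimum of the two probability density functions, which may differ in detail from Eq. 6. The p-value uses a normal approximation to the t distribution, which is reasonable only because each sample contains 10,000 draws.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def draw_truncated_normal(mean, sd, n, rng):
    """Draw n values from a normal distribution truncated at zero (rejection sampling)."""
    values = []
    while len(values) < n:
        x = rng.gauss(mean, sd)
        if x >= 0.0:
            values.append(x)
    return values


def welch_t_test(a, b):
    """Two-sample t-test with unequal variances (Welch's test).

    Returns the t statistic and a two-sided p-value. ASSUMPTION: the p-value
    uses a normal approximation, acceptable for samples of ~10,000 values.
    """
    se2 = variance(a) / len(a) + variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(se2)
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


def truncated_normal_pdf(x, dist):
    """Density of a normal distribution truncated at zero and renormalized."""
    if x < 0.0:
        return 0.0
    return dist.pdf(x) / (1.0 - dist.cdf(0.0))


def overlapping_coefficient(mean1, sd1, mean2, sd2, steps=20000):
    """OVL = integral of min(f, g), evaluated with the midpoint rule.

    ASSUMPTION: standard OVL definition; the paper's Eq. 6 is not reproduced here.
    """
    d1, d2 = NormalDist(mean1, sd1), NormalDist(mean2, sd2)
    upper = max(mean1 + 8.0 * sd1, mean2 + 8.0 * sd2)
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += min(truncated_normal_pdf(x, d1), truncated_normal_pdf(x, d2)) * dx
    return total


# Hypothetical slip rates in mm/yr (illustrative only, not taken from the MSSD).
reflector_mean, reflector_sd = 1.0, 0.4
systems_mean, systems_sd = 0.7, 0.4

rng = random.Random(42)
reflector_draws = draw_truncated_normal(reflector_mean, reflector_sd, 10_000, rng)
systems_draws = draw_truncated_normal(systems_mean, systems_sd, 10_000, rng)

t_stat, p_value = welch_t_test(reflector_draws, systems_draws)
ovl = overlapping_coefficient(reflector_mean, reflector_sd, systems_mean, systems_sd)
print(f"t = {t_stat:.1f}, p = {p_value:.3g}, OVL = {ovl:.2f}")
```

In this hypothetical case the t-test rejects the null hypothesis even though the two distributions overlap substantially, because a test on 10,000 draws per sample is sensitive to small differences in the mean. This is one possible reading of why the two measures reported above can point in different directions.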

Assessment of fault slip rate estimates in the MSSD
The MSSD uses a newer geodetic model for East Africa (Wedmore et al., 2021) than that used in the South Malawi Seismogenic Source Database (SMSSD; Saria et al., 2013; Williams et al., 2021b). Overall, the rift extension rates inferred from these models are broadly similar, so using the Wedmore et al. (2021) model does not significantly change the mean slip rate estimate (Fig. 6). However, there is a significant reduction in the regional extension rate uncertainties (from ± 1.5 mm/yr to ± 0.3 mm/yr, Table 2). This demonstrates the importance of collecting new geodetic data in East Africa to reduce epistemic uncertainty in seismic hazard assessment.

Figure 8: [...] coefficient between the two probability distributions (Eq. 6) and the result of the t-test to determine if the rates are from a probability distribution with the same mean value are also indicated. The null hypothesis of the t-test is rejected when p < 0.05.
By using the variability of logic tree outcomes to describe slip rates and recurrence intervals in the MSSD, we also provide a more thorough description of the epistemic uncertainty in these parameters than the SMSSD, which considered the extreme and intermediate logic tree branches only (Fig. 6c, d). This approach could be used to model uncertainty in other regions where alternative hypotheses for slip rates and recurrence intervals have been explored using logic trees (Beauval et al., 2018; Vallage and Bollinger, 2020). Nevertheless, no MSSD slip rate estimates are 'well-constrained' under the test that a well-constrained slip rate is one where the median estimate is greater than the width of its 95% confidence interval (Bird and Liu, 2007; Zechar and Frankel, 2009).
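The 'well-constrained' test quoted above can be sketched as follows. This is a minimal illustration that assumes the 95% confidence interval is taken as the span between the 2.5th and 97.5th percentiles of a set of sampled slip rates. The function name and the two example distributions are hypothetical and are not MSSD values.

```python
import random
from statistics import median, quantiles


def is_well_constrained(samples):
    """Return True if the median exceeds the width of the 95% interval.

    ASSUMPTION: the 95% interval is the 2.5th to 97.5th percentile range
    of the sampled slip rates.
    """
    cuts = quantiles(samples, n=40, method="inclusive")  # cut points every 2.5%
    lower, upper = cuts[0], cuts[-1]
    return median(samples) > (upper - lower)


def sample_truncated_normal(mean, sd, n, rng):
    """Draw n values from a normal distribution truncated at zero."""
    values = []
    while len(values) < n:
        x = rng.gauss(mean, sd)
        if x >= 0.0:
            values.append(x)
    return values


rng = random.Random(1)
# Hypothetical slip rate distributions in mm/yr (illustrative only).
broad = sample_truncated_normal(0.5, 0.2, 10_000, rng)
tight = sample_truncated_normal(1.0, 0.1, 10_000, rng)

print(is_well_constrained(broad))
print(is_well_constrained(tight))
```

For an approximately normal distribution, the test reduces to requiring that the mean exceeds about 3.9 standard deviations, which shows how demanding the criterion is for slip rates of a few tenths of a millimetre per year.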
For 9 out of 11 intrarift fault sources in Lake Malawi's North Basin, the mean slip rate estimate is higher when obtained from the measured offset of a 75 ka seismic reflector (Shillington et al., 2020) than from the systems-based approach, which is contingent on geodetically derived regional extension rates (Fig. 8). The relatively low systems-based slip rate estimates may reflect the inadvertent inclusion of inactive faults when defining nif in Eq. 2 for the North Basin. All offshore faults in this basin have been active within the past 75 ka (Shillington et al., 2020). However, we cannot exclude the possibility that some onshore faults are now inactive, even though they show evidence for EAR activity and/or are well oriented for reactivation in the regional stress state (Dawson et al., 2018; Kolawole et al., 2018a; Williams et al., 2021c). Alternatively, the proportion of regional extensional strain that is partitioned onto intrarift sources (αif in Eq. 2) may be too low. The values we applied (0.5-0.9, Fig. 5) are consistent with observed cumulative intrarift and border fault extension in northern Malawi (Accardo et al., 2018; Shillington et al., 2020). However, it is possible that, over the lifetime of the EAR, disproportionately more strain is migrating onto intrarift faults (Biggs et al., 2010; Kolawole et al., 2018a; Wedmore et al., 2020a). The discrepancy between geologic and systems-based slip rates does not reflect temporal slip rate variations across an individual fault (Beanland and Berryman, 1989; Hetland and Hager, 2006), as we are considering the slip rate across the entire fault network in northern Malawi, and at this spatial scale, the cumulative slip rates of faults in continental rifts are generally stable over millennial timescales (Nicol et al., 2006).
Nevertheless, although there is a discrepancy between the mean slip rate estimates from the offset reflector and systems-based approach, the high overlapping coefficient (OVL >0.5 for 9 out of 11 faults) between the two slip rate probability distributions suggests that the latter approach is an appropriate method to estimate fault slip rates elsewhere in Malawi where no other constraints are currently available. With the collection of more geologic and geodetic data in Malawi, these slip rate estimates can be refined, and the existence, or not, of temporal slip rate variations clarified.

Earthquake magnitude estimates in the MSSD
There are 28 sources in the MSSD that, given their geometry and the Leonard (2010) scaling relationships (Eq. 4), can host MW >7.5 earthquakes. If such an event were to occur, it would be amongst the largest recorded continental normal fault earthquakes (Middleton et al., 2016; Valentini et al., 2020; Xu et al., 2018). Indeed, it has been questioned whether MW >7.5 continental normal fault earthquakes are physically possible due to the constraints imposed by smaller differential stresses and rupture widths in continental crust where the seismogenic layer is typically 10-20 km thick (Neely and Stein, 2021; Xu et al., 2018). However, we suggest that these factors do not limit earthquake magnitudes in Malawi given its cold, anhydrous, frictionally strong, and thick seismogenic layer (35 km; Ebinger et al., 2019; Fagereng, 2013; Hellebrekers et al., 2019; Blenkinsop, 1993, 1997; Stevens et al., 2021). Furthermore, geomorphic analysis of the Billila-Mtakataka Fault scarp indicates high single event displacements (~5-10 m), which is consistent with it hosting MW 7.4-8.0 earthquakes (Hodge et al., 2020). We also note that our magnitude estimates are contingent on the hypothesis that source width will saturate at Ls >140 km so that M0 ∝ Ls^1.5 (Leonard, 2010; Sect. 3.1.2). However, we cannot exclude the possibility that very long ruptures propagate below 35 km. If true, then the MSSD underestimates magnitudes for sources with lengths >140 km.

Future directions for the MSSD
Although the basic feature of the MSSD is an earthquake 'source', it is not an exhaustive list of potential earthquake ruptures in Malawi as: 1) the MAFD is not a complete database of active faults in Malawi, particularly for faults <10 km long or faults that do not show evidence for EAR displacement but that are still active (Williams et al., 2021c), 2) uncertainty in how faults intersect at depth in Malawi is not explored in the MSSD, and 3) the MSSD does not contain information about potential earthquakes that rupture multiple sections but not the whole length of a segmented fault. Indeed, earthquakes are not necessarily predisposed to conform to fault segment boundaries identified from empirically derived geometrical criteria (Kagan et al., 2012). This could be explored in future versions of the MSSD by distributing various event magnitudes across a wider fault system for a given moment rate and magnitude-frequency distribution (Visini et al., 2020; Youngs and Coppersmith, 1985).

It is implicit in the MSSD approach that the slip rate assigned to each source is released seismically. This is consistent with observed patterns of seismicity (Ebinger et al., 2019; Stevens et al., 2021) and the velocity weakening behaviour of representative basement samples from Malawi in deformation experiments at lower crustal pressures and temperatures (Hellebrekers et al., 2019). However, some shallow (depths <6 km) aseismic deformation was observed in northern Malawi following the 2015 MW 5.2 earthquake (Zheng et al., 2020). This could be addressed by dividing the MSSD recurrence intervals by a representative estimate of the coupling coefficient of Malawi's crust (Bird and Liu, 2007).

https://doi.org/10.5194/nhess-2021-306 Preprint. Discussion started: 16 November 2021. © Author(s) 2021. CC BY 4.0 License.

Conclusions
The Malawi Seismogenic Source Database (MSSD) is a freely available database that documents the geometry, slip rate, and earthquake magnitude and recurrence intervals of 248 possible earthquake sources in Malawi and neighboring Tanzania and Mozambique. It is distinct from, but complementary to, the Malawi Active Fault Database (Williams et al., 2021c). The MSSD also represents an update of the South Malawi Seismogenic Source Database (Williams et al., 2021b) due to the application of a new geodetic model (Wedmore et al., 2021), new active fault mapping, and a more robust description of uncertainty.

The >100 km length-scale of faults and multi-fault sources in the MSSD implies that Malawi may experience MW >7.5 earthquakes. Such magnitudes, although rare for continental normal faults, are consistent with the crust's rheology in Malawi.
Regional extension rates of 0.5-1.5 mm/yr imply that such large magnitude events will be infrequent (recurrence intervals of 10^3-10^4 years); however, the MSSD also documents the possibility of MW 5.5-6.5 earthquakes with recurrence intervals of ~10^3 years, and such events can also cause significant loss in Malawi (Goda et al., 2016; Gupta and Malomo, 1995). The data contained within the MSSD would allow the hazard of such events to be formally assessed through probabilistic seismic hazard analysis.
Slip rates in the MSSD are estimated from either a systems-based approach that derives these rates from partitioning regional geodetic extension rates across faults, or, in Lake Malawi, direct measurements from the offset of a 75 ka seismic reflector (Shillington et al., 2020). Where it is possible to compare these estimates, we find that although those inferred from the offset reflector are higher, the two estimates are within error of each other. This suggests that the slip rates (~0.05-3 mm/yr) estimated elsewhere in Malawi are meaningful. Hence, combining geodetic data with geological theory on regional strain distribution, active fault maps, and earthquake scaling relationships can provide important insights into the seismic hazard of other regions lacking historical or paleoseismic records.

Appendix
Below we provide an additional table and figure that provide extra detail to this study. Then, in Appendix A, the hanging-wall flexural analysis in Malawi is summarized.

Figure A1: Examples of faults in the MSSD that are projected to intersect and where the across-strike distance at the surface is sufficient (>6 km) that they are interpreted to represent distinct sources. In this case the longer Chingale Step fault (green) is interpreted to have cut off the shorter Mlungusi (red) and Liwawadzi (cyan) faults, so that their geometry does not extend below the intersection, as indicated by transparent polygons. The revised cut-off area of these faults is then used in the earthquake magnitude and single event displacement scaling relationships (Eqs. 4 and 5 in the main text).

Appendix A: Hanging-wall flexure in Malawi
The considerable amounts of throw (>1000 m) along a rift-bounding fault can induce a significant amount of flexure within the lithosphere on either side of the fault (Muirhead et al., 2016; Olive et al., 2014; Petit and Ebinger, 2000; Shillington et al., 2020). In the case of the hanging wall, this is a downward flexure that can result in intrabasinal faults accommodating additional slip to that imparted by regional extension alone (Muirhead et al., 2016). This additional flexural strain must therefore be accounted for when considering the slip rate of faults in Malawi (Sect. 3.2, main text).

The influence of flexural strain on basement profiles across the Lake Malawi basins has been previously assessed (Shillington et al., 2020) using the Broken Plate model (Billings and Kattenhorn, 2005; Muirhead et al., 2016; Turcotte and Schubert, 1982), and we report here the values used to generate representative profiles across these basins in Fig. 4 in the main text. In addition, we apply the Broken Plate model to provide the first estimates of hanging-wall flexural strain in southern Malawi. Unlike in Lake Malawi, there are no subsurface data to validate the resulting profiles in this region, and there is additional complexity due to intrarift topography (e.g. Shire Horst, Kirk Range) and possible rift-widening events such as when the Lower Shire Basin was reactivated during East African Rifting (Castaing, 1991). Therefore, the purpose of these profiles is not to precisely model the across-rift basement geometry, but to estimate the range of hanging-wall flexural extension that may have occurred in southern Malawi given the uncertainty of each parameter we must test. This analysis is conducted only for the Makanjira, Zomba, and Lower Shire basins, as no intrarift faults have been identified in the Lengwe and Nsanje basins (Williams et al., 2021c).
The Broken Plate model calculates flexure by considering a vertical line-load at the point of maximum deflection (i.e., at the upper contact of the border fault hanging wall, Fig. S2). The deflection (ω) across a border fault hanging wall can then be estimated as:

$\omega = \omega_0 \, e^{-x/\alpha} \cos(x/\alpha)$ (A1)

where ω0 is the maximum deflection, x is the position along a hanging wall profile from the deflecting fault (Fig. S2), and α is:

$\alpha = \left[ \frac{E h^3}{3 \rho_0 g (1 - \nu^2)} \right]^{1/4}$ (A2)

where E is Young's Modulus, ν is Poisson's ratio (0.25), g is acceleration due to gravity (9.8 m/s^2), h is the thickness of elastic crust, which is assumed here to be equivalent to the thickness of Malawi's seismogenic layer, and ρ0 is crustal density, for which the average crustal density (2816 kg/m^3) from a Malawi three-layer model is used (Fagereng, 2013; Nyblade and Langston, 1995). Shillington et al. (2020) applied a value of E (3 ± 1.5 GPa) such that the hanging wall deflection is restricted to a distance comparable to the actual width of Lake Malawi's basins, and we apply this value to south Malawi.

Table A2 footnotes: [...] Accardo et al. (2018). c Thickness of sediments in the Bwande-Liwawadze Valley based on electrical resistivity surveys (Walshaw, 1965) and borehole data (Fig. A3; Bloomfield and Garson, 1965). g Maximum proven thickness boreholes in the Lower Shire Basin, though this is also comparable to other boreholes that did penetrate basement (Fig. A3; Habgood et al., 1973). h See text.
In Eq. A1, ω0 can be derived through the observation from real and modelled normal faults that the ratio (r) of upthrow to downthrow along a normal fault is typically 0.2 (Muirhead et al., 2016). Therefore, ω0 can be obtained from r and the border fault throw (BFthrow), which is equivalent to the sum of the footwall escarpment height and hanging wall sediment thickness. There are significant uncertainties in estimating sediment thickness within southern Malawi, hence a range of values is used (Table A2). [...] A3d) and there is ambiguity in whether the Thyolo fault was a bounding fault during Karoo-age (i.e. Mesozoic) rifting (Castaing, 1991; Habgood, 1963; Habgood et al., 1973; Wedmore et al., 2020b). We therefore model both scenarios. For the case where the Thyolo fault has only been active during East African rifting, we estimate throw by combining an escarpment height of 750±250 m with a sediment thickness of 65 m (Table A2). This represents the maximum proven thickness of sediments in the Lower Shire (Fig. A3d; Habgood et al., 1973), and although the true thickness of East African sediments in this basin may be greater, such a scenario would be accounted for in our Karoo rifting model. In the Karoo scenario, we combine our EAR throw estimates for the Thyolo fault with the 1 km throw that is reported for Karoo bounding faults in the Lower Shire (Castaing, 1991).
Given a profile of hanging wall deflection, it is possible to derive the resulting flexural extensional strain (ε) within a half-graben (Billings and Kattenhorn, 2005; Muirhead et al., 2016):

$\varepsilon = -y \, \frac{d^2\omega}{dx^2}$

where y is the vertical distance from the centre of the plate (downward is positive, Fig. A2). Following Muirhead et al. (2016) and Shillington et al. (2020), we report the flexural strain in terms of the average strain across each basin, and multiply this by basin width to get extension (Table A2). For the Makanjira graben, we calculate the mean strain from the contribution of each side of the graben over its 90 km width (i.e., for the Chirobwe Ncheu and Makanjira faults, Fig. A3, Table S2).

Results of this analysis are shown in Figs. 4 (Lake Malawi basins), A3 (south Malawi basins), and Table A2. These demonstrate that regardless of the simplifications, uncertainties and assumptions in this analysis, hanging-wall flexure in southern Malawi is negligible (strains <1%) compared to the Lake Malawi basins. Furthermore, unlike the Lake Malawi basins, the flexural profiles in southern Malawi do not match the observed topography (Fig. A3), which further indicates minimal flexural extension in these basins. This result reflects the significant differences in total rift extension between the South Basin and Makanjira Graben and resulting reduction in border fault throw between these basins (Table A2). We therefore do not consider hanging-wall flexure further when considering the slip rate of intrarift sources in southern Malawi.

Figure A3: [...] Billings and Kattenhorn, 2005; Muirhead et al., 2016; Turcotte and Schubert, 1982) and parameters listed in Table A2. Solid hanging-wall profile and strain lines indicate median estimates; dashed lines indicate maximum and minimum estimates. (c) Assumes a profile where the Thyolo fault has only been active during EAR rifting. Solid black line and gray shading represent mean and one standard deviation topography from the TanDEM-X 12 m DEM in a 10 km swath centred on lines shown in (d) (Schwanghart and Scherler, 2014). Labelled faults indicate border faults. In (d), the location and depth to basement in boreholes in south Malawi are also shown (Bloomfield and Garson, 1965; Habgood, 1963; Habgood et al., 1973; Walshaw, 1965; Walter, 1972). Map underlain by 30 m resolution Shuttle Radar Topographic Mission digital elevation model.
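The Broken Plate workflow described in this appendix can be sketched numerically as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the standard broken-plate deflection ω = ω0 exp(-x/α) cos(x/α), a flexural strain equal to -y times the curvature of that profile evaluated at the top of the plate (y = -h/2), an elastic thickness of 35 km, and a maximum deflection ω0 = BFthrow / (1 + r), which follows from defining r as the ratio of upthrow to downthrow. The 50 km basin width is hypothetical, and all function names are illustrative.

```python
import math

E = 3.0e9       # Young's modulus (Pa), 3 GPa as quoted in the text
NU = 0.25       # Poisson's ratio
G = 9.8         # gravitational acceleration (m/s^2)
RHO_0 = 2816.0  # average crustal density (kg/m^3)
H = 35.0e3      # elastic thickness (m); ASSUMED equal to a 35 km seismogenic layer


def flexural_parameter(e=E, h=H, rho=RHO_0, g=G, nu=NU):
    """alpha = [E h^3 / (3 rho g (1 - nu^2))]^(1/4), in metres."""
    return (e * h**3 / (3.0 * rho * g * (1.0 - nu**2))) ** 0.25


def max_deflection(bf_throw, r=0.2):
    """ASSUMPTION: with r = upthrow / downthrow, downthrow = throw / (1 + r)."""
    return bf_throw / (1.0 + r)


def deflection(x, w0, alpha):
    """Broken-plate hanging-wall deflection at distance x from the fault."""
    return w0 * math.exp(-x / alpha) * math.cos(x / alpha)


def surface_strain(x, w0, alpha, h=H):
    """Flexural strain -y * d2w/dx2 at the top of the plate (y = -h/2).

    The curvature is the analytical second derivative of deflection().
    """
    curvature = 2.0 * w0 / alpha**2 * math.exp(-x / alpha) * math.sin(x / alpha)
    y = -h / 2.0
    return -y * curvature


def mean_surface_strain(width, w0, alpha, h=H, steps=5000):
    """Average surface strain across a basin of the given width (midpoint rule)."""
    dx = width / steps
    total = sum(surface_strain((i + 0.5) * dx, w0, alpha, h) for i in range(steps))
    return total * dx / width


alpha = flexural_parameter()
w0 = max_deflection(750.0 + 65.0)  # Thyolo fault: escarpment height + sediment thickness (m)
basin_width = 50.0e3               # HYPOTHETICAL basin width (m), for illustration only
strain = mean_surface_strain(basin_width, w0, alpha)

print(f"alpha = {alpha / 1e3:.1f} km, w0 = {w0:.0f} m")
print(f"mean surface strain = {100 * strain:.2f} %, extension = {strain * basin_width:.0f} m")
```

Under these assumptions the mean strain comes out below 1%, which is of the same order as the negligible flexural strain reported above for southern Malawi, although the exact value depends on the assumed basin width and plate thickness.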

The higher hanging-wall flexural strain in the Lake Malawi basins (~1-3%, Table A2) suggests that the hanging-wall flexural extension correction factor (chwf) should be applied when estimating the slip rates of their intrarift sources in the MSSD (Eqs. 2 and 3 in the main text). This factor is derived by combining a basin's hanging-wall flexural extension (Table S2 and S3) with the total cumulative extension of its intrarift faults (Tif-ext, Eq. 3 in the main text). However, this parameter is poorly constrained apart for intrarift sources, and so we make the following assumptions when deriving Tif-ext:
• For intrarift faults in the North Basin, the total observed cumulative extension is 2 ± 0.4 km; however, it is estimated that 30% of the extension in the basin may be accommodated by faults below the resolution of the seismic survey (Shillington et al., 2020). Therefore, the total extension of intrarift faults under Lake Malawi's North Basin is estimated to be 2.6 ± 0.5 km. There are three onshore intrarift fault/multifault sources in the North Basin (Fig. 2a). If it is assumed that they have accommodated a similar amount of extension as the four offshore fault/multifault sources, then their total extension is 1.5 ± 0.3 km, and hence Tif-ext for the North Basin is 4.1 ± 0.8 km.
• No estimates exist for the total observed cumulative extension of intrarift faults under Lake Malawi in the Central and South Basins. However, we note that the Central Basin's age, and flexural and total extension (7.0 vs 6.3 km; Scholz et al., 2020) are very similar to the North Basin. We therefore assume that the Central Basin's sub-lacustrine intrarift faults have accommodated the same amount of extension as the North Basin's, and then apply the same workflow to calculate Tif-ext, although in this case there are two and ten onshore and offshore intrarift fault/multifault sources respectively (Table A3).
• Flexural and total extension estimates in the South Basin are approximately 50% of the values for the Central and North Basins (~6-7 km vs 3.7 km; Scholz et al., 2020). We adjust the total extension of sub-lacustrine intrarift faults in the South Basin accordingly and note there are seven and eight onshore and offshore intrarift fault/multifault sources respectively (Table A3).
• Within the uncertainty of the hanging-wall flexural profiles across the Lake Malawi basins, it is possible that all the intrarift fault displacement can be accounted for by hanging-wall flexure (i.e., chwf →∞). However, we do not consider this a realistic scenario since other factors (e.g., structural inheritance) can cause intrarift faults to accommodate regional rift extension prior to significant flexural extension (Kolawole et al., 2021b; Wedmore et al., 2020a) and so chwf is truncated at values >5.
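The North Basin arithmetic in the first assumption above can be reproduced with a short sketch. Two inferences are made here that are not stated explicitly in the text: the onshore extension is scaled from the observed (uncorrected) 2 ± 0.4 km by the ratio of onshore to offshore sources, and the uncertainties of the two components are summed linearly. Both were chosen because they reproduce the quoted 1.5 ± 0.3 km and 4.1 ± 0.8 km; the function name is illustrative.

```python
def north_basin_tif_ext(observed=2.0, observed_unc=0.4,
                        unresolved_fraction=0.30,
                        n_onshore=3, n_offshore=4):
    """Estimate Tif-ext (km) and its uncertainty for the North Basin.

    ASSUMPTIONS: onshore extension is scaled from the observed offshore value
    by n_onshore / n_offshore, and uncertainties are added linearly.
    """
    # Offshore extension, increased by 30% for faults below seismic resolution.
    offshore = observed * (1.0 + unresolved_fraction)
    offshore_unc = observed_unc * (1.0 + unresolved_fraction)

    # Onshore extension, assuming each source accommodates a similar amount.
    onshore = observed * n_onshore / n_offshore
    onshore_unc = observed_unc * n_onshore / n_offshore

    return offshore + onshore, offshore_unc + onshore_unc


total, total_unc = north_basin_tif_ext()
print(f"Tif-ext = {total:.1f} ± {total_unc:.1f} km")
```

The same function could be called with different source counts (for example, two onshore and ten offshore sources for the Central Basin) to follow the workflow described in the second assumption, again as an illustration only.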