The impact of hydrological model structure on the simulation of extreme runoff events

Hydrological extremes affect societies and ecosystems around the world in many ways, stressing the need to make reliable predictions using hydrological models. However, several different hydrological models can be selected to simulate extreme events. A difference in hydrological model structure results in a spread in the simulation of extreme runoff events. We investigated the impact of different model structures on the magnitude and timing of simulated extreme highand low-flow events, by combining two state-of-the-art approaches; a modular modelling framework (FUSE) and large ensemble meteoro5 logical simulations. This combination of methods created the opportunity to isolate the impact of specific hydrological process formulations at long return periods without relying on statistical models. We showed that the impact of hydrological model structure was larger for the simulation of low-flow compared to high-flow events and varied between the four evaluated climate zones. In cold and temperate climate zones, the magnitude and timing of extreme runoff events were significantly affected by different parameter sets and hydrological process formulations, such as evaporation. In the arid and tropical climate zones, 10 the impact of hydrological model structures on extreme runoff events was smaller. This novel combination of approaches provided insights into the importance of specific hydrological process formulations in different climate zones, which can support adequate model selection for the simulation of extreme runoff events.

ical models within the FUSE framework. The different model structures will be used to evaluate various hydrological process formulations, to determine which process formulations have the largest impact on the simulated magnitude and timing of extreme high-and low-flow events in different climate zones. Due to the length of the forcing time series, the extreme runoff events in the tail of the distribution can be evaluated using simulated values. Hence, we do not rely on statistical models to extrapolate extreme events. As such, this study contributes to the understanding of the impact of model structural uncertainty 70 in hydrological models on simulated extreme runoff events.

Methods
We assessed the impact of hydrological model structural uncertainty on extreme runoff events by using large ensemble meteorological simulations in combination with the hydrological modular modelling framework FUSE. We examined four different climate zones, because hydrological processes vary considerably between climates (Pilgrim, 1983), which leads to different 75 processes being of importance in controlling the extreme events (Di Baldassarre et al., 2017;Eagleson, 1986). In the R-version of FUSE (Vitolo et al., 2015), ten different model structures were employed, and to capture ::::::: represent : the complete parameter space, 100 parameter sets were used for every model structure. The simulated extreme runoff events were compared based on their magnitude and timing. 80 We employed a 2,000 year time series of meteorological data, generated by the EC-Earth global coupled climate model (v2.3, Hazeleger et al., 2012). This 2,000 year time series originally consisted of a large ensemble of 400 sets of 5 year runs. In this study, these 400 sets were assumed to be one long time series, which enables extensive return period analysis. This time series represents a period with a simulated absolute Global Mean Surface Temperature (GMST) equal to the observed GMST in the years 2011-2015 based on HadCRUT4 data (Morice et al., 2012). The time series thus represents present-day climatic 85 conditions. In Van der Wiel et al. (2019), this data-set was used to evaluate the benefits of the large ensemble technique for hydrology. Further details on the design of the meteorological forcing data are provided in that paper.

Meteorological forcing data
The 2,000 year meteorological time series as employed by Van der Wiel et al. (2019) has global coverage :: at ::: 1.1 : • ::::::::: horizontal :::::::: resolution. However, for this study we restricted ourselves to four climate zones, represented by one grid cell for each climate  Peel et al. (2007), their Figure 10). The climate graphs show simulated climatological monthly precipitation sums (blue bars) and monthly average temperatures (red lines). The hydrological conditions are visualised using simulated monthly average runoff levels. The different line colours represent the ten evaluated model structures and the spread induced by the different parameter sets is shown using grey bands. zone ( Figure 1). Simulated monthly averaged 2 m temperatures and precipitation sums were obtained from the EC-Earth model to classify grid cells based on the Köppen-Geiger criteria, and allow the selection of appropriate grid cells for this study. We selected four grid cells to represent the arid (BWh), cold (Dfc), temperate (Cfb) and tropical (Aw) climate. This set of climate zones offers a comprehensive representation of the global climate zones (Kottek et al., 2006;Peel et al., 2007). 95 Daily 2 m temperature, precipitation and potential evapotranspiration data for the full 2,000 years were then acquired for the four selected grid cells. The 2 m temperature and daily precipitation fluxes were directly available from the EC-Earth model. Potential evapotranspiration fluxes were calculated following the Penman-Monteith method (Zotarelli et al., 2015). The precipitation and potential evapotranspiration fluxes were used as input in the FUSE models, the 2 m temperature was used to force the snow module (see Section 2.2). FUSE is a modular modelling framework, which can be used to diagnose differences in hydrological model structures (Clark et al., 2008). FUSE is developed based on four parent models; the U.S. Geological Survey's Precipitation-Runoff Modelling System (PRMS, Leavesley, 1984), the NWS Sacramento model (Burnash et al., 1973), TOPMODEL (Beven and Freer, 2001) and different versions of the Variable Infiltration Capacity (ARNO/VIC) model (Liang et al., 1994). This framework enables 105 the assessment of intermodel differences in another way compared to other model intercomparison studies (Henderson-Sellers et al., 1993;Reed et al., 2004). In FUSE, each model component can be adapted in isolation and therefore the effect of specific hydrological process formulations can be investigated. In the next subsection we further discuss which model structures we selected and which process formulations were tested.

110
All model structures used in this study were lumped hydrological models, which were run at a daily time step. We employed a spin-up period of five years, before forcing the hydrological models with the 2,000 year meteorological time series. The simulated monthly average runoff varied among the evaluated model structures and parameter sets ( Figure 1). Therefore, it is essential to select an adequate hydrological model for the simulation of runoff levels, and it will likely be of larger importance when simulating extreme runoff events.

115
FUSE as implemented in R (Vitolo et al., 2015) does not include a snow module. However, snow storage and snow melt might be important components in the hydrological cycle of the colder climate zones. Therefore, a snow module was implemented.

Selected model structures
In total, 1248 different model structures can be constructed in FUSE as implemented in R (Vitolo et al., 2015) by combining 130 different hydrological process formulations from the parent models. The architecture of the upper and lower layer can be altered, and the process formulations for simulating base flow, evaporation, percolation, surface runoff, interflow and routing can be changed. The lower layer architecture is intimately tied to the process formulation of base flow. Therefore, they need to be changed simultaneously and only a few combinations are possible.
135 Table 1. The model structures that were employed in this study. Each letter refers to a specific hydrological process formulation as in Clark et al. (2008), the model IDs are described by Vitolo et al. (2015). The model abbreviations are related to the alteration in the model structure and are used throughout this paper.

Model Component
Model Number 1 2 3 4 5 6 7 8 9 10 Upper Layer Ten different model structures were evaluated in this study. Table 1 provides an overview of the selected hydrological model structures. In the odd model numbers, new model structures were constructed and in the even model numbers, a single hydrological process was altered in the model structure relative to the preceding odd model number. By comparing the extreme runoff events simulated between consecutive odd and even numbered model structures, we analysed the impact of a specific hydrological process on extreme event simulation, indicated by the alteration in Table 1.

140
For our synthetic experiment, we decided to apply a fixed routing scheme. The effect of routing parameters on the discharge signal is delay and attenuation. As such, the main effect of the routing scheme would be to decrease the peak height. Since we evaluate our model results on (amongst others) peak height, the routing would dominate the results without providing insights on the underlying runoff-generating processes. Besides routing, the process formulation of interflow was left unchanged 145 throughout this study, as it was not explicitly parameterised in TOPMODEL and ARNO/VIC (Clark et al., 2008).
In contrast to other studies that evaluate different model structures (Atkinson et al., 2002), this study evaluated differences among model structures that are deemed to be equally plausible. Hence, there were no prior expectations of specific models to outperform other models. This means that the emphasis in FUSE is not on the lacking parts of hydrological models, but on the 150 intermodel differences that are caused by different representations of the real world (Clark et al., 2008).

Parameters
In this synthetic experiment, the parameters of the hydrological models were not calibrated to real catchment observations.
Instead, the parameters of the models were sampled over their full range. Since in calibrated experiments it is always difficult to differentiate the effect of parameter values from the effect of model structure, the parameter sampling approach also created 155 the opportunity to assign the effect on extreme events either to parameter values or to model structure.
To investigate the appropriate and feasible number of parameter sets required to sufficiently capture parameter space, the Kolmogorov-Smirnov test was employed (Massey Jr, 1951). With the Kolmogorov-Smirnov test, we compare the difference in the distribution of the hydrological model output between a small parameter sample and a large benchmark sample. Our  We found that the optimal trade-off between computer time and sufficiently capturing parameter space was at 100 parameter sets, as the D-statistic stabilised at this value ( Figure 2). Since there are different process formulations, the number of sampled parameters varied between eleven and fifteen for the different model structures. Nevertheless, for justification we used 100 170 parameter sets for all model structures, independent of the number of parameters. The parameter sets were generated using Latin Hypercube Sampling, based on the parameter ranges provided by Vitolo et al. (2015), as given in Table A1.

Magnitude of extreme runoff events
The magnitudes of the simulated extreme events were evaluated by comparing the distribution of runoff values based on four return periods: 25, 50, 100 and 500 years. The associated runoff levels were determined by sorting the time series of annual 175 maximum and minimum daily runoff values. This resulted in 2,000 sorted runoff values from which events were selected. For instance, for the 500-year return period, the 4th most extreme value in the sorted time series was taken. The different parameter sets resulted in 100 extreme runoff values at a specific return period for every model structure. In order to test whether the projected difference in the distributions of these runoff values (Figure 3d) was significantly different from the paired model, a two-sample t-test was applied. This test was used to evaluate related model structures based on a change in one single hydrological process formulation (Table 1). By comparing related model structures, the impact of corresponding 185 hydrological process formulations could be isolated for specific climate zones and return periods.
For the magnitude analysis of low-flow events, we encountered that some combinations of model structures and parameter sets led to a very low fixed value (in the order of 10 −4 and less), which we refer to as hard-coded lower limits. These lower limits varied between model structures, dependent on the configuration of different storage reservoirs. These limits assure numerical stability, but could obfuscate our analysis, because the difference between distributions simulating lower limits 190 would be significant if the lower limits between two model structures had different values. Conceptually, the lower limits represent zero discharge: the river has run dry. As such, no significant different should be found when two models reached this lower limit. Therefore, in all simulations the lower limit in discharge was set equal to zero.

Timing of extreme runoff events
An asset from the ensemble approach for return period evaluation compared to GEV statistics, is that it also allows us to 195 evaluate the timing of the 500-year events based on the entire 2,000 year time series. Extreme hydrological events do not always result from extreme meteorological conditions, but could also originate from a sequence of moderate weather conditions (Van der Wiel et al., 2020). By assessing the timing of extreme runoff events, we investigated whether the timing of the extreme runoff events is controlled by different model structures and parameter sets or determined by the meteorological forcing. Each ::: row :: in :: (c) :::::::: represents ::: one ::::: colour :: in ::: the coloured bars represent : in ::: (d), : the values :::: total ::::: height of ::: each ::: bar :: is :::::::: determined ::: by the :::: value :: in :: the : blue cells for different model structures as shown in panel (c), and : . :::: Grey :::::: shading ::::: behind : the grey bars indicate :::::: indicates the theoretical maximum for 500-year events: four runoff events with 100 % model agreement.
The timing of extreme runoff events with 500-year return periods were compared. This was done in four steps, as depicted in Figure 4.
First, we sorted all the monthly maxima and minima daily runoff values and their corresponding simulation month ( Figure   4a). The four most extreme events in this sorted 2,000-year data-set represent the extreme events equal or greater than the 500-

205
year events. These four most extreme events were determined for each model simulation, so for each combination of model structure and parameter sets.
Then, we evaluated to what extent the same events were selected for different parameter sets, but with the same model structure. If one event was for instance selected for all 100 parameter sets, this particular event would have a score of 100 in 210 the red row of Figure 4b. If this event was only selected for half of the parameter sets, it would have a score of 50. If across all parameter sets the same four events would be identified, this would result in four times a score of 100 in Figure 4b. This indicates that the influence of hydrological parameters on the timing of the extreme event is negligible.

Magnitude of extreme runoff events
This section describes the impact of model structures on extreme event magnitude for different climate zones, hydrological process formulations and return periods. We compared the distribution of the magnitudes of the extreme high-and low-flow events for related model structures, based on four different return periods, and for four different climate zones. Alterations in 240 the hydrological process formulations lead to a difference in the magnitude of extreme runoff events, as depicted in Figure   3a, :::::: which :: is :: an ::::::: example :::::::: showing ::: the ::::::: principle. Figure 5 shows the same information, but now based on actual simulations of high-flows ( Figure 5a ::: &c) and low-flows ( Figure 5b ::: &d) in the tropical climate zone. We then employed a two-sample t-test to calculate the p-values (Figure 6), which were used to distinguish the statistically significant (p < :: ≤ : 0.05) and non-significant (p > : > : 0.05) differences in the distribution of extreme event magnitudes as in Figure 3d.
Panel c and d of Figure 5 highlight four models :: in ::: the :::::: tropical ::::::: climate :::: zone : for comparison. The model structures that are related by an alteration in the process formulation of percolation, simulate an increasing : a difference in extreme high-flow magnitude for longer :: all : return periods (red lines in Figure 5c). Based on the t-test conducted on the distributionsof the 500-year return period ::::: these :::::::::: distributions, this results in a significant impact of alterations in the process formulation of percolation for 250 this return period :: all :::::: return :::::: periods : (as displayed in Figure 6). In contrast, the model structures related by an alteration in the process formulation of evaporation, simulate comparable runoff values across all return periods for high-flows (blue lines in Figure 5c). Therefore, there is no significant impact on the magnitude of extreme high-flow events caused by this hydrological process formulation (Figure 6). For the low-flows, :: an alteration in the percolation formulation (Figure 5d) does not lead to statistically significant differences in the low-flow distribution (Figure 6), whereas an alteration in the evaporation formulation ally to changes in the process formulation of evaporation. In the tropical climate zone, the low-flow events are less sensitive to alterations in the architecture of the lower layer and the process formulation of evaporation, but still sensitive to alterations in the upper layer architecture. In most climate zones, formulations :::::::: changing ::: the :::::::::: formulation of multiple hydrological processes significantly impact ::::::: impacts the simulation of the magnitude of low-flow events, which implies that the model structure is an 285 important source of uncertainty. The meteorological forcing is clearly not the only factor controlling the magnitude of simulated low-flow events.
A process :::::::::: phenomenon that does play an important role in the evaluation of low-flows is that eventually in some cases, the simulated runoff goes to zero, indicating that no more water is flowing through the river. For instance in the arid climate zone,

290
for the two models where percolation is altered, 100% of the simulations have zero discharge already for the 25-year return period events. Differences in low-flows as a consequence of changing the percolation formulation can then no longer be traced 13 and thus do not lead to a significant difference.
Both differences and similarities can be identified between the distributions of runoff values for the high-and low-flow 295 events. Alterations in hydrological model structures more often result in significant differences in low-flows (45 %) compared to high-flows (24 %), which implies a larger model structural uncertainty in the magnitude of low-flow events. High-flow events mainly depend on precipitation, i.e. meteorological forcing, while the influence of other runoff generating processes such as soil moisture and base flow is marginal (Zhang et al., 2011). This is not to say that these processes are not relevant: merely, our results demonstrate that the way these processes are formulated in the model has limited impact on the model 300 result. The situation during high-flow events is often characterised by a precipitation surplus. Therefore, there will be more or less continuous groundwater recharge by percolation in the unsaturated zone (Knutsson, 1988), which explains why the formulation of percolation appears as a relevant hydrological process to estimate the magnitude of high-flow events.
Hydrological models are traditionally designed to simulate the runoff response to rainfall and therefore, it seems to be more challenging to simulate low-flow events (Staudinger et al., 2011). The low-flow events are mainly sensitive to alterations 305 in the architecture of the upper and lower layer. Earlier research indicates the importance of the lower layer architecture and the process formulation of base flow in simulating low-flow events (Staudinger et al., 2011). The architecture of the upper and lower layer defines the water content in these layers (Clark et al., 2008). This water content is controlling the runoff-generating processes during low-flow events due to a precipitation deficit and reduces the importance of the percolation process (Andersen et al., 1992). Therefore, alterations in the process formulation of percolation mainly affect high-flow events 310 in the wet climate zones ( Figure 6).
Besides these differences, there are also similarities in the simulation of high-and low-flow events. The magnitudes of high-and low-flow events in the cold and temperate climate zone have a similar response to alterations in all hydrological process formulations. Furthermore, alterations in the process formulation of surface runoff have no significant impact on the magnitude of both types of extreme runoff events. This might be due to the lacking implementation of infiltration excess 315 overland flow in FUSE (Clark et al., 2008). This could be an important factor for surface runoff, especially in arid climate zones (Reaney et al., 2014). Another factor might be the temporal resolution of the model runs: the models are run at a daily time step, while surface runoff is especially relevant at shorter time steps (Morin et al., 2001;Melsen et al., 2016).

Timing of extreme runoff events
The timing of extreme high-flow events is evaluated using stacked bar charts. Figure 7 shows the percentage of model agree-320 ment on the timing of extreme high-flow events with a return period equal or greater than 500-years, as earlier depicted in Figure 4d. For the low-flow events, the timing evaluation could not be conducted, because of the nature of low-flow events to persist longer. This will be further discussed in this section.
The impact of different hydrological process formulations and parameter sets on the timing of extreme high-flow events 325 varies between the selected climate zones. In the arid and tropical climate zones, there are multiple events with a model agree- ment exceeding 99 %. In these cases, almost all model simulations agree on the timing of these extreme events. Just ten and eight runoff events were selected (out of a total of 24,000 potential events) as extreme high-flow events in the arid and tropical climate zones, respectively (Figure 7). This means that there are only a few model simulations that deviate by simulating the most extreme runoff events at a different point in the time series. For these climate zones, this implies that the timing is mainly 330 prescribed by the meteorological forcing. This might be explained by the precipitation climatology in these climate zones.
On average, in the arid climate zone the daily precipitation sum exceeds 1 mm only during eleven days a year. Precipitation is therefore scarce and characterised by short events of high-intensity (Goodrich et al., 1995), which propagate into extreme runoff events. In the tropical climate zone, there is a high precipitation rate throughout the complete time series. However, there is a pronounced wet season from October until April (Figure 1). There are multiple extreme precipitation events larger than 335 150 mm/d. The 500-year extreme runoff events are initiated by these extreme precipitation events.
In both the cold and temperate climate zone, there is only one event with a model agreement exceeding 99 % (Figure 7). In the cold and temperate climate zones, there are 20 and 38 different events selected as extreme events, respectively. The selected runoff events with the highest model agreement are initiated by the most extreme precipitation events, whereas the selected 340 extreme runoff events with a low model agreement are most likely initiated by compound events (Van der Wiel et al., 2020;Zscheischler et al., 2018). Hence, the timing of extreme high-flow events may depend more on hydrological processes, and consequently vary across hydrological model structure :::::::: structures and parameter values in these climate zones. The stacked bar charts indicate which model structures lead to the selection of events with low agreement. Some model structures seem to show deviant behaviour, but there is no convincing pattern visible; most model structures seem to be represented in low-agreement 345 events. Therefore, there is no clear relationship between the extreme runoff events with a low model agreement and specific model structures. We hypothesise that this uncertainty can be assigned to the difference in parameter sets.
To evaluate the timing of extreme low-flow events, a similar approach was applied compared to the high-flow events. However, for several combinations of model structures and parameter sets, zero runoff was simulated (Figure 5b). These periods of
4.1 :::::: Climate :::::::: synthesis 395 The magnitude and timing of the extreme high-flow events in the arid climate zone are mainly controlled by the meteorological forcing. This is contrary to previous studies in which the runoff in dry catchments was more sensitive to different hydrological models (Jones et al., 2006;Lidén and Harlin, 2000), but here we specifically refer to high-flow events in arid climates. In this climate zone, precipitation is scarce and often characterised by extremely variable, high-intensity and short-duration events (Goodrich et al., 1995). Consequently, runoff in arid climate zones is characterised by a dominance of Hortonian overland flow 400 (Segond et al., 2007). This runoff-generating process is not included in the implementation of FUSE, which might reduce the impact of different model structures (Clark et al., 2008). Also the temporal resolution at which we ran the model and evaluate the high-flow events might be relevant. The extremely flashy precipitation patterns can cause flash floods that occur over the course of a few hours. We evaluate the model results at the daily time step, which can cover up the occurrence of flash flood events. For the low-flow events, we found more spread in the magnitude as a consequence of altering process formulations.

405
Alterations in multiple hydrological processes result in significant differences.
In the cold and temperate climate zones, there is more spread in the simulations regarding the magnitude and timing of extreme runoff events. The magnitudes of extreme high-and low-flow events are sensitive to alterations in multiple hydrological process formulations, which implies that several hydrological processes are important in the runoff-generating processes in 410 these climate zones, as also discussed by Scherrer and Naef (2003). In different model simulations different high-flow events are identified as most extreme runoff events, which leads to a spread in the timing of these events. This spread is partly assigned to the difference in parameter sets.
We only tested a limited amount of processes and process formulations. However, especially in the cold and temperature 415 climate zones, extreme events related to snow melt can potentially occur. Therefore, the process formulation of snow melt could have significant impact on the simulations. This was, however, not tested because we only used a single degree-day snow formulation. The results are therefore conditional on the processes that we altered, and that were available within the FUSE framework.

420
In the tropical climate zone, the spread in the magnitude and timing of extreme runoff events is small, which indicates that the extreme events are mainly controlled by the meteorological forcing. There is only one process formulation that simulates a significant impact on the magnitude; percolation for the high-flows and the upper layer architecture for the low-flow events.
The formulation of the percolation process controls the high-flow events in the tropical climate zone, as there are months with large amounts of precipitation ( Figure 1). Due to these large amounts of precipitation, water is subjected to percolation through 425 the succeeding layer (Bethune et al., 2008;Savabi and Williams, 1989). The role of the upper layer architecture in the simulation of low-flow events might be related to evaporation dynamics -although the evaporation formulation has less significant impact (0.05<p< :: ≤0.1).
We found no distinct relationship between the length of return periods and the degree of uncertainty in the magnitude of 430 extreme runoff events. There are situations in which the difference between related distributions of high-flow events become significant when the length of the return period increases, e.g. the percolation process formulation in the arid climate zone.
There are also distributions of related model structures that are significantly different at shorter return periods, e.g. the evaporation process formulation in the temperate climate zone. This contrast might be explained by the difference in importance of specific hydrological processes or parameters for events at different return periods.

Study design
We designed a synthetic experiment to conduct controlled experiments on the role of model structure on the simulation of extreme runoff events. There are, however, a few implications when using a synthetic approach. In this study, the models were not calibrated in order to isolate the impact of different model structures. It is however common practice to use a pre-defined 440 model structure, which is fitted to the local circumstances via parameter calibration (McMillan et al., 2011b). In this study the complete parameter range was sampled: all combinations of parameter values were considered equally plausible and interdependence of parameters was not considered since we used the Latin Hypercube Sampling approach (Clarke, 1973;Helton and Davis, 2003). Tuning the parameters to a specific location could reduce the parameter range, and smaller parameter ranges could lead to more realistic runoff values (Cooper et al., 2007), which might have revealed a relatively higher impact of model 445 process formulation on model results. This, however, comes at a loss of generality. Also, when calibrating hydrological models to simulate extreme runoff events, other challenges remain. Especially the limited availability of historical observations can create a problem for the reliable calibration of extreme events (Wagener et al., 2010); since many observation records do not exceed a length of 50 years, models are forced to simulate outside of their calibration range. This will negatively influences model performance, as for instance demonstrated by Imrie et al. (2000).

450
The 2,000-year meteorological time series used in this study originally consists of a simulated large ensemble of 400 sets of 5-year runs. These 400 sets were concatenated artificially. This concatenation might lead to strange transitions of meteorological conditions once every 5 years, as the December month is followed by the next January month of a new 5-year set.
Nevertheless, we decided to treat this large ensemble as a single time series, in order to allow for extensive return period analy-455 sis. We consider the effect of the concatenation limited since we only evaluate the annual and monthly maximum and minimum daily runoff levels. The employed time series does not allow for the evaluation of multi-year low-flow events, despite these events being extremely relevant considering their societal impact.
Besides choices in the sampling strategy and choices in the treatment of the meteorological forcing, we also made choices 460 in the characteristics of high and low flow events that we evaluated. Because this is a first extensive exploration of the role of model structure on the simulation of extreme events with long return periods, we evaluate high-and low-flows for their most straight forward characteristic: the maximum and the minimum runoff. There are, however, ample other characteristics that could be of relevance in the context of hydrological extremes. For high-flow events, besides peak height and timing, also volume is a frequently evaluated characteristic (Lobligeois et al., 2014), while for low-flow events duration and volume deficit 465 are other frequently applied characteristics (Tallaksen et al., 1997). Our approach, being a combination of long-term meteorological simulations and a modular modelling framework, can easily be extended to these characteristics.
Model selection is a crucial step in hydrological modelling. Different hydrological models might lead to substantially different outcomes (Melsen et al., 2018). When hydrologists are familiar with a certain model, they tend to stick to this model, 470 even though other models might be more adequate for a specific objective (Addor and Melsen, 2019). Model intercomparison studies can provide guidance for model selection and improve model adequacy in the future. This study evaluates the impact of alterations in model structures on extreme runoff events. Some alterations in the model structure lead to significant impacts in the simulation. For example, in the tropical climate zone, the formulation of the percolation process is important. This in-formation can be regarded in model selection of future studies, which will result in more adequate model selection. It should, 475 however, be noted that the framework employed in this study (FUSE) is only representative for a particular suit of bucket-based models. Whereas these models are suitable for long term simulations due to their low data demand and high computational efficiency, results might look different when a more process-based framework, such as SUMMA (Clark et al., 2015a, b), would have been employed.

Societal impact
This study evaluated the translation of meteorology to hydrological extreme impact events. Return periods were used to sort runoff events based on their extremeness, as return periods are frequently used in policy design (Marco, 1994;Read and Vogel, 2015). However, this study does not translate hydrological impact events to the societal impact, which implies that fatalities and economic losses are not examined. This relationship might be affected by non-linear effects, similar to the meteorology-485 hydrology relationship (Van der Wiel et al., 2020). Therefore, a direction for future research is to link societal impact to return periods of extreme runoff events. The accurate assessment of vulnerability and societal impact requires information related to exposure and sensitivity (Cardona et al., 2012).

490
Hydrological extremes are natural hazards that affect a large number of people on a global scale. Several hydrological models were employed to simulate these extremes, with the aim to investigate the impact of hydrological model structure on the simulation of extreme runoff events. The combination of two state-of-the-art approaches, the hydrological modular modelling framework FUSE and large ensemble meteorological simulations to study extreme events, provided insights into uncertainties of the simulations. Parameters of the hydrological models were sampled in a synthetic experiment, which enabled the exami-495 nation of the impact of different hydrological process formulations on the magnitude and timing of extreme high-and low-flow events, independent of calibration.
The impact of hydrological process formulations on magnitude and timing of extreme runoff events varies among different climate zones (Figure 8). In the arid climate zone, the magnitude and timing of the extreme high-flow events are not affected 500 by changing process formulations or parameter sets. The magnitudes of the low-flow events are significantly affected by alterations in the architecture of the upper and lower layer. In the cold and temperate climate zones, we found a larger spread in the simulations of the extreme runoff events. Multiple hydrological processes significantly affect the magnitude of the high-and low-flow events, which implies that the model structure is an important source of uncertainty. Therefore, it is essential to select an adequate hydrological model when simulating extreme events in cold and temperate climate zones. Besides that, there is a 505 spread in the timing of high-flow events, caused by different parameter sets in these climate zones. The magnitudes of the high- The timing of these events is hardly affected by hydrological model structure or parameter sets, which implies that the timing of these events is dictated by the meteorological forcing. The timing of low-flow events is not evaluated in this study, as many simulations resulted in zero runoff for extended periods.

510
The results revealed a spread in the simulation of extreme runoff events as a consequence of different hydrological model structures. The impact of different model structures is larger for the simulation of low-flow events compared to high-flow events. For the low-flow events, hard-coded lower limits were found, implemented for numerical stability. This revealed the numerical challenge that comes with simulating extremely low values. In this study, we interpreted these hard-coded lower 515 limits as zero runoff. The extreme events were assessed at different return periods. However, no clear relationship was found between the model structural uncertainty in the magnitude of extreme runoff events and the return period length.
Insights provided by this study contribute to a better understanding of the importance of the hydrological model formulation of specific processes in different climate zones. These insights can be used in future studies, which will result in more adequate 520 model selection leading to improved understanding and more reliable predictions of extreme runoff events.