Droughts in Germany: performance of regional climate models in reproducing observed characteristics

Droughts are among the most relevant natural disasters related to climate change. We evaluated different regional climate model outputs and their ability to reproduce observed drought indices in Germany and its near surroundings for the period 1980–2009. Both outputs of an ensemble of six EURO-CORDEX models of 12.5 km grid resolution and outputs from a high-resolution (5 km) Weather Research and Forecasting (WRF) run were employed. The latter model was especially tailored for the study region regarding the physics configuration. We investigated drought-related variables and derived the 3-month standardized precipitation evapotranspiration index (SPEI-3) to account for meteorological droughts. Based on that, we analyzed correlations, the 2003 event, trends and drought characteristics (frequency, duration and severity) and compared the results to E-OBS. Methods used include Taylor diagrams, the Mann–Kendall trend test and the spatial efficiency (SPAEF) metric to account for spatial agreement of patterns. Averaged over the domain, meteorological droughts were found to


Introduction
In the recent past, Germany and other parts of central Europe have repeatedly been hit by dry summers. Particularly notable are the severe drought events of 2015 (e.g., Hoy et al., 2017; Ionita et al., 2017; Laaha et al., 2017), 2018 (e.g., Bastos et al., 2020; Thompson et al., 2020) and 2019 (e.g., Boergens et al., 2020; Hari et al., 2020; Ziernicka-Wojtaszek, 2021), which occurred in combination with heat waves. In addition, 2020 was also categorized as too dry, mainly in the spring and summer months (DWD, 2020; Umweltbundesamt, 2021). These events have increased awareness of climate extremes in the affected regions.
There are studies that suggest an increasing trend (e.g., Dai, 2011, 2013; Sheffield et al., 2012; Trnka et al., 2016), a decreasing trend (e.g., Spinoni et al., 2014) and no trend (e.g., Spinoni et al., 2019; Oikonomou et al., 2020; Vicente-Serrano et al., 2021) for droughts for the past decades in the central European region. The discrepancies in the findings are due to the complex characteristics of droughts and the several different ways of defining (Mishra and Singh, 2010; Lloyd-Hughes, 2014; Crausbay et al., 2017) and quantifying (Wilhite and Pulwarty, 2007; Vicente-Serrano, 2016) a drought event. Moreover, different analysis periods (Hannaford et al., 2013) and a broad range of usable meteorological variables (Vicente-Serrano et al., 2021) lead to uncertainty in drought trends. Economically, however, there has been a clear increase in the costs caused by drought events in the past in the EU (EEA, 2010).
In this study, we conduct a drought analysis for the time period 1980-2009 in Germany and its near surroundings by employing an ensemble of regional climate models (RCMs). We are constrained to that time period because of the data availability in the RCM runs.
For Europe, the availability and reliability of RCM simulations have evolved rapidly in the last few years (Štepánek et al., 2016). Concerted downscaling projects and initiatives like PRUDENCE (Christensen and Christensen, 2007), ENSEMBLES (van der Linden and Mitchell, 2009) and most recently CORDEX have contributed to this development. Several studies using drought-related data from CORDEX outputs have been conducted in the past for different parts of the world, the majority with a focus on the future development of drought under climate change and some with a focus on past events. For the EURO-CORDEX domain, there have been studies dealing with the evaluation of the EURO-CORDEX RCMs' capability in historical drought reproduction in Italy (Peres et al., 2020); the comparison and evaluation of drought indices in Poland (Meresa et al., 2016); and the future development of drought conditions under different scenarios for the Czech Republic (Štepánek et al., 2016; Potopová et al., 2018), Romania (Dascȃlu et al., 2016), Poland (Meresa et al., 2016) and all of Europe (Spinoni et al., 2018). Regarding the rest of the globe, studies have been carried out focusing on the evaluation of the CORDEX RCMs' ability in simulating historical droughts and their characteristics over West Africa (Diasso and Abiodun, 2017), East Asia (Um et al., 2017) and Bangladesh (Chowdhury and Jahan, 2018). Furthermore, there have been analyses of climate change impacts on droughts and their characteristics in the future for the Mediterranean region (Marcos-Garcia et al., 2017), India (Das and Umamahesh, 2018), Iran (Senatore et al., 2019) and Vietnam (Nguyen-Ngoc-Bich et al., 2021) as well as for the entire globe (Spinoni et al., 2020). In these studies, different drought indices have been used to identify droughts and describe their characteristics.
Among the most common ones are the standardized precipitation index (SPI) and standardized precipitation evapotranspiration index (SPEI) (Meresa et al., 2016; Diasso and Abiodun, 2017; Marcos-Garcia et al., 2017; Um et al., 2017; Das and Umamahesh, 2018; Potopová et al., 2018; Spinoni et al., 2018, 2020), the Palmer drought severity index (PDSI) (Dascȃlu et al., 2016; Chowdhury and Jahan, 2018; Nguyen-Ngoc-Bich et al., 2021), and the self-calibrated PDSI (scPDSI) (Senatore et al., 2019). Additionally, some self-developed or less commonly used indices have been applied: the standardized runoff index (SRI) (Meresa et al., 2016), the standardized flow index (SFI) (Marcos-Garcia et al., 2017) and the reconnaissance drought indicator (RDI) (Spinoni et al., 2018).
So far, to our knowledge, there has been no study that presents an evaluation of the capability of EURO-CORDEX RCMs to reproduce droughts and their characteristics with a focus on Germany, which we therefore would like to address in this study. There are a large number of studies dealing with the performance of RCMs in terms of the correct reproduction of meteorological variables. Emphasis is often on temperature and precipitation, and effects of different model resolutions and physics parameterizations are investigated. There are different findings concerning the effects of increased model resolution on precipitation, the most important variable for droughts. They strongly depend on the season, precipitation amount and region. Regarding extreme events and summer precipitation, especially in complex terrain, higher model resolution usually seems to be beneficial (e.g., Rauscher et al., 2010; Tripathi and Dominguez, 2013; Lee and Hong, 2014; Olsson et al., 2015; Torma et al., 2015; Prein et al., 2016; Rauscher et al., 2016; Dieng et al., 2017; Vichot-Llano et al., 2021). In terms of winter precipitation and annual mean patterns, there are often no distinct differences between coarse and fine resolutions (e.g., Rauscher et al., 2010; Tripathi and Dominguez, 2013; Kotlarski et al., 2014; Casanueva et al., 2016; Dieng et al., 2017; Vichot-Llano et al., 2021). Compared to precipitation, there are fewer studies examining the effects of increased model resolution on simulated air temperature, the second most important variable for droughts. Vautard et al. (2013) employed an ERA-Interim-driven EURO-CORDEX ensemble of 12.5 and 50 km resolution for heat wave analysis over Europe between 1989 and 2008. Increased resolution was shown to induce 90th-percentile temperature warming and cooling for some models. It also led to reduced biases in heat wave reproduction. Zeng et al. (2016) and Vichot-Llano et al. (2021) found that temperature fields are better reproduced with higher resolution, while Di Luca et al. (2013) concluded a low potential for added value of increased resolution. They saw the highest added value mostly in regions with important surface forcing like complex topography or land-water contrasts.
Every model simulation requires a suitable setup regarding the domain configuration and physical parameterizations for the selected target region (e.g., Stoelinga et al., 2003;Kumar et al., 2010). To find appropriate settings, usually the skill of different parameterizations for temperature and precipitation is evaluated with respect to observations. Vautard et al. (2013) also analyzed possible sources of model spread. The simulation of hot temperatures was shown to be primarily sensitive to convection and microphysics schemes, which has effects on the incoming energy and the Bowen ratio. They further found that a large part of the model spread can be attributed to parameterizations and that parameterizations can have different impacts depending on the spatial resolution. Mooney et al. (2013) tested the effects of 12 combinations of physical parameterizations in the Weather Research and Forecasting (WRF) model over Europe on surface temperature, precipitation and mean sea level pressure. They utilized two longwave radiation schemes, two land surface models (LSMs), two microphysics schemes and two planetary boundary layer (PBL) schemes. They found that temperature shows the greatest sensitivity to the LSMs, some sensitivity to the radiation schemes in winter, and little sensitivity to the microphysics and PBL schemes. Precipitation showed sensitivity to the LSM especially in summer. This is also valid for the radiation and the microphysics schemes but to a lesser extent. There was only negligible sensitivity to the PBL schemes. They concluded there was a strong dependence on the region and season of the optimal parameterization combination. Kotlarski et al. (2014) emphasize the high importance of model configurations by describing that, in the case of temperature, the "bias spread across different configurations of one individual model can be of a similar magnitude as the spread across different models".
In this study, we accordingly investigate the effects of increased model resolution and model settings on the reproduction of a drought index and thereby fill another research gap. For this reason, we analyze a variety of RCM simulations, i.e., a 5 km three-domain WRF run and an ensemble of six EURO-CORDEX realizations at 12.5 km horizontal resolution. Ideally, computational resources could be saved if RCMs with coarser grids were able to yield equivalent performance to their better-resolved counterparts. The WRF model setup was carefully determined for Germany. The physics combinations were chosen so that the combined biases of air temperature and precipitation are as small as possible (Wagner and Kunstmann, 2016; Warscher et al., 2019), while the configurations of the EURO-CORDEX RCMs were set up for the entire EUR-11 domain of CORDEX. Since the WRF run was tailored to the study region and has a higher resolution, one may expect better performance regarding the reproduction of air temperature and precipitation and thus likely also of drought indices compared to the EURO-CORDEX runs. To attribute possible better WRF performances to resolution or setting effects, we are able to use the second domain of the WRF run, which has 15 km resolution; hence it is slightly coarser than the EURO-CORDEX simulations. Thus, the main objectives of the study are as follows:
1. The first objective is to evaluate the performance of regional climate models in reproducing the SPEI drought index and related drought characteristics employing a six-member EURO-CORDEX ensemble and a high-resolution WRF run. The EURO-CORDEX RCMs and WRF differ in resolution (12.5 km vs. 5 km), while the model physics configurations differ between all RCMs.
2. The second objective is to gain insights into the meteorological drought course for Germany and its near surroundings between 1980-2009. Therefore, the results are evaluated and compared to observations. Specifically, we analyze precipitation and temperature reproduction, SPEI correlations and trends, related drought characteristics, and additionally the drought event in summer 2003. The characteristics include frequency, duration and severity and are based on SPEI time series.

EURO-CORDEX model simulation data
We employed an ensemble of six EURO-CORDEX RCM simulations. The experiments were performed with 0.11° (≈ 12.5 km) horizontal grid resolution, covering the EUR-11 CORDEX domain. Data from the following RCMs were used: COSMO-CLM, ALADIN 6.3 (hereafter referred to as ALADIN in the text), REMO2015 (REMO), RegCM 4.6 (RegCM), RACMO 2.2e (RACMO) and RCA4 (see Table 1 for more information). At the time of selection, these were all the available model runs that cover the study period 1980-2009 and contain the relevant meteorological variables needed for the analysis. All runs obtained their boundary conditions from the global ERA-Interim reanalysis (Dee et al., 2011).

WRF simulation data
Moreover, we incorporated simulation results from Warscher et al. (2019), who conducted simulations with the WRF model (Skamarock et al., 2008). These WRF simulation results are based on a comprehensive search and final identification of optimal model physics and parameterization configuration (Wagner and Kunstmann, 2016). They applied a three-domain nested approach with a parent-grid ratio of 1 : 3 and a horizontal grid resolution of 5 km for the innermost domain, which frames Germany and its near surroundings. The data used in this study are from their ERA-Interim-forced reanalysis run and cover the period 1980-2009. Table 2 gives an overview of the physics schemes used in this run as well as in the EURO-CORDEX runs and further information. For more details regarding the model setup, see Wagner and Kunstmann (2016) and Warscher et al. (2019). As mentioned above, we also used the data from the second domain, which are of 15 km grid spacing. For this reason, we will refer to WRF@5km and WRF@15km from here on to distinguish between the two domains.

Observation data
As reference we used the gridded observational data set from E-OBS (Haylock et al., 2008), version 23.1e, with 0.1° (≈ 11.1 km) horizontal grid resolution. The data contain daily values of the relevant meteorological variables and cover the entire European land area. We focused on Germany and its near surroundings as the study region from 47 to 55° N and 6 to 15° E. The WRF and E-OBS data sets were regridded using bilinear interpolation to adjust them to the horizontal grid resolution of the EURO-CORDEX RCMs.

Analysis of precipitation and temperature reproduction
Precipitation and temperature are the main meteorological variables determining droughts. Thus, prior to the drought index calculation, we analyzed precipitation and maximum and minimum temperature in every RCM and compared them to the reference using Taylor diagrams (Taylor, 2001). They provide a concise visual statistical summary regarding the agreement between patterns in terms of their correlation, their root-mean-square difference, and the ratio of their variances or standard deviations (Taylor, 2001). Moreover, we calculated the bias values from the spatially and temporally averaged monthly values of these three variables.
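The three statistics summarized in a Taylor diagram, plus the mean bias, can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: the function name `taylor_statistics` and the flattening of the space-time fields into one-dimensional samples are assumptions, and the root-mean-square difference is taken here in its centered (mean-removed) form, as is usual for Taylor diagrams.

```python
import numpy as np

def taylor_statistics(sim, ref):
    """Return correlation, centered RMS difference, std ratio and mean bias.

    `sim` and `ref` are arrays of identical shape (e.g., time x lat x lon);
    cells that are missing in either data set are ignored.
    """
    sim = np.asarray(sim, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    valid = np.isfinite(sim) & np.isfinite(ref)
    sim, ref = sim[valid], ref[valid]

    sim_anom = sim - sim.mean()
    ref_anom = ref - ref.mean()
    std_sim = sim_anom.std()
    std_ref = ref_anom.std()

    return {
        # Pearson correlation between simulated and reference pattern
        "corr": float(np.mean(sim_anom * ref_anom) / (std_sim * std_ref)),
        # centered root-mean-square difference (means removed first)
        "crmsd": float(np.sqrt(np.mean((sim_anom - ref_anom) ** 2))),
        "std_sim": float(std_sim),
        "std_ref": float(std_ref),
        "std_ratio": float(std_sim / std_ref),
        # mean bias, simulation minus reference
        "bias": float(sim.mean() - ref.mean()),
    }
```

The three pattern statistics are linked by the law of cosines, crmsd² = std_sim² + std_ref² − 2 · std_sim · std_ref · corr, which is what allows them to be shown in a single two-dimensional diagram.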

Drought index: SPEI
There are a variety of drought indices for analyzing different drought characteristics. For the proper selection of a drought index, its main features like the calculation procedure, input variables, advantages and weaknesses need to be considered (García-Valdecasas Ojeda et al., 2017). The standardized precipitation index (SPI), developed by McKee et al. (1993), is one of the most widely used drought indices and recommended by the World Meteorological Organization (WMO) because of its simplicity, robustness, easy interpretation and especially its multiscalar character. It is comparable across different regions and climates and therefore very suitable for drought detection around the globe (García-Valdecasas Ojeda et al., 2017). Since precipitation is the only input variable for the calculation, a high variability is assumed, while other variables like temperature, surface wind and potential evapotranspiration (PET) are considered temporally stationary. Thus, the SPI does not define droughts based on the water balance (Diasso and Abiodun, 2017). The standardized precipitation evapotranspiration index (SPEI), introduced by Vicente-Serrano et al. (2010), overcomes this issue. Because of its dependence on the water balance (precipitation − PET), it incorporates the effects of hot temperatures. That is why it is considered very useful in terms of global warming (Diasso and Abiodun, 2017; Spinoni et al., 2018). In this context it should be emphasized that the SPEI (as well as SPI) has limitations regarding its practical relevance for climate change, when the focus is primarily on impacts. Apart from an implied lack of soil moisture (agricultural drought) and decline in streamflow, groundwater, reservoir and lake levels (hydrological drought) (Wilhite and Glantz, 1985), which completely rely on the degree of the dry anomaly over a certain time period, impacts going beyond this are not addressed. Due to the complete reliance [...]

Table 2. (Table only partly recoverable; surviving entries: Lopez (2002), Siebesma et al. (2007), Le Moigne (2012); REMO2015, 27, Ritter and Geleyn (1992), Tiedtke (1989), Lohmann and Roeckner (1996), Louis (1979), Hagemann (2002), Rechid et al. (2009); RegCM 4.6, 23, Kiehl et al. (1996), Tiedtke (1989), Pal et al. (2000), Grenier and Bretherton (2021).)

We also decided to use the SPEI for this study. In general, the patterns of the SPI and SPEI are usually similar, and we want to take account of the temperature effect, since droughts in the study region predominantly occur in the summer months.
Similarly to studies like Diasso and Abiodun (2017), García-Valdecasas Ojeda et al. (2017), and Potopová et al. (2018), the SPEI R package (Beguería and Vicente-Serrano, 2013) was used for the index calculation. As mentioned above, the SPEI needs PET as an input variable in addition to precipitation. PET was calculated based on the modified Hargreaves equation (Droogers and Allen, 2002). The method corrects the PET calculated by the Hargreaves equation by using the monthly rainfall amount as a proxy for insolation and applying the hypothesis that this amount can change the humidity levels (Vicente-Serrano et al., 2014). By using this method, the PET values are similar to those obtained from the Penman-Monteith method (Allen et al., 2006). The Penman-Monteith method is adopted and recommended by the Food and Agriculture Organization to approximate PET (García-Valdecasas Ojeda et al., 2017), but the variables required for this method are only included in a limited number of CORDEX simulations. Apart from precipitation, the modified Hargreaves method only requires the maximum and minimum temperatures, so it is applicable to all data sets used in this study.
For the SPEI calculation, the monthly values of the water balance are used. The obtained time series are fitted to a log-logistic distribution. Then the quantiles of the distributions are transformed into standard normal variables. This ensures comparability of the index values across different regions. Negative values indicate drier-than-median and positive values wetter-than-median conditions (Meresa et al., 2016). To categorize droughts, we follow the most popular classification scheme of McKee et al. (1993) (Table 3).
Different aggregation scales for the SPI/SPEI calculation are usually used to define the type of drought. Short timescales of up to 3 months are used for meteorological droughts, medium scales of around 6 months for agricultural droughts, and longer timescales of 12 months or more for hydrological droughts (Wilhite and Glantz, 1985; Heim, 2002; Spinoni et al., 2020). We selected the 3-month aggregation scale to focus on meteorological droughts. For this reason we will refer from here on to SPEI-3. SPEI-3 time series were computed for each EURO-CORDEX simulation, the WRF output and the E-OBS reference data set for every grid cell. Correlation analysis between the RCM and reference time series has been conducted as well as a comparison of the index values for the drought event in 2003.
Several metrics are available to assess the spatial agreement between patterns of single RCMs and the reference.
Here we used the spatial efficiency (SPAEF) metric. The SPAEF is a multiple-component performance metric developed for the comparison of spatial patterns. While it was originally intended for hydrological studies, it has been described as suitable and beneficial for other modeling disciplines too. The SPAEF is calculated as

$$\mathrm{SPAEF} = 1 - \sqrt{(\alpha - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2},$$

with the following three components:

$$\alpha = \rho(\mathrm{obs}, \mathrm{sim})$$

as the Pearson correlation coefficient between observed (obs) and simulated (sim) patterns;

$$\beta = \frac{\sigma_{\mathrm{sim}} / \mu_{\mathrm{sim}}}{\sigma_{\mathrm{obs}} / \mu_{\mathrm{obs}}}$$

as the fraction of the coefficient of variation representing spatial variability; and

$$\gamma = \frac{\sum_{i=1}^{n} \min(K_i, L_i)}{\sum_{i=1}^{n} K_i}$$

as the overlap between the histograms of the observed (K) and simulated patterns (L), both containing the same number n of bins. For the calculation of γ, the z score of the patterns is used. This enables comparison of two variables with different units. For both histograms of K and L, the number of values in each bin i is counted. Then for each bin the lower (minimum) number of K_i or L_i is picked, which indicates the number of shared values in the same bin. Afterwards these numbers are summed up and divided by the total number of values in K (which equals that in L). The SPAEF has a predefined range between −∞ and 1, where 1 corresponds to the ideal agreement between two patterns. The three components are independent of each other and usually equally weighted, so they complement each other in a useful way and provide holistic pattern information. In this way, global characteristics like distribution and variability instead of exact values at the grid scale are assessed. For more information regarding the SPAEF, see Demirel et al. (2018). The metric was used to evaluate spatial agreement for the 2003 event and for the drought characteristics.
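The SPAEF calculation described above can be sketched in a few lines. The sketch assumes the commonly cited form SPAEF = 1 − sqrt((α − 1)² + (β − 1)² + (γ − 1)²); the function name `spaef`, the default of 100 histogram bins and the shared bin range are illustrative assumptions, not settings reported in this study.

```python
import numpy as np

def spaef(obs, sim, bins=100):
    """Return (SPAEF, alpha, beta, gamma) for two spatial patterns.

    `obs` and `sim` are arrays of identical shape; grid cells that are
    missing in either pattern are ignored. `bins` is an assumed default.
    """
    obs = np.asarray(obs, dtype=float).ravel()
    sim = np.asarray(sim, dtype=float).ravel()
    valid = np.isfinite(obs) & np.isfinite(sim)
    obs, sim = obs[valid], sim[valid]

    # alpha: Pearson correlation between observed and simulated pattern
    alpha = np.corrcoef(obs, sim)[0, 1]

    # beta: ratio of the coefficients of variation (spatial variability);
    # note that this is unstable when a pattern mean is close to zero
    beta = (sim.std() / sim.mean()) / (obs.std() / obs.mean())

    # gamma: histogram overlap of the z-scored patterns on shared bins
    z_obs = (obs - obs.mean()) / obs.std()
    z_sim = (sim - sim.mean()) / sim.std()
    lo = min(z_obs.min(), z_sim.min())
    hi = max(z_obs.max(), z_sim.max())
    k, _ = np.histogram(z_obs, bins=bins, range=(lo, hi))
    l, _ = np.histogram(z_sim, bins=bins, range=(lo, hi))
    gamma = np.minimum(k, l).sum() / k.sum()

    score = 1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return float(score), float(alpha), float(beta), float(gamma)
```

Because γ only compares the z-scored histograms, a simulated field with the right value distribution but misplaced features keeps a high γ (and β) while α drops.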

Drought trend analysis
To investigate the temporal characteristics of droughts, we used the non-parametric Mann-Kendall trend test approach (Mann, 1945; Kendall, 1975) to detect significant monotonic trends in the index time series at a significance level of 0.05. This approach is based on the correlation between the ranks of a time series and their time order and is commonly used in time series of environmental, climatological or hydrological data (Hamed, 2008; Alhaji et al., 2018). We only considered independent, non-overlapping data. For a time series x_1, x_2, x_3, ..., x_n, the Mann-Kendall test statistic S is given by

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{sign}(R_j - R_i),$$

with

$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0, \end{cases}$$

where sign represents an indicator function, n the number of data points, and R_i and R_j their respective ranks. A positive S statistic indicates an increasing trend; a negative one indicates a decreasing trend.
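A minimal sketch of the test for a single time series is given below. The S statistic follows the definition above (the sign of a rank difference equals the sign of the corresponding value difference, so the values are compared directly). The significance step is an assumption: it uses the standard normal approximation with tie correction and continuity correction, which the text does not spell out, and the function name `mann_kendall` is hypothetical.

```python
import math
import numpy as np

def mann_kendall(x, alpha=0.05):
    """Return (S, z, p, trend) of the two-sided Mann-Kendall trend test."""
    x = np.asarray(x, dtype=float)
    x = x[np.isfinite(x)]
    n = x.size

    # S: sum of the signs of all pairwise differences x_j - x_i with j > i
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()

    # Variance of S under the null hypothesis, corrected for tied values
    _, counts = np.unique(x, return_counts=True)
    ties = np.sum(counts * (counts - 1) * (2 * counts + 5))
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0

    # Continuity-corrected standard normal test statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))

    if p < alpha:
        trend = "increasing" if s > 0 else "decreasing"
    else:
        trend = "no trend"
    return s, z, p, trend
```

Applied to the SPEI-3 series of every grid cell, an "increasing" result would correspond to a trend towards wetter conditions and a "decreasing" one to a drying trend.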

Analysis of drought characteristics
Drought events and their characteristics have been defined in several ways in the past (Um et al., 2017). Using the SPEI time series values on the grid point scale, we detected drought events and their characteristics (frequency, duration and severity) by applying the run theory proposed by Yevjevich (1967), which has been widely employed in drought-related studies (e.g., Spinoni et al., 2014; Marcos-Garcia et al., 2017; Peres et al., 2020; Spinoni et al., 2020). A drought event starts when the SPEI value falls below −1 for at least 2 consecutive months. The event ends when the index value returns to positive values. Drought frequency then describes the number of drought events in a given time period. Drought duration corresponds to the number of months between the start and end of an event (last month not included). The drought severity of an event equals the sum, in absolute values, of all the monthly SPEI values during the event (Spinoni et al., 2020). We determined the drought frequency for every grid cell for the whole study period 1980-2009. Drought frequencies between the single grid cells differ, and since drought duration and severity refer to every single drought event, we calculated the mean values for duration and severity for every grid cell to enable a comparison between the single data sets.
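The event definition above can be illustrated for a single grid-cell time series as follows. This is a sketch under stated assumptions, not the code used in the study: the function names are hypothetical, "returns to positive values" is read as the first month with SPEI > 0, and an event still running at the end of the series is closed at the last time step.

```python
import numpy as np

def detect_droughts(spei, threshold=-1.0, min_run=2):
    """Detect drought events in a monthly SPEI series using run theory."""
    spei = np.asarray(spei, dtype=float)
    n = spei.size
    events = []
    i = 0
    while i <= n - min_run:
        # Start: SPEI below the threshold for `min_run` consecutive months
        if np.all(spei[i:i + min_run] < threshold):
            end = i + min_run
            # The event lasts until the index returns to a positive value
            while end < n and spei[end] <= 0.0:
                end += 1
            events.append({
                "start": i,
                # months from start to end, the ending month not included
                "duration": end - i,
                # sum of the absolute monthly SPEI values during the event
                "severity": float(np.abs(spei[i:end]).sum()),
            })
            i = end
        else:
            i += 1
    return events

def drought_characteristics(spei):
    """Return frequency, mean duration and mean severity for one series."""
    events = detect_droughts(spei)
    if not events:
        return {"frequency": 0, "mean_duration": 0.0, "mean_severity": 0.0}
    return {
        "frequency": len(events),
        "mean_duration": float(np.mean([e["duration"] for e in events])),
        "mean_severity": float(np.mean([e["severity"] for e in events])),
    }
```

For the series `[0.5, -1.2, -1.5, -0.4, 0.3, -1.1, 0.2, -1.3, -1.4, -2.0, 0.1]`, two events are found: the single month at −1.1 does not qualify because it is not followed by a second month below −1, while the month at −0.4 still belongs to the first event because the index has not yet returned to a positive value.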

Results and discussion

Precipitation and temperature
Figure 1 presents the Taylor diagrams of the grid-cell-based monthly values of precipitation and maximum (Tmax) and minimum (Tmin) temperature. We also added the information of the WRF@15km data set to check if potential WRF benefits are related to increased resolution or model settings. Regarding precipitation, the WRF@5km run has the highest correlation with the reference: it is the only one crossing the 0.75 threshold with a relatively small RMSE score, resulting in the best overall performance compared to the other RCMs. However, the lowest RMSE is found for RACMO, which also matches the reference standard deviation best, while WRF@5km, ALADIN and RCA4 deviate most. Interestingly, the WRF@15km run has the lowest correlation and highest RMSE values, while its standard deviation is among the closest to the observational one. This means that the increased resolution of WRF@5km leads to improvements in correlation and RMSE scores, but the temporal variability is better captured in the coarser resolution. The Tmax Taylor diagram clearly shows a benefit of both WRF runs, so we conclude the model setup is the determining factor for the better performance compared to the EURO-CORDEX RCMs. This is underlined by the fact that the WRF@15km run has a higher correlation and a lower RMSE and matches the reference standard deviation more closely than its 5 km counterpart. Only the two WRF runs reach correlation coefficients above 0.99. Here, all EURO-CORDEX RCMs perform on a similarly high level. They all reach correlation values above 0.95 and RMSEs below 5. RACMO stands out in this case because it has the most accurate standard deviation ratio with regard to the reference. In the Tmin Taylor diagram it is obvious that the 5 km WRF run performs best. It has the highest correlation value (above 0.98) and the lowest RMSE, and it is close to the reference standard deviation. Only the 15 km WRF run is closer in this regard. The 15 km WRF run and the EURO-CORDEX RCMs perform on a similar level. Similarly to Tmax, the main difference is the standard deviation ratio when compared to the reference. In this regard, RACMO deviates most. For Tmin it seems the model setup of WRF leads to benefits compared to the other RCMs and that the increased resolution brings additional benefit.
From the Taylor diagrams we can conclude that especially Tmax and Tmin are very well captured by all RCMs. There are benefits of increased resolution for precipitation and for Tmin, while for Tmax mainly the model setup of the WRF runs is beneficial. The WRF@5km run performs relatively well for all three variables.
Regarding the resolution effect on the precipitation reproduction, our results are in accordance with findings from, e.g., Tripathi and Dominguez (2013) and Prein et al. (2016), who found that higher resolution leads to better reproduction. Our results are in contrast to findings from, e.g., Rauscher et al. (2010), Casanueva et al. (2016) and Dieng et al. (2017), who could not identify a benefit of increased resolution both for a general pattern and on an annual mean basis. It must be noted, though, that in all the studies mentioned, the differences between the two resolutions analyzed were much bigger than in our case, whereas both of our resolutions (12.5 and 5 km) are usually already considered high resolution in the literature. In each of the studies mentioned, a resolution of 50 km is compared to one of 25 km (Rauscher et al., 2010), 12.5 km (Casanueva et al., 2016; Prein et al., 2016; Dieng et al., 2017) or 10 km (Tripathi and Dominguez, 2013). Our results suggest that the benefits of a resolution increase from 12.5 to 5 km, if they exist, are less distinct. One must also keep in mind that the studies were conducted in different regions, which certainly plays a role too, and that often different resolutions of the same RCM were compared. The results from this section further show that RCMs with reasonable performance in simulating one or both temperature variables do not necessarily reproduce precipitation equally well, which is in accordance with findings from Peres et al. (2020). They further found that COSMO-CLM and RACMO showed good performance in reproducing precipitation, while RCA4 and WRF struggled the most. Regarding mean temperature, COSMO-CLM and REMO showed the best performances and RCA4, ALADIN and RACMO the worst. This could in part be confirmed by the results here for the precipitation reproduction: COSMO-CLM and RACMO perform relatively well, while RCA4 showed a relatively poor performance. It must be noted that Peres et al. (2020) analyzed the mean temperature instead of Tmax and Tmin and that they looked at different temporal scales. Moreover, they employed EURO-CORDEX RCMs with different general circulation models (GCMs) as forcing, while here all RCMs had the same ERA-Interim forcing.
Table 4 shows the bias values from the spatially and temporally averaged monthly values of the three variables compared to E-OBS. The highest spread among the models is found for precipitation, which was to be expected due to the higher variability in this variable. COSMO-CLM is the only RCM with a dry bias and also has the smallest absolute mean bias (−1.5 mm), while RCA4 has by far the largest bias (32 mm). The WRF@5km bias value (16.1 mm) is almost twice as high as that of its 15 km counterpart (8.3 mm). For Tmax the largest absolute mean bias is found for RCA4 (−1.8 °C) and the smallest for REMO (0.2 °C). ALADIN, REMO and RegCM show a warm bias, while the other RCMs have a cold bias. Regarding Tmin, RACMO is the RCM with the largest absolute mean bias (−1.7 °C) and WRF@5km the one with the smallest (−0.2 °C). COSMO-CLM, ALADIN, REMO and RegCM show a warm bias and the other RCMs a cold bias.
[...] with values ranging between 0.8 and 1. It is clearly visible that the WRF@15km run outperforms the WRF@5km run and thereby has the overall best performance. Our findings indicate that the WRF benefits can be attributed to the WRF model settings and not to the increased resolution. The higher agreement of the WRF@15km run with E-OBS may be due to the relatively coarse resolution of E-OBS (12.5 km) compared to the 5 km of the innermost WRF domain. For certain aspects, the structures from E-OBS may be better represented in resolutions closer to it with otherwise the same settings.

Drought event August 2003
In the following the SPEI-3 scores for the drought event in August 2003, one of the major drought events in central Europe in the last few decades (e.g., Fink et al., 2004;Rebetez et al., 2006;Ionita et al., 2021), are analyzed. Because of the results in the previous section, here we focus on the values from the two WRF runs in direct comparison to the reference values from E-OBS (Fig. 3). Relevant scores of the other RCM runs are given in Table 5. The E-OBS spatial pattern reveals that especially the southern half of the domain was mostly under extreme (SPEI ≤ −2; see Table 3) drought conditions, while in the northern half moderate to severe (−1 to −2) drought conditions were predominant. This pattern is not well reproduced by WRF@5km, which is also reflected by the low SPAEF value (0.21) in Table 5. Its domain is predominated by values between −1 and −2, so the biggest accordance with E-OBS can be found in the northern half. Some areas of the domain range between 0 and −1, indicating mild drought conditions. There are some spots with extreme drought values as well, though these do not match with E-OBS values. The WRF@15km domain shows more similarity with the E-OBS domain re-  garding the values, but the spatial distribution is different. This is underlined by the close SPEI-3 domain mean value (−1.81 compared to −1.90 of E-OBS), the almost exact areaunder-drought (AUD) value (81.5 % compared to 81.7 %) and the lowest mean bias value (−0.08 SPEI units) but the low SPAEF value (0.10) in Table 5. In all three domains the entire area is covered nearly only with negative values, which underlines the distinct drought conditions of that period. The mean SPEI-3 values in Table 5, which are all negative, further confirm this. It is striking that E-OBS holds the lowest mean value (−1.90), which corresponds to severe drought conditions. The highest mean value is held by RACMO (−0.97), corresponding to mild drought conditions. 
The spread between these mean values highlights the big differences among the RCMs. The AUD is defined as the percentage of grid cells with values of ≤ −1 in relation to the total number of grid cells. Here we see distinct differences between the individual RCMs and the reference. While E-OBS, RCA4 and WRF@15km have AUD values of more than 80 %, the values are below 50 % in REMO and RACMO. These two RCMs also have the two largest mean bias values (−1.03 and −0.93 SPEI units). All mean bias values are negative, which is a further indication of the drought underestimation of the RCMs. The SPAEF values are either negative or very low. The only exception is ALADIN with [...]
It can be concluded that there are distinct differences between the individual RCM performances regarding the reproduction of single drought events. None of the RCMs was able to satisfactorily reproduce the spatial patterns of the reference. Also, the correct representation of the mean drought index values and the AUD values turned out to be difficult in most cases. Thus, the results confirm findings from Um et al. (2017), who found that the spatial extents of droughts diverge among the RCMs and that the RCMs are not able to accurately capture drought events with large spatial scales. Since WRF@5km did not perform best in any of the categories in this case, our results show no benefit of the increased model resolution in this regard. In fact, the WRF@15km run performs better for all scores except the SPAEF value (Table 5), which indicates the higher relevance of the model settings in this respect. This shows that, in some aspects, a lower resolution can lead to better agreement with the reference than a higher resolution of the same model.

SPEI trend analysis
Most of the domain area of each RCM and the reference shows no trend (Table 6). If there is a trend in the EURO-CORDEX RCMs, it is always positive, indicating increasing SPEI-3 values and thus wetter conditions. This is the case for ALADIN, REMO, RegCM and RACMO. COSMO-CLM and RCA4 show almost entirely white domain areas. Interestingly, the E-OBS domain has only small parts of positive-trend areas, concentrated in the southeastern corner and partly in northeastern parts. There is only slight agreement in ALADIN, RegCM and RACMO in this respect. The WRF@5km domain shows no positive-trend grid cells at all (Table 6).
It is striking that only WRF@5km is able to reproduce negative-trend signals, which are also present in the reference and indicate a drying trend. None of the EURO-CORDEX RCMs is able to reproduce this. In the WRF@5km domain, the locations of the negative trends are even represented accurately at the local scale, concentrated mainly in the southwestern [...]
To answer the question of whether the agreement of WRF and E-OBS regarding the negative-trend areas is due to the increased resolution or to the model settings, we also applied the Mann-Kendall trend test to the WRF@15km run (Fig. 5). There is a clear indication that the reproduction is not primarily linked with the increased resolution, since the negative trends are represented here too. Compared to the WRF@5km run, the negative-trend areas are much more extensive. This is also reflected in Table 6: more than one-third of the domain (34.2 %) is covered by negative index values, which is more than double the E-OBS value (16.9 %) and more than three times the WRF@5km value (10.8 %), underlining the strong overestimation of negative-trend areas. There are no positive-trend values in the WRF@15km domain either.
From this section it is concluded that the WRF runs have clear benefits in the appropriate trend reproduction. As seen, these benefits are not primarily due to increased resolution but to the model settings, highlighting the high importance of model configurations tailored to the target region for our case. However, the increased resolution brings further benefits and leads to higher agreement with the reference. The EURO-CORDEX RCMs completely fail in this aspect. Nasrollahi et al. (2015) applied the Mann-Kendall trend test to the outputs of 41 CMIP5 models to evaluate their ability to replicate observed drought trends on the global scale between 1901 and 2005. They used the SPI-6 as drought index (and SPI-3 in the supporting material). Their results revealed that about 75 % of the models reproduce the global drying trend, but most models fail to reproduce regional wetting and drying trends (at most about 40 % with agreement). In most locations, less than 10 % of the models showed agreement with the observations. Greater agreement was found in higher latitudes. Um et al. (2017) also performed the Mann-Kendall trend test on grid-cell-based SPEI-12 time series from outputs of four RCMs (HadGEM3-RA, MM5, RegCM4 and RSM) from CORDEX East Asia and of their ensemble mean for the time period 1980-2005 over East Asia. They found distinct differences among the individual model outputs regarding their capability to reproduce observed drying and wetting trends. While HadGEM3-RA and MM5 generally captured the proper trends, RegCM4 and RSM were only partially successful. This is why the ensemble mean showed relatively poor performance compared to the two former RCMs. These results highlight the spread in the models' capability in reproducing observed trends of wetting and drying, which is found in this study as well.
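The grid-cell-based trend classification behind area shares such as those in Table 6 can be illustrated with a minimal sketch of the two-sided Mann-Kendall test. The function names, the 5 % significance level and the omission of tie and autocorrelation corrections are assumptions made for this illustration; the exact test configuration used in the study may differ.

```python
import math
import numpy as np


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall test, returning (trend, z, p).

    trend is -1 (significant decrease), +1 (significant increase) or 0.
    Simplified: no tie correction, no handling of autocorrelation.
    """
    x = np.asarray(series, dtype=float)
    x = x[~np.isnan(x)]
    n = x.size
    if n < 3:
        return 0, 0.0, 1.0
    # S statistic: sum of sign(x[j] - x[i]) over all pairs with i < j
    diffs = x[None, :] - x[:, None]  # diffs[i, j] = x[j] - x[i]
    s = np.sign(diffs[np.triu_indices(n, k=1)]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    trend = int(np.sign(z)) if p < alpha else 0
    return trend, float(z), float(p)


def trend_area_fractions(spei_cube, alpha=0.05):
    """Percentage of grid cells with negative, no and positive trend.

    spei_cube has shape (time, lat, lon). Cells without data are
    counted as "none" in this sketch.
    """
    cube = np.asarray(spei_cube, dtype=float)
    trends = np.array([
        mann_kendall(cube[:, i, j], alpha)[0]
        for i in range(cube.shape[1])
        for j in range(cube.shape[2])
    ])
    codes = (("negative", -1), ("none", 0), ("positive", 1))
    return {name: float(100.0 * np.mean(trends == code)) for name, code in codes}
```

Applied to an SPEI-3 array of one model run, `trend_area_fractions` returns the three area shares in percent, which is the kind of summary compared across RCMs above.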
Figure 6 presents the E-OBS drought frequency pattern for the time period 1980-2009 based on SPEI-3 along with the grid-cell-based differences between each RCM and E-OBS. Table 7 gives more detailed information including the scores from the WRF@15km domain. The drought frequency gives the number of drought events in the given time period for each grid cell. The meteorological drought frequency pattern in E-OBS shows that every single grid cell experienced at least eight drought events within the 30-year time span. The mean value for the whole domain is 15.5 (Table 7). The highest number of droughts occurred in the northeastern part with some grid cells reaching values of up to 24. This may appear relatively high at first. However, events with an SPEI-3 value equal to or below −1 are already considered droughts (see Sect. 3.4), meaning that even moderate droughts (see Table 3) are taken into account. A high count therefore does not necessarily imply that the drought events are severe or extreme: due to the definition of the SPEI, a merely drier-than-normal period is also considered a drought event, and this can happen in any season, not only in summer. Generally, the eastern half of the domain has higher values, and the number of drought events decreases towards the southwest. The RCM difference patterns differ from one another. Relatively high positive bias values (between 3 and 12) are often found in the northern and northeastern parts of the domain, especially in COSMO-CLM, RegCM and RCA4. The southern half of the domain is dominated by negative bias values in all RCMs. There is a similarity between the patterns of ALADIN and WRF@5km. All in all, bias values of ±9 are rare in all RCMs; the major part of the RCM domains lies within ±6.
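How drought events can be counted for a single grid cell is sketched below. The sketch assumes that an event is a run of consecutive months with SPEI-3 ≤ −1, that its duration is the run length in months, and that its severity is the mean exceedance below the threshold; these definitions and the function names are illustrative assumptions, since the exact definitions are given in Sect. 3.4.

```python
import numpy as np


def drought_events(spei, threshold=-1.0):
    """Split a 1-D SPEI series into drought events.

    Assumed definition: an event is a run of consecutive time steps
    with SPEI <= threshold. Returns a list of (start_index, length).
    """
    below = np.asarray(spei, dtype=float) <= threshold  # NaN compares False
    padded = np.concatenate(([False], below, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # exclusive end index of each run
    return [(int(s), int(e - s)) for s, e in zip(starts, ends)]


def drought_characteristics(spei, threshold=-1.0):
    """Frequency, mean duration and mean severity for one grid cell."""
    x = np.asarray(spei, dtype=float)
    events = drought_events(x, threshold)
    if not events:
        nan = float("nan")
        return {"frequency": 0, "mean_duration": nan, "mean_severity": nan}
    durations = [length for _, length in events]
    # Assumed severity: mean distance below the threshold within an event
    severities = [float(np.mean(threshold - x[s:s + n])) for s, n in events]
    return {
        "frequency": len(events),
        "mean_duration": float(np.mean(durations)),
        "mean_severity": float(np.mean(severities)),
    }
```

Applying `drought_characteristics` to every grid cell of a 1980-2009 SPEI-3 array would give frequency, duration and severity maps of the kind compared against E-OBS in this section.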
For the drought characteristics we used the mean absolute error (MAE) as a measure for the domain mean bias (third column in Table 7), since in a plain mean bias values with opposite signs can cancel each other out, making the information less meaningful. AL- [...]
From this section it is concluded that there is no benefit of WRF's increased resolution and model setup regarding the reproduction of the drought frequency, since neither of the two WRF runs stands out. The 5 km run's domain mean value is a little closer to that of the reference, and its SPAEF value (−0.14) is clearly higher than that of its 15 km counterpart (−0.29), while there is no big difference in the mean bias values. In fact, all the RCMs performed on a similar level. Furthermore, the mean conditions of the drought frequencies are sufficiently well reproduced. Focus should therefore be put on the information retrievable from the mean conditions and not on spatial accuracy.

Mean drought duration
Figure 7 shows the SPEI-3-based mean drought duration pattern for the period 1980-2009 from E-OBS and the grid-cell-based differences with the RCMs. Relevant scores, including for the WRF@15km run, are given in Table 8.
The E-OBS mean meteorological drought duration pattern is quite uniform: almost the entire domain is covered by values ranging between 2.5 and 3.5 months. The domain mean value (3.1 months) in Table 8 underlines this. The vast [...]
It is concluded that WRF has no real benefit due to increased resolution or model setup. A benefit is perhaps somewhat present regarding the spatial agreement with the reference, but although the SPAEF achieved by the WRF runs is distinctly higher than that of the EURO-CORDEX RCMs, it is still too low to be considered reliable. Nevertheless, as for the drought frequencies in the section above, all RCMs provide a satisfactory reproduction of the mean conditions. Here, there is also a lack of spatial accuracy, but this deficiency is less pronounced.

Mean drought severity
Figure 8 displays the E-OBS SPEI-3-based mean drought severity pattern for the time period 1980-2009 and the grid-cell-based differences with the RCMs. Relevant scores, including for the WRF@15km run, are given in Table 9.
The E-OBS domain shows a fairly uniform pattern with the majority of the values ranging between 0.4 and 0.5 SPEI units. The domain mean value (0.47 SPEI units) in Table 9 confirms this. This value further implies that, if all droughts beginning from an SPEI value of −1 are considered, the mean severity is −1.47 SPEI units. This means that the mean drought severity can still be classified as moderate according to Table 3, but it is very close to the severe threshold. In general, all RCMs show low bias values, which is also reflected in the mean bias values in Table 9: the maximum mean bias value is 0.07 SPEI units and is held by RegCM. RACMO and WRF@5km in particular show domains with only a few dark-shaded areas, which is also reflected in the lowest mean bias values (0.04 SPEI units). Considering all RCM domains, it is not possible to identify areas of predominantly positive or negative bias values, as the same areas have different signs in different RCMs. Nor is it possible to identify regions of predominantly high bias values across all RCMs. The domain mean severity values are very close to each other, all around 0.5 ± 0.04 SPEI units with a range of 0.07 SPEI units between the maximum (RegCM) and minimum (REMO and WRF@5km). Regarding the spatial agreement between E-OBS and the individual RCMs (not displayed here), there are again low SPAEF values overall, pointing towards a low level of agreement. WRF@5km holds the highest value (0.14), and WRF@15km is the only other one exceeding the 0.1 threshold. COSMO-CLM holds by far the lowest value (−0.15). The values of ALADIN and RCA4 are also negative.
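For orientation, the SPAEF values quoted throughout can be read against the following minimal sketch, which assumes the commonly used three-component formulation of the metric: Pearson correlation, ratio of the coefficients of variation and histogram overlap of the z-scored fields. The function name, the number of histogram bins and the NaN handling are assumptions of this sketch and not necessarily the settings used for the tables.

```python
import numpy as np


def spaef(sim, obs, bins=100):
    """Spatial efficiency of a simulated field against a reference field.

    Assumed formulation:
        SPAEF = 1 - sqrt((alpha-1)^2 + (beta-1)^2 + (gamma-1)^2)
    with alpha the Pearson correlation, beta the ratio of the
    coefficients of variation and gamma the histogram intersection of
    the z-scored fields. A value of 1 means perfect agreement.
    """
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    valid = ~np.isnan(sim) & ~np.isnan(obs)  # cells present in both fields
    sim, obs = sim[valid], obs[valid]

    alpha = np.corrcoef(sim, obs)[0, 1]
    # Note: beta becomes unstable if a field mean is close to zero
    beta = (sim.std() / sim.mean()) / (obs.std() / obs.mean())

    z_sim = (sim - sim.mean()) / sim.std()
    z_obs = (obs - obs.mean()) / obs.std()
    lo = min(z_sim.min(), z_obs.min())
    hi = max(z_sim.max(), z_obs.max())
    h_sim, _ = np.histogram(z_sim, bins=bins, range=(lo, hi))
    h_obs, _ = np.histogram(z_obs, bins=bins, range=(lo, hi))
    gamma = np.minimum(h_sim, h_obs).sum() / h_obs.sum()

    return float(1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2))
```

Under this formulation, values near zero or below, as reported for most RCMs here, indicate that at least one of the three components deviates strongly from its ideal value of 1, even when the domain means agree well.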
Similarly to the two previous sections, it is concluded that the mean drought severity conditions are captured reasonably well by the RCMs in terms of domain mean values, while the spatial accuracy is overall not satisfactory. Regarding the former, all RCMs perform on a similar level. This means that no benefit of WRF due to its increased resolution or model setup is detectable in this regard either. The results of the two WRF runs are very similar: the 5 km run performs slightly better regarding the mean bias and SPAEF. Peres et al. (2020) found that the RCMs with the best performance for precipitation mostly performed well regarding the reproduction of drought characteristics, too. Our findings do not confirm this. As stated in Sect. 4.1, COSMO-CLM and RACMO perform especially well overall for precipitation. Regarding the drought characteristics, these two RCMs did not stand out overall. Only in some aspects were there marginal benefits. It must be noted that Peres et al. (2020) used a different methodology for defining and calculating drought characteristics, since they worked with precipitation threshold values instead of drought indices.
From an overall perspective, it can be stated that no specific physics scheme of the RCMs (Table 2) considered on its own turned out to be superior to the others for the reproduction of the drought characteristics. Moreover, to corroborate our findings, we present additional results for the longer-timescale SPEI-6 and SPEI-12 indices in the Supplement (Figs. S1-S13 and Tables S1-S10) that lead us to the same conclusions as those found for SPEI-3.

Conclusions
A drought analysis for Germany and its near surroundings for the period 1980-2009 is conducted in this study. We address the influence of increased model resolution and appropriate model configuration on the reproduction of the SPEI drought index for the 3-month aggregation scale. For that purpose, an ensemble of six ERA-Interim-driven EURO-CORDEX RCMs of 12.5 km horizontal grid resolution and an ERA-Interim-driven high-resolution (5 km) WRF run, whose setup was tailored to the target area, are employed. The outputs are evaluated regarding their ability to reproduce precipitation, Tmax and Tmin as well as SPEI-3-based correlations and trends, the drought event in 2003, and overall drought characteristics (frequency, duration and severity). E-OBS data serve as reference.
WRF with its increased resolution and tailored model setup is shown not to be beneficial regarding the reproduction of the overall drought characteristics. In terms of reproducing the drought event in 2003, the model settings of WRF are decisive for the highest agreement with the reference, since the 15 km run performs better than its 5 km counterpart. The event is not well captured by any of the other RCMs. As for the domain mean conditions of the overall characteristics, they are reasonably well reproduced in all cases. The spatial agreement with the reference, though, is not satisfactory for any RCM. This is especially the case for the drought frequencies. In general, despite the same forcing, the RCMs exhibit a large spread in their outputs. Meteorological droughts are found to occur approximately 16 times in the study period with an average duration of 3.1 months and average severity of 1.47 SPEI units. No specific physics scheme or configuration can be shown to be especially beneficial for the reproduction of the drought characteristics. Furthermore, there seems to be no correlation between the RCM bias values (Table 4) and the respective SPEI performances. These results suggest that, depending on the goal in drought analysis, a resolution of 12.5 km or even 15 km, as shown with the WRF@15km run, may be sufficient to reach similar findings to those obtained with higher resolutions. This can save computational resources.
WRF's increased resolution and setup turn out to be beneficial in the analysis of the monthly values of the meteorological variables and the correlations of the SPEI time series. The latter can primarily be attributed to the model setup. However, the greatest benefit of WRF is found in the reproduction of the SPEI trends. It is the only RCM that captures the negative trends of the reference, while all EURO-CORDEX RCMs fail in this aspect. This is primarily due to the better model optimization for the area of interest compared to the larger-extent EURO-CORDEX runs, which highlights the importance of such tailored physics settings. Higher resolution additionally leads to greater spatial accuracy. These findings can be of high relevance, since appropriate reproduction of drought index trends is an important feature of RCMs, especially in the context of climate change analysis. Furthermore, the results may guide the selection of suitable RCMs for certain aspects of drought analysis in Germany and similar regions in a historical context and also for future projections.
Author contributions. DP, BF and HK developed the methodology for the study. DP carried out the data analysis and drafted the manuscript, with support of BF and HK. HK provided grant funding and supervised the research.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Special issue statement. This article is part of the special issue "Past and future European atmospheric extreme events under climate change". It is not associated with a conference.
Acknowledgements. The authors gratefully acknowledge the work of the WRF modeling community, the European Centre for Medium-Range Weather Forecasts for the reanalysis data ERA-Interim, the contributors to the EURO-CORDEX projects used in this study, the ECA&D group for the E-OBS data set and Warscher et al. (2019) for providing the WRF simulation data. Great thanks also to Gerhard Smiatek for his support. Moreover, big thanks to the two anonymous reviewers for their valuable feedback and comments.
Financial support. This work is funded by the ClimXtreme project of the BMBF (German Federal Ministry of Education and Research) under grant number (Förderkennzeichen) 01LP1903J.
The article processing charges for this open-access publication were covered by the Karlsruhe Institute of Technology (KIT).
Review statement. This paper was edited by Jens Grieger and reviewed by two anonymous referees.