the Creative Commons Attribution 4.0 License.
Using high-resolution regional climate models to estimate return levels of daily extreme precipitation over Bavaria
Benjamin Poschlod
Download
- Final revised paper (published on 25 Nov 2021)
- Supplement to the final revised paper
- Preprint (discussion started on 29 Mar 2021)
- Supplement to the preprint
Interactive discussion
Status: closed
-
RC1: 'Comment on nhess-2021-66', Anonymous Referee #1, 14 Apr 2021
In the presented study, the author chose two different regional climate models (CNRM and WRF) in three different spatial resolutions (12km, 5km, and 1.5km) driven with two different reanalysis data sets (ERA-Interim and ERA5). The author pointed out the difficulty of a correct and spatial representative estimation of return levels from observational data sets due to their limitations and uncertainties. Using RCM data could fill this gap as return levels of precipitation are of great importance for stakeholders or the insurance industry, also with respect to possible changes in the future regarding climate adaptation. The author used different types of extreme value approaches and validated each by comparing model output with observations and theoretical quantiles which is adequate as each method has specific pros and cons.
Nevertheless, I have some major concerns with the quality of the current version of the manuscript which are listed below followed by minor comments and questions.
Major comments:
1) In the conclusions the author clearly stated the uncertainties arising from different model setups regarding internal climate variability, parametrizations, and further assumptions. Saying so, why did you then choose different RCMs and not only a single one with similar setups, e.g., a COSMO-CLM version in the given (slightly different) resolutions? Furthermore, why did you use ERA-Interim and ERA5 as forcing data and not only the higher resolved and newer ERA5 data for all simulations?
2) The author put lots of effort into the homogenization of pointwise observational data sets. There are several high-res gridded precipitation data sets on the market like REGNIE/HYRAS for Germany (1km, Rauthe et al., 2013), RADOLAN (DWD, 1km), or SPARTACUS (Austria, 1km, Hiebl and Frei, 2017). I agree that even at this high resolution these data sets have limitations when it comes to convection. Nevertheless, DWD and ZAMG put a lot of effort into calibrating these data sets not only with ground measurements but also with radar data and vice versa in the case of RADOLAN. So, I assume these data sets have a higher quality than the homogenized point observations by the author, and their higher resolution would make the validation of the 1.5km WRF model more robust.
3) When it comes to different extreme value techniques, a proper validation would use every method with every data set and not only a couple of possible combinations like currently presented.
4) The authors conclude that RCMs are better in terms of spatial representativeness of return levels. Saying so I expect cross-validation with existing products like KOSTRA for Germany to clearly point out the benefit of RCMs compared to raw or existing gridded observations.
5) The author concentrated on the return level of 10 years and stated that this is the most important value for the targeted applications. At least for the insurance industry, minimum the 100-year return level better the 200-year values (PML200) are the relevant levels. As all results are specifically related to the 10-year level, I am wondering if the methodology can be adapted/used for higher return levels or if further validation/calibration is necessary in that case. I miss some statements on that in the discussion and conclusions sections.
Additionally to the major comments above, I have some minor comments [page-line/paragraph]:
[Sect. 1] I recommend clearly state the key research questions you are focusing on in this study. For me, it is not clear what the main aims are.
[P3 L81ff] Schröter et al. (2015) analyzed three major flood events in Germany during the past 70 years (1954,2002,2013), which also partly affected your investigation area, concluding that it is not daily/multi-day precipitation amount that triggers major flood events.
[P3 L88ff] “RCM can bridge the gaps” – what about stochastic weather generator or other approaches? Ehmele and Kunz (2019), for example, introduced a semi-physical, 2D, and high-resolved precipitation model mainly based on orographic precipitation which in a statistical sense, gives good results in terms of return levels even for higher return periods.
[Sect. 2] I recommend a reordering of the paragraphs in this section. As your investigation area is restricted to the given data sets, I suggest first describe the data sets and the investigation area afterward.
[Fig.1] Is the study area equal to the model domain? If so, how do you deal with boundary effects?
[P4 L97f] In Fig.2 you give the reference for the data set, I suggest giving it in the text, too.
[Fig.2] Do you have an explanation for the strong “drying” signal in the main Alpine valleys? Please use discrete color separations. See also https://www.nature.com/articles/s41467-020-19160-7
[Sect. 2.2] So I understand that you estimate daily precipitation or at least 24h sums in the moving window by hourly station data, right? If so, please clarify in the text.
[P7 L133] “24h RLs are adjusted to daily values using a reduction”. I do not understand what this reduction is about. Please clarify this in the text.
[Sect. 2.3] Why did you choose exactly these models and not others? There is a huge variety of RCM in 0.11° resolution within the CORDEX project and also high-resolution simulations mainly Germany and Alpine region in the CORDEX FPS convection project. Furthermore, you used WRF v3.6.1 for the 5km and v4.1 for the 1.5km simulations. Are there major differences between the versions? For consistency, the same model version would be better.
[P8 L161ff] For WRF 1.5km, you have 30 simulations with a 1-year length each. Does this have an impact on the comparability with the continuous simulations at coarser resolution?
[Sect.3] I suggest a reordering here, too. Instead of first describing strategies and distributions and then how they are applied in this study, I recommend a structure like 3.1 BM; 3.2 POT, 3.3 MEV each with a short introduction to the method and then directly saying how you will apply it in this study.
[P9 L180ff] It would be helpful for the reader if you can give typical values or magnitude orders of t_wet and t_decluster.
[P9 L192] G is also a CDF, right? Please indicate it.
[P12 L242] Can you explain why the low-res simulations have higher return values than the high-res?
[P13 L277] You mean Fig.5d instead of 5b?
[P15 L289] The 5km WRF seems to have a much stronger orographic signal than the 1.5km, especially the “drying” in the main valleys. Is there any explanation for that?
[P15 L293] Fig.5b and later 5e?
[P15 L301] Sure you mean Fig 3d here?
[P15 L305] Fig.5c and 5f, I guess
[P19 L394-398] Maybe I miss something, but I do not get the message from these two paragraphs
[Fig. S5+S6] There is data missing for Switzerland and Austria. Why? I thought you have the data for that regions and time periods.
Citation: https://doi.org/10.5194/nhess-2021-66-RC1
AC1: 'Reply on RC1', Benjamin Poschlod, 14 Jun 2021
Dear Referee,
Thank you for your comments and helpful suggestions. I agree with most of your thoughts and I am sure that addressing your suggestions will further improve the article. I will answer the comments point by point below (your comments marked in italic), but first I would like to provide an explanatory introduction about the motivation for this study, which will hopefully allow a better understanding of the model and data selection.
This investigation is motivated by the findings about the representation of extreme precipitation in the CRCM5 large ensemble (CRCM5-LE) about historical (https://doi.org/10.5194/essd-13-983-2021) and future conditions (https://doi.org/10.1088/1748-9326/ac0849). The CRCM5-LE is a single model initial-condition large ensemble featuring 50 members and driven by a 50-member global circulation model large ensemble (CanESM2-LE). The first study has shown that the model is able to reproduce observed extreme rainfall return levels with good skill over Europe and also over Germany and Bavaria (similar to the CRCM5 in this study). The second study has shown the large projected changes of extreme precipitation under the RCP8.5 scenario.
The results of these studies with a focus on Germany and Bavaria were presented to a large selection of institutional experts and users of extreme rainfall return level data from the Bavarian Environmental Agency. For them, the KOSTRA dataset (as legal guideline) is the basis for applications and therefore the “benchmark”. However, KOSTRA is based on historical observation data between 1951 and 2010.
As has already been the case for floods (HF100) since 2004 in Bavaria (see L495-497 in the article), a climate change surcharge for heavy precipitation was discussed. From my perspective as a climate scientist, the results from the CRCM5-LE would be sufficient to recommend adaptive measures such as a climate change surcharge. However, local biases of the CRCM5-LE rainfall return levels have greatly reduced the acceptance of the results by the institutional users (which I have hinted at in L63-66). After this presentation and discussion, I asked myself whether currently available RCM simulations with higher resolution could lower the local biases in order to increase the acceptance of extreme rainfall return level projections.
Therefore, the CRCM5 was chosen as the "climate model reference" and KOSTRA as the target to be simulated.
I hope that the description of this starting point provides a good understanding of the choice of data (KOSTRA) and model (CRCM5). I will mention this motivation more clearly and in more detail in the article.
Major comments:
1) In the conclusions the author clearly stated the uncertainties arising from different model setups regarding internal climate variability, parametrizations, and further assumptions. Saying so, why did you then choose different RCMs and not only a single one with similar setups, e.g., a COSMO-CLM version in the given (slightly different) resolutions? Furthermore, why did you use ERA-Interim and ERA5 as forcing data and not only the higher resolved and newer ERA5 data for all simulations?
The CRCM5 simulation was chosen as the “reference” RCM simulation due to the previous studies based on this model (see explanation above). However, the higher-resolved and convection-permitting successor CRCM6 is still being developed.
Therefore, I chose higher resolution simulations from freely available data sources covering the study area with a time period of 30 years driven by reanalysis data. I chose the 5 km WRF as a representative of high-resolution simulations with parametrized convection, and the 1.5 km WRF as the highest-resolution simulation known to me without any parametrization of deep and shallow convection.
2) The author put lots of effort into the homogenization of pointwise observational data sets. There are several high-res gridded precipitation data sets on the market like REGNIE/HYRAS for Germany (1km, Rauthe et al., 2013), RADOLAN (DWD, 1km), or SPARTACUS (Austria, 1km, Hiebl and Frei, 2017). I agree that even at this high resolution these data sets have limitations when it comes to convection. Nevertheless, DWD and ZAMG put a lot of effort into calibrating these data sets not only with ground measurements but also with radar data and vice versa in the case of RADOLAN. So, I assume these data sets have a higher quality than the homogenized point observations by the author, and their higher resolution would make the validation of the 1.5km WRF model more robust.
I agree with you on your assessment of the gridded data sets. However, the public authorities rely on KOSTRA as legal guideline. RADOLAN, for example shows substantial differences compared to KOSTRA (see Fig. 26 in https://www.dwd.de/DE/leistungen/pbfb_verlag_berichte/pdf_einzelbaende/251_pdf.pdf?__blob=publicationFile&v=2 sorry, only available in German). In a personal communication with a representative of the DWD, the technical suitability and high quality of the KOSTRA data set for daily rainfall return levels was also confirmed to me.
The REGNIE data set is based on the same observational data as KOSTRA, but the interpolation is carried out on a finer grid. For this purpose, a multiple regression with elevation and exposition as covariates is used for the interpolation. However, the scope of REGNIE is to provide daily high-resolution precipitation fields, whereas the scope of KOSTRA is to provide rainfall return levels. Hence, the workflow also differs:
For KOSTRA, the order is the following: Rain gauge observations -> extreme value analysis -> spatial interpolation.
For REGNIE: Rain gauge observations -> spatial interpolation. As a next step, extreme value analysis could be carried out.
As you suggest, a comparison of REGNIE (and similar products in Austria and Switzerland) to the 1.5km WRF return levels would be very interesting as well. However, this comparison would be more of an evaluation of the REGNIE interpolation method versus the 1.5 km WRF return levels, and therefore I would refrain from doing so in this article.
3) When it comes to different extreme value techniques, a proper validation would use every method with every data set and not only a couple of possible combinations like currently presented.
Thanks for this suggestion. I will calculate the remaining combinations and add the results (probably) in the Supplementary Materials. If unexpected or noteworthy findings from the remaining combinations evolve, I will of course show and discuss them in the main article.
4) The authors conclude that RCMs are better in terms of spatial representativeness of return levels. Saying so I expect cross-validation with existing products like KOSTRA for Germany to clearly point out the benefit of RCMs compared to raw or existing gridded observations.
I guess you are referring to L475ff. I do not conclude that RCMs are generally better than any KOSTRA/ÖKOSTRA/Swiss gauges in terms of spatial representativeness of return levels.
First, the spatially homogeneous return levels based on RCMs can mostly support areas where observations are scarce. The study area shows a high rain gauge density, which is necessary to validate the RCM return levels. Therefore, the comparison of RCM return levels to the broad database of KOSTRA/ÖKOSTRA/Swiss gauges implies that RCMs can be used in areas with less observational coverage in order to enhance the spatial representativeness.
Second, the spatial representativeness of each rain gauge differs depending on the topography. Especially in complex terrain, a rain gauge based in the valley may be only representative for a very small area in this valley, but not for the surrounding slopes. That’s why the ÖKOSTRA design rainfall is based on a combination using observed data with limited spatial representativity and simulations of a convection-permitting weather model. The conclusion of this study supports this ÖKOSTRA-approach.
I think that this conclusion can be sufficiently justified on the basis of the existing results (spatial correlation based on the Spearman coefficient). I think a separate cross-validation would be out of the scope of this article.
5) The author concentrated on the return level of 10 years and stated that this is the most important value for the targeted applications. At least for the insurance industry, minimum the 100-year return level better the 200-year values (PML200) are the relevant levels. As all results are specifically related to the 10-year level, I am wondering if the methodology can be adapted/used for higher return levels or if further validation/calibration is necessary in that case. I miss some statements on that in the discussion and conclusions sections.
Thanks for this comment. I agree that the even more extreme return levels are very important and very interesting. I will calculate and add the 100-year return levels in the revised version of the article.
Additionally to the major comments above, I have some minor comments [page-line/paragraph]:
[Sect. 1] I recommend clearly state the key research questions you are focusing on in this study. For me, it is not clear what the main aims are.
The main questions are:
Are there high-resolution RCM setups available which lead to lower local biases of rainfall return levels compared to observations?
How much do the return levels based on different state-of-the-art extreme value theoretical methods differ?
I’ll add them at the end of the introduction.
[P3 L81ff] Schröter et al. (2015) analyzed three major flood events in Germany during the past 70 years (1954,2002,2013), which also partly affected your investigation area, concluding that it is not daily/multi-day precipitation amount that triggers major flood events.
Thanks for this hint. I did not want to state that daily extreme precipitation events are the only relevant trigger for floods in the whole study area. I will add the Schröter et al. (2015) reference and state that daily extreme precipitation alone is not responsible for all floods in the area. Of course, the antecedent wetness state plays a major role as well. Further, the character and size of the respective river catchment determine which precipitation duration is more likely to cause floods.
[P3 L88ff] “RCM can bridge the gaps” – what about stochastic weather generator or other approaches? Ehmele and Kunz (2019), for example, introduced a semi-physical, 2D, and high-resolved precipitation model mainly based on orographic precipitation which in a statistical sense, gives good results in terms of return levels even for higher return periods.
Thanks for this reference: I will include and discuss stochastic weather generators, such as Ehmele & Kunz, and other statistical methods e.g. the French SHYPRE as representatives of different methods. Of course, there are many more methods to investigate extreme precipitation.
However, RCMs incorporate the advantage of including climate scenarios in their boundary conditions.
[Sect. 2] I recommend a reordering of the paragraphs in this section. As your investigation area is restricted to the given data sets, I suggest first describe the data sets and the investigation area afterward.
Thanks for this suggestion. I will rearrange the paragraphs accordingly for better readability.
[Fig.1] Is the study area equal to the model domain? If so, how do you deal with boundary effects?
No, the whole model domain of the 1.5 km model setup covers 351x351 grid cells, whereas the study area is reduced to 271x271 grid cells by Collier & Mölg (2020). 40 grid cells at each edge of the domain are discarded for the analysis. Hence, boundary effects can be assumed to be excluded.
I'll add a corresponding statement in the text.
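The trimming described in this reply can be illustrated with a minimal sketch. The grid sizes (351x351 cells, 40 cells discarded at each edge) are taken from the reply; the function name `trim_boundary` and the dummy zero-filled field are hypothetical and only serve to check the arithmetic, they are not the author's processing code.

```python
def trim_boundary(field, n_edge):
    """Drop n_edge rows and columns at every edge of a 2D grid (list of lists)."""
    if n_edge == 0:
        return [row[:] for row in field]
    return [row[n_edge:-n_edge] for row in field[n_edge:-n_edge]]

N_FULL = 351  # grid cells per side of the full 1.5 km model domain (from the reply)
N_EDGE = 40   # grid cells discarded at each edge (from the reply)

# Hypothetical placeholder field; real data would be e.g. a return-level map.
full_domain = [[0.0] * N_FULL for _ in range(N_FULL)]
study_area = trim_boundary(full_domain, N_EDGE)

print(len(study_area), len(study_area[0]))  # 271 271
```

Removing 40 cells on both sides of each dimension leaves 351 - 2 * 40 = 271 cells, which matches the study-area size given above.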
[P4 L97f] In Fig.2 you give the reference for the data set, I suggest giving it in the text, too.
Will be added in L98.
[Fig.2] Do you have an explanation for the strong “drying” signal in the main Alpine valleys? Please use discrete color separations. See also https://www.nature.com/articles/s41467-020-19160-7
Thanks for this suggestion. The figure will be re-drawn with appropriate colors.
The underestimation of annual precipitation in the Alpine valleys is also stated by Warscher et al. (2019), but without any explanation. Maybe it would be better to show an observational product - however the EURO4-APGD (Isotta et al., 2014 https://doi.org/10.1002/joc.3794) does not cover the northern part of the study area.
Even though E-OBS with a spatial resolution of 0.11° (~12 km) cannot fully resolve the spatial heterogeneity, I will re-draw Figure 2 based on E-OBS.
[Sect. 2.2] So I understand that you estimate daily precipitation or at least 24h sums in the moving window by hourly station data, right? If so, please clarify in the text.
The KOSTRA and ÖKOSTRA data are partly based on (sub-)hourly station data using a moving window and partly based on daily observations. However, the 24h return level values are provided representing the value of extreme precipitation for a 24-hourly moving window.
I transferred these values to "daily estimations" (fixed window) by reducing them by 14%, as this relationship between “daily” and “24-hourly moving window” has been found to be stable (Boughton & Jakob 2008, Barbero et al. 2019 ref. in the article, own calculations).
I'll enhance the description in the text to clarify this. The Swiss return levels are provided as daily estimates, and therefore no reduction is applied.
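As a rough illustration of this adjustment, the sketch below converts a 24-hourly moving-window return level to a fixed-window daily estimate. It assumes that "reducing by 14%" means multiplying by 0.86; the reply does not spell out the exact formula, and the function name and the 100 mm input value are hypothetical.

```python
REDUCTION = 0.14  # reduction stated in the reply for KOSTRA and ÖKOSTRA values

def moving_window_to_daily(rl_24h_mm, reduction=REDUCTION):
    """Convert a 24 h moving-window return level (mm) to a fixed-window daily value.

    Assumption: the reduction is applied as a simple multiplicative factor.
    """
    return rl_24h_mm * (1.0 - reduction)

rl_24h = 100.0  # hypothetical 24 h return level in mm, not a value from the study
rl_daily = moving_window_to_daily(rl_24h)
print(f"{rl_daily:.1f} mm")  # 86.0 mm
```

The Swiss return levels would bypass this step, since the reply states that they are already provided as daily estimates.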
[P7 L133] “24h RLs are adjusted to daily values using a reduction”. I do not understand what this reduction is about. Please clarify this in the text.
see comment above; will be clarified.
[Sect. 2.3] Why did you choose exactly these models and not others? There is a huge variety of RCM in 0.11° resolution within the CORDEX project and also high-resolution simulations mainly Germany and Alpine region in the CORDEX FPS convection project. Furthermore, you used WRF v3.6.1 for the 5km and v4.1 for the 1.5km simulations. Are there major differences between the versions? For consistency, the same model version would be better.
The motivation to use the CRCM5 as “12-km RCM reference” is explained in the introduction of this answer. Further, the CRCM5 has shown a relatively good performance at the reproduction of 10-year return levels (https://doi.org/10.5194/essd-13-983-2021) compared to EURO-CORDEX models (see Berg et al., 2019; ref in article).
I agree that the same WRF model versions would be better to investigate the added value of higher resolution. The two WRF setups have been chosen as they are publicly available and cover 30 years driven by reanalysis data. The 5km WRF represents a setup with a very high resolution and parametrization of convection, whereas the 1.5 km setup is the highest-resolution setup known to me. Further it explicitly resolves deep and shallow convection.
The main differences between the WRF versions are summarized here: https://github.com/wrf-model/WRF/releases/tag/v4.1
In the newer version, additional schemes are available (microphysics, radiation, cumulus). However, these have not been applied in the 1.5 km setup. Further, some minor improvements and bug fixes have been implemented. Hence, I conclude that the differences of the model version do not play a major role.
[P8 L161ff] For WRF 1.5km, you have 30 simulations with a 1-year length each. Does this have an impact on the comparability with the continuous simulations at coarser resolution?
Of course, the model initialization impacts the simulations due to the model representation of internal climate variability. In that sense, transient simulations would yield slightly different results than the sliced simulations. As the WRF domain is forced by the lateral boundary conditions of the ERA5 reanalysis data at 3-hourly resolution, I would not assume that slicing the simulation period does have a systematic impact on the magnitude of rainfall return levels.
For other variables in the WRF with longer lag times such as deeper soil moisture, transient simulations would be more appropriate. Hence, I conclude that the slicing does not have an impact on the comparability of daily rainfall return levels with the transient simulations.
[Sect.3] I suggest a reordering here, too. Instead of first describing strategies and distributions and then how they are applied in this study, I recommend a structure like 3.1 BM; 3.2 POT, 3.3 MEV each with a short introduction to the method and then directly saying how you will apply it in this study.
I will revise the text according to your suggestion.
[P9 L180ff] It would be helpful for the reader if you can give typical values or magnitude orders of t_wet and t_decluster.
Values are given in L248 and L261. The reordering of the section (see comment above) will provide these values directly in the respective subsection.
[P9 L192] G is also a CDF, right? Please indicate it.
Will be corrected.
[P12 L242] Can you explain why the low-res simulations have higher return values than the high-res?
There is no simple relationship between spatial resolution and extreme precipitation intensity. Of course, GCMs show smaller rainfall intensities than RCMs, and there is a general tendency that higher spatial resolution leads to higher precipitation intensity. However, the chosen model, the model setup, and the chosen parametrization schemes can mask this tendency.
[P13 L277] You mean Fig.5d instead of 5b?
Yes, will be corrected.
[P15 L289] The 5km WRF seems to have a much stronger orographic signal than the 1.5km, especially the “drying” in the main valleys. Is there any explanation for that?
Warscher et al. (2019) also note this strong orographic signal in their setup compared to observational data. I don't have an explanation for this behavior, but I will describe this behavior in the revised text.
[P15 L293] Fig.5b and later 5e?
Yes, will be corrected.
[P15 L301] Sure you mean Fig 3d here?
It should read 4d.
[P15 L305] Fig.5c and 5f, I guess
Yes, will be corrected.
[P19 L394-398] Maybe I miss something, but I do not get the message from these two paragraphs
These two paragraphs discuss the differences in the driving data (75km ERA-Interim vs. 30km ERA5) and the temporal coverage (1980–2009 and 1988–2017) as sources of uncertainty and discrepancy between the three RCM setups.
The message is: Even though driving data, model (version) and time period differ, the resulting return levels are quite similar in terms of intensity and spatial patterns.
L399ff: However, for the evaluation of single extreme events, the setup can result in large differences.
[Fig. S5+S6] There is data missing for Switzerland and Austria. Why? I thought you have the data for that regions and time periods.
I only have the return level data as described in section 2.2.
For Fig. S5&6 REGNIE data is used for Germany, as it is publicly available. For Switzerland and Austria, similar products are not publicly available as far as I know. Only for the event in August 2005, Meteo Swiss provided daily precipitation here: https://www.meteoswiss.admin.ch/home/climate/swiss-climate-in-detail/extreme-value-analyses/high-impact-precipitation-events/19-23-august-2005/precipitation-and-temperature.html
I hope that my answers satisfactorily address your comments and suggestions.
Kind regards,
Benjamin Poschlod
Citation: https://doi.org/10.5194/nhess-2021-66-AC1
-
RC2: 'Comment on nhess-2021-66', Anonymous Referee #2, 20 Apr 2021
The author estimates 10-year return levels with the Generalized Extreme Value (GEV) distribution from three different Regional Climate Models (RCMs), namely the Canadian Regional Climate Model version 5 (CRCM5) at 12 km resolution, the Weather Research and Forecasting model (WRF) at 5 km resolution and the WRF model at 1.5 km resolution, showing that the finer spatial resolution of the WRF-5km with respect to the 12 km CRCM5 reduces the bias of GEV. Moreover, he investigates uncertainties due to the use of three different extreme value models (GEV with fixed shape parameter, Generalized Pareto (GP) and Metastatistical Extreme Value (MEV) distributions) to estimate the 10-year return period quantiles using the WRF model with the finest (1.5 km) resolution. Through this analysis he concludes that GEV and GP distributions are equivalently biased (~ +1%), while MEV tends to underestimation (~ -6%) and that high-resolution RCMs provide promising results for the estimation of spatially homogeneous rainfall return levels.
The study is interesting and shows potential for evaluating extremes in a changing climate, the manuscript is well written and easy to follow. I have some major comments, and minor comments follow.
Major comments:
1) The use of the high-resolution products (REGNIE, RADOLAN, SPARTACUS) would avoid the need to homogenize the gauge precipitation values and would make possible a more accurate validation of the RCMs with the finest resolution. Why not consider them?
2) Why only return level of 10 years? I understand the concern of the author that 30 years of data are few for estimating higher quantiles, but return periods higher than 10 (e.g., 100) years are more relevant for engineering applications/(re)insurance purposes and the challenge is indeed to estimate them with the availability of short time series. How would the estimation of higher return levels compare e.g. with the official ones from KOSTRA? As the manuscript is presented now, the conclusion stated in the abstract “it follows that high-resolution regional climate models are suitable for generating spatially homogenous rainfall return level products” is not fully supported by the analysis, since only the 10-years return levels have been evaluated.
3) The study area is characterized by some high-elevated regions affected by orographic precipitation. I'm wondering if using all the values as “ordinary events” in the MEV might not respect the independence hypothesis required by the MEV framework. See for example Marra et al. (2018) and Miniussi et al. (2020) for some discussion on temporal correlation.
4) Why using a GEV distribution with a constant shape parameter and not, for example, a Gumbel? Previous studies (e.g., Grieser et al. (2007)) have shown that the Gumbel distribution is a good model for precipitation in the Bavarian area, and its location parameter has a strong correlation with altitude, while its scale parameter has a noisy pattern (except for the Bavarian Alps). Moreover, you say that the shape parameter based on all the three RCM setups is centered around a value close to 0.114, in line with the one recommended by Papalexiou and Koutsoyiannis (2013): is this really a fair comparison, as these shape parameter values are already affected by estimation uncertainty?
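The practical difference between a Gumbel model and a GEV with a fixed positive shape parameter can be sketched with the standard GEV quantile formula for annual maxima. This is only an illustration: the shape value 0.114 is the one quoted in the comment above, while the location and scale values (40 mm, 10 mm) are hypothetical and not taken from the manuscript. The sign convention assumed here is that a positive shape parameter denotes a heavy upper tail.

```python
import math

def gev_return_level(mu, sigma, xi, return_period):
    """Return level of a GEV fitted to annual maxima.

    mu: location, sigma: scale, xi: shape (xi > 0 means heavy tail here);
    xi = 0 gives the Gumbel special case.
    """
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu + sigma / xi * (y ** (-xi) - 1.0)

MU, SIGMA = 40.0, 10.0  # hypothetical location and scale in mm
XI_FIXED = 0.114        # shape value quoted in the comment

for T in (10, 100):
    gumbel = gev_return_level(MU, SIGMA, 0.0, T)
    gev = gev_return_level(MU, SIGMA, XI_FIXED, T)
    print(f"T={T:>3} yr  Gumbel={gumbel:6.1f} mm  GEV(xi=0.114)={gev:6.1f} mm")
```

With identical location and scale, the positive shape parameter yields higher return levels than the Gumbel case, and the gap widens with the return period, which is why the choice matters more for 100-year than for 10-year levels.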
Minor comments.
Section 3.
L225: Another title for section 3.3 would be more appropriate
L226-227: please add a couple of words about the adjustment, so that the reader understands it directly from here without the need to go looking at the reference.
L239: you state that “the location and scale parameter are governed by the topography”. From Figure 3 one can notice that the spatial pattern of the location parameter is somehow coherent with topography, but the noise for the scale parameter does not make its pattern straightforward to understand. Maybe also the colors scale is not helping.
L240: why a chaotic pattern for the shape parameter? Is it related to the uncertainty that one can get due to the limited series available to estimate it?
L259: you mention you made a “goodness of fit” (despite its limitation in prediction) for the GEV and the GP distributions. Have you made a similar analysis also for the Weibull distribution?
L264: in L253-255 you mention that for sample sizes > 50 estimation via ML is recommended. Why then using PWM for the Weibull distribution in the MEV framework?
Section 4.
L287 and 310 (captions of Figures 4 and 6): “difference calculated as climate model return level minus observational return level” -> difference between the return level from the climate model and the observational one. Why use the absolute error instead of the relative error?
L454-457: in Zorzetto et al. (2016) the analysis has been made by means of a cross-validation approach, so that the sample used for parameter calibration is independent from the one used for testing the performance of GEV and MEV distributions. When GEV is fitted and tested on the same sample (unless the sample is shorter, i.e. 10-20 years, when issues in the parameter estimation –especially for the shape parameter- might arise), it usually outperforms MEV, but it is not flexible in prediction.
A curiosity: are you considering or discarding snow events?
Supplementary material.
FigS2: how are the 95% confidence intervals computed?
You have the example for the Munich grid cell, and only for GEV-LMOM and GP models, why not for GEV-ML and MEV? Moreover, a comprehensive validation of all the extreme value models for the whole area would add value to the analysis.
FigS5-S6: now the REGNIE product is shown; why not showing the observation-based product used in the analysis? It would be also useful to evaluate differences among the products (even if for some events only).
References
Grieser, J., Staeger, T., & Schönwiese, C. D. (2007). Estimates and uncertainties of return periods of extreme daily precipitation in Germany. Meteorologische Zeitschrift, 16(5), 553–564. https://doi.org/10.1127/0941-2948/2007/0235
Marra, F., Nikolopoulos, E. I., Anagnostou, E. N., & Morin, E. (2018). Metastatistical Extreme Value analysis of hourly rainfall from short records: Estimation of high quantiles and impact of measurement errors. Advances in Water Resources, 117, 27–39. https://doi.org/10.1016/j.advwatres.2018.05.001
Miniussi, A., Villarini, G., & Marani, M. (2020). Analyses Through the Metastatistical Extreme Value Distribution Identify Contributions of Tropical Cyclones to Rainfall Extremes in the Eastern United States. Geophysical Research Letters, 47(7). https://doi.org/10.1029/2020GL087238
Citation: https://doi.org/10.5194/nhess-2021-66-RC2
AC2: 'Reply on RC2', Benjamin Poschlod, 14 Jun 2021
Dear Referee,
Thank you for your comments and helpful suggestions. Your thoughts and additional references have helped to improve the quality of this study and have encouraged me to dive deeper into EVT and the MEV framework.
I will address the comments point by point below (your comments marked in italic), but first I would like to provide an explanatory introduction about the motivation for this study, which will hopefully allow a better understanding of the model and data selection.
This investigation is motivated by findings on the representation of extreme precipitation in the CRCM5 large ensemble (CRCM5-LE) under historical (https://doi.org/10.5194/essd-13-983-2021) and future conditions (https://doi.org/10.1088/1748-9326/ac0849). The CRCM5-LE is a single-model initial-condition large ensemble featuring 50 members, driven by a 50-member global circulation model large ensemble (CanESM2-LE). The first study showed that the model reproduces observed extreme rainfall return levels with good skill over Europe, including Germany and Bavaria (similar to the CRCM5 in this study). The second study showed large projected changes of extreme precipitation under the RCP8.5 scenario.
The results of these studies with a focus on Germany and Bavaria were presented to a large selection of institutional experts and users of extreme rainfall return level data from the Bavarian Environmental Agency. For them, the KOSTRA dataset (as legal guideline) is the basis for applications and therefore the “benchmark”. However, KOSTRA is based on historical observation data between 1951 and 2010.
As has already been the case for floods (HF100) since 2004 in Bavaria (see L495-497 in the article), a climate change surcharge for heavy precipitation was discussed. From my perspective as a climate scientist, the results from the CRCM5-LE would be sufficient to recommend adaptive measures such as a climate change surcharge. However, local biases of the CRCM5-LE rainfall return levels have greatly reduced the acceptance of the results by the institutional users (which I have hinted at in L63-66). After this presentation and discussion, I asked myself whether currently available RCM simulations with higher resolution could lower the local biases and thereby increase the acceptance of projected extreme rainfall return levels.
Therefore, the CRCM5 was chosen as the "climate model reference" and KOSTRA as the target to be simulated.
I hope that the description of this starting point provides a good understanding of the choice of data (KOSTRA) and model (CRCM5). I will mention this motivation more clearly and in more detail in the article.
Major comments:
1) The use of the high-resolution products (REGNIE, RADOLAN, SPARTACUS) would avoid to homogenize the gauge precipitation values and would make possible a more accurate validation of the RCMs with the finest resolution. Why not considering them?
Yes, these data sets provide values at higher spatial resolution. However, for rainfall return levels, the public authorities rely on KOSTRA as legal guideline (as described in the introduction of this answer). Furthermore, in a personal communication, a representative of the DWD confirmed to me the technical suitability and high quality of the KOSTRA data set for daily rainfall return levels.
The radar product RADOLAN, for example, shows substantial differences compared to the observation-based KOSTRA (see Fig. 26 in https://www.dwd.de/DE/leistungen/pbfb_verlag_berichte/pdf_einzelbaende/251_pdf.pdf?__blob=publicationFile&v=2; sorry, only available in German).
The REGNIE data set is based on the same observational data as KOSTRA, but the interpolation is carried out on a finer grid. To this end, a multiple regression with elevation and exposition as covariates is used for the interpolation. However, the scope of REGNIE is to provide daily high-resolution precipitation fields, whereas the scope of KOSTRA is to provide rainfall return levels. Hence, the workflow also differs:
For KOSTRA, the order is the following: Rain gauge observations -> extreme value analysis -> spatial interpolation (~ 8km).
For REGNIE: Rain gauge observations -> spatial interpolation (1km). As a next step, extreme value analysis could be carried out.
As you suggest, a comparison of REGNIE (and similar products in Austria and Switzerland) to the 1.5km WRF return levels would in general be very interesting. However, in my opinion this comparison would be more of an evaluation of the REGNIE interpolation method versus the 1.5 km WRF return levels, and therefore I would refrain from doing so in this article.
2) Why only return level of 10 years? I understand the concern of the author that 30 years of data are few for estimating higher quantiles, but return periods higher than 10 (e.g., 100) years are more relevant for engineering applications/(re)insurance purposes and the challenge is indeed to estimate them with the availability of short time series. How would the estimation of higher return levels compare e.g. with the official ones from KOSTRA? As the manuscript is presented now, the conclusion stated in the abstract “it follows that high-resolution regional climate models are suitable for generating spatially homogenous rainfall return level products” is not fully supported by the analysis, since only the 10-years return levels have been evaluated.
I totally agree. I will add the 100-year return levels in the revised version of the manuscript. It will also be interesting to see whether the MEV can outperform the GEV and GPD for the longer return period.
3) The study area is characterized by some high-elevated regions affected by orographic precipitation. I'm wondering if using all the values as “ordinary events” in the MEV might not respect the independence hypothesis required by the MEV framework. See for example Marra et al. (2018) and Miniussi et al. (2020) for some discussion on temporal correlation.
Thank you for this comment on the methodology and these two very interesting additional references. I will check the temporal autocorrelation to ensure that all “ordinary events” can be assumed to be independent events (similarly to Fig S3 of Zorzetto et al., 2016). For grid cells where the independence hypothesis is not respected, I will implement an “event separation method” as in Marra et al. (2018). However, in this study only daily (not hourly) rainfall amounts are available, so the “running parameter” (minimal time between two rainfall events) will amount to [0, 1, 2, 3, …] days.
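The check and the separation described above could be sketched as follows for the daily series of a single grid cell. This is only an illustration under stated assumptions, not the implementation used in the study: the function names, the 1 mm wet-day threshold, and the choice of the event maximum as the value of an “ordinary event” are hypothetical.

```python
def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a sequence of daily values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0.0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


def separate_events(daily, wet_threshold=1.0, min_dry_days=1):
    """Group wet days into events and return one value (the maximum) per event.

    Two wet days belong to the same event if fewer than `min_dry_days`
    dry days lie between them. With min_dry_days=0 every wet day is its
    own event. Threshold and event value are assumptions for this sketch.
    """
    event_maxima = []
    gap = None  # dry days since the last wet day; None before the first wet day
    for value in daily:
        if value < wet_threshold:
            if gap is not None:
                gap += 1
            continue
        if gap is not None and gap < min_dry_days:
            # Still inside the running event: keep its largest daily amount.
            event_maxima[-1] = max(event_maxima[-1], value)
        else:
            event_maxima.append(value)
        gap = 0
    return event_maxima
```

Increasing `min_dry_days` step by step and recomputing the autocorrelation of the resulting event series would show at which separation the independence assumption becomes plausible.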
4) Why using a GEV distribution with a constant shape parameter and not, for example, a Gumbel? Previous studies (e.g., Grieser et al. (2007)) have shown that the Gumbel distribution is a good model for precipitation in the Bavarian area, and its location parameter has a strong correlation with altitude, while its scale parameter has a noisy pattern (except for the Bavarian Alps). Moreover, you say that the shape parameter based on all the three RCM setups is centered around a value close to 0.114, in line with the one recommended by Papalexiou and Koutsoyiannis (2013): is this really a fair comparison, as these shape parameter values are already affected by estimation uncertainty?
Grieser et al. 2007 stated that the Gumbel distribution is an adequate fit for the German rain gauge data – however they also write: “The fitting of this GEV to a data sample allows the identification of which of the three extreme value distributions may adequately describe the data. However, this approach of model identification is not the topic of this paper. It is already known that daily precipitation in Germany can be adequately described by a Gamma distribution (ZOLINA et al., 2005). Since the Gamma distribution is in the basin of the Gumbel distribution we a priori expect the highest value of a sample of daily precipitation to be describable by the Gumbel distribution.”
They also checked the goodness of fit via the value of R² for the empirical annual maxima and the theoretical fitted Gumbel distribution.
The cited underlying paper of Zolina et al. 2005 (https://doi.org/10.1029/2005GL023231), which used the Gamma distribution, only investigated 96 rain gauges over Europe, of which 20 – 30 gauges are in the study area of this paper.
Papalexiou and Koutsoyiannis (2013) found that in the study area the value of the shape parameter amounts to 0.1 – 0.14 based on rain gauge observations. Hence, their regional investigation in the study area also supports their finding that a value of 0.114 is an appropriate choice for most areas in the world.
The finding in this study that the distribution of the values of the shape parameter fitted via L-moments is centered around roughly 0.114 for all three RCMs is of course affected by estimation uncertainty. But it supports the argument of Papalexiou and Koutsoyiannis (2013).
Encouraged by your comment, I carried out a similar investigation to Papalexiou and Koutsoyiannis (2013) for over 1100 rain gauges in the study area with more than 30 years of daily precipitation data. I fitted the GEV via L-moments, which results in a distribution of the shape parameter centered around 0.09. Of course, again this value is affected by estimation uncertainty.
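For illustration, a GEV fit via L-moments for one series of annual maxima could look like the sketch below. It assumes the widely used Hosking approximation for the shape parameter and the sign convention in which a positive shape means a heavy upper tail (as for the value 0.114); the function names are hypothetical, and the code is not taken from the study.

```python
import math


def sample_l_moments(data):
    """First three sample L-moments from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def fit_gev_lmom(annual_maxima):
    """Return (location, scale, shape) of a GEV fitted via L-moments.

    Not valid for an exactly zero shape (Gumbel limit), which is not
    handled in this sketch.
    """
    l1, l2, l3 = sample_l_moments(annual_maxima)
    t3 = l3 / l2  # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's shape, k = -shape
    g = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    loc = l1 - scale * (1.0 - g) / k
    return loc, scale, -k


def gev_return_level(loc, scale, shape, return_period):
    """Level exceeded on average once per `return_period` years."""
    y = -math.log(1.0 - 1.0 / return_period)
    return loc + scale / shape * (y ** (-shape) - 1.0)
```

Applying `fit_gev_lmom` to every gauge with more than 30 years of data and collecting the third return value would give the kind of shape-parameter distribution discussed above.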
Further, I carried out the Anderson-Darling test at the 0.05 significance level for the GEV with shape=0.114 and the Gumbel distribution for the 1.5 km WRF data. For the GEV with shape=0.114, fewer grid cells have a p-value < 0.05 than for the Gumbel distribution (GEV114: 4097 versus Gumbel: 9180 grid cells out of 73441 grid cells in total). When applying the adjusted p-values following Wilks (2016), the GEV with shape=0.114 is not rejected at any grid cell, whereas the Gumbel fits are rejected at 870 grid cells. Therefore, I think that the GEV with shape=0.114 is more appropriate for the study area.
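The effect of such an adjustment can be illustrated with a small sketch. It assumes that the adjustment takes the form of a false discovery rate (FDR) control in the Benjamini-Hochberg style; the function name and the control level `alpha_fdr` are hypothetical and would have to be set as in the study.

```python
def fdr_reject(p_values, alpha_fdr):
    """Flag the tests rejected under Benjamini-Hochberg FDR control.

    The largest sorted p-value p_(i) with p_(i) <= alpha_fdr * i / n
    defines the rejection threshold; every p-value at or below that
    threshold is rejected. Returns a list of booleans in input order.
    """
    n = len(p_values)
    p_star = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= alpha_fdr * rank / n:
            p_star = p
    if p_star is None:
        return [False] * n
    return [p <= p_star for p in p_values]
```

With many grid cells tested at once, a fixed per-cell level of 0.05 produces rejections by chance alone, which is why the number of rejected cells drops once the threshold is adapted to the number of tests.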
Minor comments.
Section 3.
L225: Another title for section 3.3 would be more appropriate
Section 3 will be reordered according to the suggestion of the other referee. Hence, this title will be discarded.
L226-227: please add a couple of words about the adjustment, so that the reader understands it directly from here without the need to go looking at the reference.
I will add an explanation of the general problem of false positives and the principle of the approach by Wilks (2016).
L239: you state that “the location and scale parameter are governed by the topography”. From Figure 3 one can notice that the spatial pattern of the location parameter is somehow coherent with topography, but the noise for the scale parameter does not make its pattern straightforward to understand. Maybe the color scale is also not helping.
For the scale parameter, some topographical features are visible (e.g. the Alpine valleys in Fig. 3e,h; the Swabian Jura, the Odenwald, the Ore Mountains, and the Bavarian Forest in Fig. 3e,h), but as you mentioned, the pattern is noisier.
I will clarify in the text that the patterns of the scale parameter are noisier than those of the location parameter and discuss the topographical features that are visible.
L240: why a chaotic pattern for the shape parameter? Is it related to the uncertainty that one can get due to the limited series available to estimate it?
The explanation is given in L438-441 as I did not want to anticipate the content of the discussion in the results section. Indeed, this explanation refers to the uncertainty that one can get due to the limited series available to estimate it.
L259: you mention you made a “goodness of fit” (despite its limitation in prediction) for the GEV and the GP distributions. Have you made a similar analysis also for the Weibull distribution?
No, I haven’t. I will add such an analysis for the Weibull distribution as well.
L264: in L253-255 you mention that for sample sizes > 50 estimation via ML is recommended. Why then using PWM for the Weibull distribution in the MEV framework?
For the MEV framework, I followed the approach of Zorzetto et al. (2016), who applied the PWM to fit the Weibull distribution to non-zero rainfall events within one year. They claimed that the PWM attributes a greater weight to the tail of the distribution than ML and is also more robust to outliers. Because the Weibull distribution is fitted to “ordinary” events within the MEV framework, its tail is especially important for investigating the extremes.
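A minimal sketch of this step is given below, assuming the two-parameter Weibull form F(x) = 1 - exp(-(x/C)^w) and the first two probability-weighted moments. The function names and the wet-day threshold are hypothetical; this is not the code used in the study.

```python
import math


def fit_weibull_pwm(values, wet_threshold=0.0):
    """Fit a Weibull distribution to the wet-day amounts of one year via PWM.

    Uses M0 = E[X] and M1 = E[X (1 - F(X))], for which
    M0 / M1 = 2 ** (1 + 1 / w) and M0 = C * Gamma(1 + 1 / w).
    Returns (number of wet days, scale C, shape w).
    """
    x = sorted(v for v in values if v > wet_threshold)
    n = len(x)
    m0 = sum(x) / n
    # Unbiased estimate of M1: weight (n - rank) / (n - 1) for each ordered value.
    m1 = sum(x[i] * (n - 1 - i) for i in range(n)) / (n * (n - 1))
    shape = math.log(2.0) / math.log(m0 / (2.0 * m1))
    scale = m0 / math.gamma(1.0 + 1.0 / shape)
    return n, scale, shape


def mev_cdf(x, yearly_fits):
    """MEV cumulative distribution: average over years of F_j(x) ** n_j."""
    total = 0.0
    for n, scale, shape in yearly_fits:
        total += (1.0 - math.exp(-((x / scale) ** shape))) ** n
    return total / len(yearly_fits)
```

A return level for a given return period T would then follow from solving `mev_cdf(x, yearly_fits) = 1 - 1/T` numerically for x.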
Section 4.
L287 and 310 (captions of Figures 4 and 6): “difference calculated as climate model return level minus observational return level” -> difference between the return level from the climate model and the observational one. Why use the absolute error instead of the relative error?
The relative error is shown. I wanted to express that the relative error is calculated as (RL_RCM – RL_OBS)/RL_OBS. I will clarify the captions accordingly.
L454-457: in Zorzetto et al. (2016) the analysis has been made by means of a cross-validation approach, so that the sample used for parameter calibration is independent from the one used for testing the performance of GEV and MEV distributions. When GEV is fitted and tested on the same sample (unless the sample is shorter, i.e. 10-20 years, when issues in the parameter estimation –especially for the shape parameter- might arise), it usually outperforms MEV, but it is not flexible in prediction.
I will correct these sentences accordingly and distinguish between the performance when tested on the same sample and the performance at prediction.
A curiosity: are you considering or discarding snow events?
All daily values of precipitation are considered.
Supplementary material.
FigS2: how are the 95% confidence intervals computed?
The confidence intervals are computed based on the delta method using the covariance matrix. I will add the method to the caption of FigS2.
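For reference, the delta method can be written out for the GEV case in its standard textbook form. The notation is an assumption made for this illustration: V is the covariance matrix of the estimates of (μ, σ, ξ), p = 1/T is the exceedance probability, and the gradient is evaluated at the estimated parameters.

```latex
z_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right], \qquad y_p = -\ln(1 - p)

\operatorname{Var}(\hat{z}_p) \approx \nabla z_p^{\mathsf{T}} \, V \, \nabla z_p

\nabla z_p^{\mathsf{T}} = \left[\, 1,\;\; -\xi^{-1}\left(1 - y_p^{-\xi}\right),\;\; \sigma\xi^{-2}\left(1 - y_p^{-\xi}\right) - \sigma\xi^{-1} y_p^{-\xi} \ln y_p \,\right]

\text{95\% interval:}\quad \hat{z}_p \pm 1.96\,\sqrt{\operatorname{Var}(\hat{z}_p)}
```

The interval is symmetric by construction, which is a known limitation of this approach for long return periods.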
You have the example for the Munich grid cell, and only for GEV-LMOM and GP models, why not for GEV-ML and MEV? Moreover, a comprehensive validation of all the extreme value models for the whole area would add value to the analysis.
I will add plots for the GEV-FIX and MEV.
FigS5-S6: now the REGNIE product is shown; why not showing the observation-based product used in the analysis? It would be also useful to evaluate differences among the products (even if for some events only).
REGNIE is also observation-based and provides continuous daily precipitation values. The observation-based KOSTRA (as used to evaluate the return levels in this study) only provides return levels but no continuous daily precipitation values. Hence, it can’t be used to analyze specific events.
I hope that my answers satisfactorily address your comments and suggestions.
Kind regards,
Benjamin Poschlod
Citation: https://doi.org/10.5194/nhess-2021-66-AC2