This work is distributed under the Creative Commons Attribution 4.0 License.
A methodological framework for the evaluation of short-range flash-flood hydrometeorological forecasts at the event scale
Maryse Charpentier-Noyer
Daniela Peredo
Axelle Fleury
Hugo Marchal
François Bouttier
Eric Gaume
Pierre Nicolle
Olivier Payrastre
Maria-Helena Ramos
Download
- Final revised paper (published on 06 Jun 2023)
- Preprint (discussion started on 01 Jul 2022)
Interactive discussion
Status: closed
RC1: 'Comment on nhess-2022-182', Anonymous Referee #1, 19 Aug 2022
The paper “A methodological framework for the evaluation of short-range flash-flood hydrometeorological forecasts at the event scale” presents a comprehensive evaluation of a short-range hydrometeorological ensemble forecast for an event that occurred in October 2018 and mainly affected the Aude River basin (south-eastern France).
The topic is very interesting in the context of operational forecasting; the idea has been investigated in depth and, even if the analysed event has a very short duration, the proposed methodology seems valid, and the results presented are valuable and of interest for the analysis of other events.
The paper is well written, well structured and readable; the abstract and conclusions are satisfactory; the scientific methods and assumptions are valid and clearly outlined; the figures are clear, and the explanations are exhaustive.
To conclude, I believe that the paper is a good candidate for publication in NHESS.
Citation: https://doi.org/10.5194/nhess-2022-182-RC1
AC1: 'Reply on RC1', Maryse Charpentier-Noyer, 09 Oct 2022
We thank anonymous referee 1 for this very positive evaluation.
Citation: https://doi.org/10.5194/nhess-2022-182-AC1
RC2: 'Comment on nhess-2022-182', Anonymous Referee #2, 21 Aug 2022
GENERAL COMMENTS

The manuscript proposes a framework to assess the quality of hydrometeorological forecasts for flash flood events and applies it to the event that affected the Aude basin in October 2018. Conceptually, the proposed framework consists of determining the so-called hydrological focus time and hydrological focus area as the relevant temporal and spatial domains over which the hydrometeorological forecasts are evaluated, in terms of the forecasted rainfall accumulations and hydrographs at different points of the river network, using existing approaches.

The topic is relevant and the application of the methodology to the analysed event produces interesting results. However, the writing and organization of the manuscript need to be significantly improved to make it ready for publication. Also, some further discussion about the hypotheses made and the applicability of the methodology would make the manuscript more interesting. Consequently, the manuscript requires major revisions before I can recommend its publication in Natural Hazards and Earth System Sciences.

MAJOR COMMENTS

1) The text should be thoroughly revised to improve its clarity, provide a description of all the tools used, avoid repetitions (some aspects appear in several parts of the manuscript), reconsider figures with little discussion (e.g., Fig. 3, Fig. 7), make the text more synthetic (especially sections 4.3 and 5), describe and present all the elements in a sequential way (avoid jumping back and forth), and expand the captions to clearly describe all the figure elements.

2) Organization of the manuscript: right now the manuscript does not read smoothly. In particular, I think that the readability would improve if Appendix A were included as a subsection. This could be a rough organization of the manuscript:
- Introduction
- Methodology for an event-scale evaluation of hydro-meteorological ensemble forecasts: with the presentation of the 3 steps and the definition of HFA and HFT.
- Case study, data and models
- Application of the methodology to evaluate the Ens-QPF products during the event: describing how the methodology has been applied, including the contents of Appendix A.
- Results
- Discussion and conclusions: combining current sections 5 and 6.
3) The proposed methodology adapts well to the spatio-temporal hydrometeorological features of the analysed event (which shows a quasi-triangular hyetograph in the catchment and mostly single-peak hydrographs). However, I miss some discussion about how it could be applied to longer, more complex events; e.g., with multiple rainfall periods and multiple hydrograph peaks, or showing high variability of the magnitude of the floods within the affected area. In the latter case, I would like the authors to discuss the possibility of using more than one threshold to assess the quality of the hydrometeorological forecasts; in such a case, would the HFA and HFT be threshold dependent?

MINOR COMMENTS

1) Abstract: the final part of the abstract could be more informative about the results obtained in the study and the conclusions.

2) Motivation of the study: the introduction provides an interesting description of the topic of flash flood forecasting systems and some of their limitations. However, I miss a better connection between the general context description and the presentation of the objective of the study that clearly states the motivation of the study and justifies the proposed analysis strategy.

3) Page 31, line 685: “to summary” could be “to summarize”.

4) Page 31, line 695: at this point the acronym “RS” has not yet been defined.

5) Page 31, lines 695-696: the following sentence is not fully clear: “The drastic reduction of the number of considered time steps is compensated by the common consideration of the large number of outlets hit by the event.”

6) Page 31, lines 704-706: please check the writing.

7) Section 4 and Appendix A: given that RS stands for “Reference Scenario” (page 16, line 360), the expressions “reference RS”, “reference RS simulation” or similar need to be corrected.

8) Page 31, lines 708-710: “All the discharge forecasts issued before and covering this date (according to the maximum forecast lead time, i.e 6 runs) are then selected. For a given forecast probability (ensemble percentile), a hit is counted in the contingency table if at least one of the six runs exceed the discharge threshold at any lead time (fig A1 - left), and a miss is counted if none of the six forecast hydrographs exceed the threshold at any lead time (fig A1 - right)”. This sentence assumes that the reader is aware of the temporal resolution and lead times of the precipitation ensemble forecasts and of how they have been applied to produce discharge forecasts. However, the first reference to Appendix A appears on page 6 (line 161), where none of this information has been provided. Also, in this sentence, the way the probabilistic discharge forecasts are treated should be described better. If I understand well, the rainfall-runoff model is run with each member of the ensemble of precipitation forecasts to generate an ensemble of hydrographs (one per rainfall forecast member); from these, the ROC analysis is based on setting probability thresholds to obtain the associated time series of discharge forecasts. Because these are not necessarily obtained from a single run of the rainfall-runoff model, I would not use the term “hydrograph” when referring to them (page 31, line 711).

9) Fig. A1: I would expect the oldest forecast to end at the evaluation time, and the newest forecast to be issued 1 hour before the reference time. In the figure, I cannot see this. Also, explain (at least in the figure caption) what the term “anticipation” used in the figure shows.

10) Page 32, line 715: one could think that, if a correct negative occurs in the time range between t-6h and t but the discharge forecasts exceed the threshold at a different time step, this situation should be classified as a false alarm. I would like to know the authors’ opinion about this aspect, and a discussion of how it affects the presented results should be included in the manuscript.

11) Page 32, lines 717-719 (“as many values (…) as the number of outlets in the HFA”): by combining the results obtained in the different subcatchments, one could be masking the quality of the forecasts in the most affected areas with those where the event did not even reach the threshold. This could be quite serious in moderate or very local events. Similarly, how would the method be applied in more complex events (e.g., with multiple flow peaks over a few days, or affecting sub-catchments of different catchments)?

12) Page 7, line 190: “The Aude River basin is located in southwestern France”. It could be more appropriate to use “southern France”.

13) Page 7, line 199: I do not fully understand what is meant by “to be compared to the local 100-year percentile of 200 mm in 6-hours (Ayphassorho et al., 2019)”.

14) Figure 2, caption: please describe how the rainfall accumulation map was obtained. Could you please verify that this is a 47-h rainfall accumulation map as the caption suggests? Also, it could be interesting to include the location of the 31 stream gauges in the Aude catchment mentioned in section 3.2 (lines 219-220).

15) Page 9, line 226 (title of Section 3.3): for consistency, use “AROME” everywhere within the text.

16) Page 9, line 240: the sentence “The number of members in the "pepi" product is 18 (respectively 13) for a lead time of 1h (respectively 6h).” needs some rephrasing to guarantee its clarity. Is the 1-h lead-time pepi product used in this study?

17) Section 3.3: I suggest finding alternative notation for the terms “pepi” and “pertDpepi” that better describes these two sets of ensemble forecasts. What do these terms stand for? What are their spatial resolution and rainfall accumulation window?

18) Pages 9 and 10, lines 231-247: the description of the rainfall ensemble forecasts needs to be rewritten to guarantee that it is clear how the forecasts from these 3 products have been applied in the study (not only what the maximum lead times are, but also whether some spin-up time has been established, how the hourly frequency has been handled in the case of the AROME-EPS…). Also, information about the spatial resolution of the grids and about the rainfall accumulation windows needs to be provided.

19) Page 10, lines 245-247: “The spatial shift applied to this product represents an ideal distance because i) it captures the main uncertainties due to the localization of the rainfall event, and ii) it is a shift that does not combine too incompatible areas.” Is there any reference to support such a statement? How could this be verified?

20) Page 10, lines 246-247: “it is a shift that does not combine too incompatible areas.” Please clarify.

21) Figure 3: what is shown in a reliability diagram needs to be clearly described to facilitate the interpretation of this figure by the non-expert reader (for the ROC curve, at least mention that this interpretation can be found in Appendix A). Also, the text in Fig. 3a needs to be clearer (ensure the readability of all numbers).

22) Page 10, line 254: I suppose that “≈ 2 km2” should be “≈ 2 x 2 km2”. Is this the original resolution of the EPS grids? How were the different resolutions between the observations (≈ 1 x 1 km2) and the forecasts treated in the evaluation (e.g., Figure 3)? Were the observations upscaled to the forecast grid, or the forecasts interpolated to the observation grid?

23) Page 11, line 271: please specify that KGE stands for the Kling-Gupta efficiency, and provide a reference.

24) Page 11, lines 271-273: “The KGE calibration (validation) values obtained were of 0.80 (0.71), which indicates good model performance, except for one validation outlet, where a low KGE value of 0.1 was obtained (Figure 4a).” It is unclear where the reported KGE values (0.80 – 0.71) were calculated. At the downstream-most level gauges? Are these the average KGE values at all the gauge stations? Besides the validation gauge with KGE ~ 0.1, Figure 4a shows that the KGE is, approximately, between 0.6 and 1 at the calibration gauges and between 0.3 and 0.8 at the validation gauges.

25) Figure 4b: the reference to the “HyMex estimates” is only provided in section 3.2. The reference to the section or to the work of Lebouc et al. (2019) could be added in the figure caption or in the description of CINECAR.

26) Page 11, line 291: what is “ANTILOPE J+1”?

27) Page 11, lines 296-297: “with some few exceptions that can be explained by the spatial averaging”. What does this mean? Is not the same averaging applied to the 3 ensemble forecasts and over the same area?

28) Figure 5: the range of the y axis should be the same for the two panels. In the figure caption, it would be useful to state that the Aude catchment is 6074 km2.

29) Page 14, lines 310-317: the selection of the HFT seems to be quite subjective. Why is it based on a threshold of 2 mm/h on the Aude average rainfall intensity? The discussion about the analysis of the results being dominated by periods of low rainfall intensities would also apply to the fact that several parts of the catchment registered low rainfall. Similarly, the decision to take the Aude catchment as the HFA is arbitrary. How much could these decisions affect the obtained results? Could the HFT and HFA be obtained based on more objective criteria? For instance, by considering the spatio-temporal structure of the observed and forecasted 1-h rainfall accumulations as depicted by the space-time correlogram or variogram? Discussion about these questions would be very interesting.

30) Page 14, lines 329-331: the text gives the impression that some members clearly overestimate the rainfall in the catchment. Although Fig. 5 shows that this is the case by a few mm/h, there are no individual members showing average rainfall accumulations over the catchment similar to those of the 75%- and 95%-percentiles. Instead, the maps of Fig. 6 (second and third rows) are most likely the result of different members showing the largest accumulations at different locations in the catchment. Consequently, to a good extent, what is referred to in the text as “false alarms” are mostly location errors.

31) Page 15, line 339: “largest” could be replaced by “highest”.

32) Fig. 6: it would be very useful to provide the values of the event accumulation in the catchment for each panel. My impression is that the 75% percentiles show significantly larger catchment accumulations than those observed, and probably a lower percentile would be closer.

33) Page 16, lines 346-347: “As a consequence, to produce effective hydrological forecasts based on a good estimate of the rainfall rates…, users would need to work based on a high ensemble percentile value (the 75% percentile in the present case …)”. I find this sentence misleading, as it could give the impression that this is the rainfall that has been used in the analysis (which would be contradictory with what is described in Appendix A, page 32, line 347: “for each considered forecast percentile”). Also, the discussion about how using a high percentile might generate false alarms could fit better in the discussion.

34) Caption of Fig. 7: mention the hourly rainfall thresholds for the presented ranked histograms.

35) Discussion of Fig. 6 appears both before and after the discussion of Fig. 7. Please combine them (one option could be that Fig. 7 appears before Fig. 6).

36) Page 16, lines 360-361: “Hourly rainfall accumulations were uniformly disaggregated to run the model at a 15-min time resolution.” Why is this necessary?

37) Page 16, lines 368-369 (“This means that one unique result (either a hit, a miss, a false alarm or a correct rejection) is obtained for each of the 1174 sub-basins”): please specify that this is for each probability value (see also comment 33).

38) Page 16, line 371: by highlighting the 75% percentile in the ROC curve, the impression is given that this result is obtained with the rainfall of Fig. 6 (see also comment 33), whereas it is the result obtained by setting a 75% threshold on the forecasted discharges.

39) Page 18, lines 385-386: “This is clearly the dominant effect for the 75% percentile of the pertDpepi ensemble product and the 2018 event.” Please refer to Fig. 9.

40) Figure 9, caption: “Maps of anticipation (0-6h) of the 10-year return period discharge threshold”. If I understand correctly, this is not what the figure shows.

41) Page 19, lines 387-388: I would expect the first point of the ROC curve for the 3 ensemble forecasts to be almost identical to that of the RF0 scenario (which is almost the case). My interpretation is that the skill shown by the RF0 point (particularly the hits shown in Fig. 9) is due to the catchments’ response to past rainfall. Do you agree?

42) Page 19, line 389: “All ensemble forecasts lead to an increase of the number of hits (9)”. Should “(9)” be “Fig. 9”?

43) Sections 4.2 and 4.3: the results of Section 4.2 were obtained with the CINECAR model, and those of Section 4.3 with GRSDi. If no comparison between models is provided, what is the advantage of using 2 different models? At least some discussion about the hydrological anticipation capacity (Section 4.2) of GRSDi should be provided.

44) Pages 20-21, lines 424-434: please add the references to Figures 11 and 12.

45) Page 21, line 441: it should be clarified how both the “spread” and the “skill score” have been calculated. Also, on the y axis of Figs. 11a-16a, it seems that the units of spread/skill are mm. Is this correct?

46) Figures 11-14: some of the discharge forecasts show obvious biases with respect to the reference (simulated discharge). Some interpretation of this could be interesting. How do these biases affect the spread/skill results and their interpretation?

47) Figure 14 (panels b and c): the legend hides part of the results (observed and simulated discharges).

48) Page 27: the title of section 5.2 could be more concise.

49) The study focuses on the evaluation of flash-flood hydrometeorological forecasts at the event scale. It could be interesting to add some discussion about how/if the method could be applied to evaluate the performance of the forecasting system in a multi-event framework. Also, it could be interesting to include some discussion about the applicability of the method to other regions and countries.

50) The Introduction states that “We adopt the point of view of end-users, who aim at providing resources and assistance for evacuations and rescue operations at a regional scale.” However, I have not found any analyses or results supporting this statement beyond a few statements in sections 5 and 6 that are quite general.

Citation: https://doi.org/10.5194/nhess-2022-182-RC2
AC2: 'Reply on RC2', Maryse Charpentier-Noyer, 09 Oct 2022
We thank anonymous referee number 2 for the useful comments about this initial version of the manuscript. We provide hereafter our detailed answers and explanations about the modifications introduced in the revised version of the manuscript (which is already available). Thanks to this revision, we think the manuscript is now much easier to follow.
RC3: 'Comment on nhess-2022-182', Anonymous Referee #3, 23 Aug 2022
Overview
The manuscript splits its focus between verifying the accuracy of ensemble precipitation forecasts and different ways to convey (and analyse) the discharge-forecast information provided by a meteo-hydrological forecasting chain. On the one hand, the theme of forecasting severe rainfall events is discussed at length in the introduction, but an in-depth analysis of the verification of output from NWP models (and related ensembles) is neglected. On the other hand, it is declared that a new framework for the evaluation of meteo-hydrological model coupling is proposed, but a proper review of past studies on this issue is not provided in the introduction, and the proposed analysis recalls (and puts together) different approaches commonly used in the operational practice of flood forecasting centers worldwide. In addition, many parts of the proposed evaluation framework appear unsuitable for real-time applications. The overall feeling about the present manuscript is that it describes a very detailed post-event analysis, in which the novel and original parts do not clearly stand out. A clear choice about the main goal of the study should be made and then properly developed. In my opinion, the strong point of the paper should be the availability of three meteorological ensemble products (even though it is not clear whether a performance comparison of these ensembles is a novelty or whether past studies have already investigated the subject). The performance evaluation in terms of Quantitative Precipitation Forecasts (QPFs) should be based on a larger dataset and take into consideration the concept of “fuzzy verification”. The analysis of outcomes provided by a meteo-hydrological model chain driven by the available ensemble QPFs is an added value of the study.
General comments
1) The main declared aim of the manuscript is to present a methodological framework for the event-based evaluation of ensemble forecasts for floods, with respect to the needs of civil protection authorities. However, the proposed analysis is quite complex (several aspects and scores to consider) and maybe not suitable for the real-time operational practice of flood forecasting centers. The current contents of Section 4 read more like a post-event analysis. In addition, the use of verification metrics like rank diagrams and ROC curves to analyse a single event has poor significance (Figs. 7 and 8). These metrics are commonly computed over large datasets, in order to highlight statistical characteristics of the forecast product. The computation over a single event could be of some interest if compared to “historical” performances based on a long archive (for instance, for real-time applications, the spread-skill relationship in Figs. 11-16 does not add significant information with respect to the issue of warnings and outcomes shown in the remaining panels).
The statistical analysis in terms of discharge forecast should consider the whole period covered by QPFs (not just a flood event).
The coupling with a hydrological model represents a complementary tool for the verification of QPFs (since catchments can be seen as macro-raingauges with variable interception areas), given that the intermittence of the rainfall signal is dampened by the non-linearity of rainfall-runoff processes. In particular, the dynamics of the overall soil filling and depletion mechanisms and the flood routing play a fundamental role in determining results, as does the morphology of the basin, which determines the space-time scale below which the variability of the rainfall field is damped. The spatial integrating effect of a watershed filters out some of the spatial and temporal variability that complicates the point-by-point verifications more commonly used (Benoit et al., 2000).
2) The proposed evaluation of rainfall forecasts aims to take into account spatial and temporal variability. The proposed analysis recalls in some way the concept of the so-called “double-penalty effect” (i.e., the fuzzy verification introduced by Ebert, 2008 and Roberts and Lean, 2008, and discussed by Schwartz and Sobash, 2017). However, the subject is treated while neglecting the specific past literature on this issue. The introduction and Section 4.1 should be revised accordingly.
Why has just one year of ensemble forecasts been used, given that products are available from 2018?
3) AROME-EPS and AROME-NWC with time lagging are merged to build an ensemble. Which are the reasons to merge the two products? Why is AROME-NWC with time lagging just used to build an ensemble?
4) The reasons for using two different hydrological models for different aims should be discussed.
5) The contents of Section 4.3 should be reformulated taking into account the response times of the considered catchments. Outcomes depend concurrently on the accuracy of the rainfall forecast for the studied event and on the characteristics of the basin.
Specific comments
Line 7: “peak flood” in place of “flood rising limb”, given that the statistical analysis is focused on the maximum value of the discharge forecast.
Lines 15-17: this statement is questionable due to the limited dataset; the results do not support “robust conclusions”. A reformulation is needed.
Lines 69-72: this content (i.e., point i) recalls what has mainly been done in this manuscript.
Lines 72-73: this content (i.e., point ii) is questionable in the light of general comment 2).
Lines 89-91: this content should be revised taking into account general comment 2).
Lines 119-121: this subject should be investigated in more depth in the introduction.
Lines 130-132: this subject should be investigated in more depth, taking into consideration the concept of fuzzy verification.
Lines 133-153: the proposed analysis and metrics fit well for a post-event analysis, but they are not suitable for real-time operational practice, with respect to the point of view of end-users.
Lines 145-146: this statement is questionable, given that an evaluation of performance based on the last hours is not indicative of the performance of hourly QPFs at the following future time steps.
Lines 147-153: the use of rank diagrams to analyse a single event does not appear fully appropriate.
Lines 157-162: these contents can simply be summarized by stating that the forecast is verified within a time window useful for the aims of end-users (warning issues).
Line 164: how is the 10-yr return period computed for the ungauged basins?
Lines 215-222: the use of observations that were not available in real time to calibrate the hydrological models limits the operational use of the proposed forecasting chain. Likewise, the peak discharges estimated at ungauged locations during a post-flood field campaign make it impossible to replicate the proposed framework in real-time applications.
Line 233: if AROME-EPS is updated every 6 hours, it is not clear how Figs. 5, 6, 7, 11-16 and B1-B6 can show continuous hourly forecasts with 1 to 6 h lead times for each hourly time step.
Line 245: “an ideal distance for the present case study” fits better than “an ideal distance”.
Lines 248-253: is this comparison a novelty with respect to past literature? Why is the 1-h lead time not considered to build Fig. 3? Additional rainfall accumulations larger than 5 mm/h should be considered to complete Fig. 3.
Line 257: the description of the use of each model within the present study should be introduced here.
Line 270: specify the periods of the calibration and validation processes.
Line 271: define the acronym KGE.
Line 281: specify the time step at which this model runs.
Line 291: specify in the text the period of the temporal evolution; what does J+1 mean?
Line 292: the 1-h lead time has poor significance for the aims of end-users (i.e., warning issues). The 3-h lead time is more significant.
Caption Fig. 4: define the acronym HyMeX (or avoid using it in the caption).
Line 304: it is not clear what “rising limb” refers to.
Lines 305-309: for certain selected outlets, hyetographs of 6-h rainfall amounts (over a fixed or moving time window) would also be useful to evaluate the impact of the rainfall forecast on the hydrological forecast, given the integration of the spatial-temporal variability of rainfall by the rainfall-runoff processes.
Lines 312-315: this statement is questionable, given that, in real time, it is not known which areas will not contribute, even if a nowcasting forecast is available. The different scales involved in model predictions and raingauge measurements, coupled with the high variability of the physical events and of the model errors, complicate the use of precipitation observations for atmospheric model validation, particularly in complex terrain with a limited density of instruments. This areal variability makes it possible to diagnose different problems associated with the atmospheric simulations, such as the quality of the larger scales simulated or the reliability of the description of small-scale processes. The dependence between basins and sub-basins can be very useful to understand possible problems of spatial shifting in the modelled atmosphere (Benoit et al., 2000; Jasper and Kaufmann, 2003).
Line 319: the 1-h lead time has poor significance for the aims of warning issues (observed rainfall plays the major role in the modelled basin response at this lead time). The 6-h lead time is more significant.
Line 332: the comment for line 319 also applies here.
Caption Fig. 6: “amount” in place of “rates”.
Lines 345-350: these considerations should be based on the discharge ensemble (not on the ensemble QPFs), due to the non-linearity of rainfall-runoff processes.
Line 360: what is the need to run the model at a 15-min time resolution?
Lines 364-365: how is the 10-yr return period computed for all the sub-basins (I guess that many of them are ungauged)?
Line 370: which hydrological runs were used to build Fig. 8?
Caption Fig. 8: specify what the points on each line represent.
Line 372: what is the starting time of RF0? How long is the RF0 run driven by observed rainfall before the rainfall is set to zero?
Lines 376-377: specify in the text the number of missed detections (as is done for the false alarms at line 379).
Line 381: it is not clear what “contrasted effects” refers to.
Lines 383-386: these considerations could be misleading (the non-linearity of rainfall-runoff processes plays the major role; it is not an effect of the chosen percentile).
Caption Fig. 8: are river gauge levels available every 15 minutes? How can hits be computed everywhere with a 15-min time step?
Line 388: “rainfall forecast products” in place of “ensemble rainfall forecast products” (given the general validity of the sentence).
Lines 390-392: the impact of RF0 depends on the concentration time (i.e., the response time of the watershed to rainfall) of the considered basins. The related false alarms decrease with increasing lead time (except for systematic errors in the hydrological simulation).
Fig. 10: in the labels, it is not clear what the word “Ensemble” refers to.
Line 394: false alarms and misses should also be evaluated as a function of different anticipation times.
Lines 403-406: the question is doubtful, given that the concentration time strongly influences the outcomes and the corresponding evaluation.
Lines 409-414: what is the sense of the analysis in terms of PC? Is PC computed in the same way as the scores shown in Fig. 9?
Lines 416-419: this sentence highlights the limit of the present manuscript, given that the proposed framework cannot be applied in real time.
Lines 419-421: this statement makes no sense with respect to the aims of flood warning. The accuracy of the rainfall forecast influences the quality of the hydrological forecast, but the use of RF0 cannot be considered an alternative solution.
Line 425: quantify the size of the catchments related to outlets 1 and 2.
Line 426: do outlets 1 and 2 have a weak reaction to rainfall in general, or just for this event?
Line 429: quantify the size of the catchments related to outlets 3 and 4.
Line 432: quantify the size of the catchments related to outlets 5 and 6.
Line 441: briefly recall the definition of the spread/skill score and specify whether it refers to the rainfall or the discharge forecast.
Line 443: the choice of the lead time should be appropriate to the concentration time of the investigated catchment when analysing outcomes; otherwise, the outcomes seem to depend on the lead time of the rainfall forecast.
Line 445: wrong label for the outlet number in the figures of Appendix B (outlet 4 for all the graphs).
Lines 462-465: maybe the outcome is affected by a spatial scale of the shift that is not optimal for the investigated catchment.
Line 471: what is the concentration time for these outlets?
Line 481: typing error
Line 495: this outcome is likely influenced by the concentration time of the investigated outlets.
Fig. 14: in all the graphs, move the legend panel so that it does not cover the result lines.
Lines 525-527: the reasons for this outcome are the same as those cited at line 514 (influence of the concentration time of the investigated outlets). Reformulate the sentence.
Fig. 16: in graphs b) and c), move the legend panel so that it does not cover the result lines.
Lines 548-554: a map displaying the concentration times of the investigated outlets would satisfy this need. The threshold anticipation maps in Fig. 9 describe only the specific case study and forecast products; they cannot be used in general terms for flood warning purposes.
Lines 557-561: the meaning of “anticipation time” may be misunderstood: it derives from a combined effect of the accuracy of the current forecasts and the response time of the investigated outlet.
Lines 575-579: this analysis is significant when performed over a long dataset
Lines 584-595: these considerations relate to the role of QPFs in general, not specifically to ensembles.
Lines 596-601: the gain is due to NWC. What is the added value of using NWC+EPS rather than just NWC?
Lines 619-621: maybe the extension of 20 km is not the optimal dimension for the investigated case study. An investigation of this issue would be worth performing.
Line 639: have the authors considered applying spatial perturbations just to the NWC members?
Lines 727-728: these contents may be misleading. The outcome depends specifically on the accuracy of the ensemble for the case study; it is not information that can be estimated a priori by means of a statistical analysis and generally related to the lead time. It is strictly related to the investigated event and the selected run of the ensemble. These contents can refer only to a post-event analysis (and cannot be transferred to real-time operational practice).
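As background to the two requests above to define the KGE acronym (RC2 minor comment 23 and the comment on line 271): the Kling-Gupta efficiency is commonly written, following Gupta et al. (2009), as

```latex
\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}
```

where $r$ is the linear correlation between simulated and observed discharge, $\alpha$ the ratio of their standard deviations, and $\beta$ the ratio of their means; $\mathrm{KGE}=1$ indicates a perfect simulation. This is the standard formulation and is noted here only as an editorial aid; the paper may use a variant.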
References used in the review comment
Benoit R, Pellerin P, Kouwen N, Ritchie H, Donaldson N, Joe P, Soulis E (2000) Toward the use of coupled atmospheric and hydrologic models at regional scale. Mon Wea Rev 128: 1681–1706
Ebert, E. E., 2008: Fuzzy verification of high resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51–64, doi:10.1002/met.25
Jasper K, Kaufmann P (2003) Coupled runoff simulations as validation tools for atmospheric models at the regional scale. Q J R Meteorol Soc 129: 673–693
Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, doi:10.1175/2007MWR2123.1
Schwartz, C. S. and Sobash, R. A.: Generating probabilistic forecasts from convection-allowing ensembles using neighborhood approaches: A review and recommendations, Mon. Weather Rev., 145, 3397–3418, https://doi.org/10.1175/MWR-D-16-0400.1, 2017.
Citation: https://doi.org/10.5194/nhess-2022-182-RC3
AC3: 'Reply on RC3', Maryse Charpentier-Noyer, 09 Oct 2022
We thank Referee #3 for the feedback, which shows that the objectives of the paper were probably not presented sufficiently clearly. We have improved this in the revised version of the manuscript (which is already available). Our detailed answers are attached.
The main misleading point was probably that the article does not aim at evaluating QPFs per se, but rather the performance of flood forecasts obtained by using these QPFs as input of rainfall-runoff models. The presented evaluation method can only be implemented a posteriori and not in real time. It aims to provide a first detailed and informative diagnosis of the performance of flood forecasts, for single major flood events where such forecasts are needed. Moreover, an immediate post-event analysis is often needed to understand better what went right or wrong during the flood forecasting and response.