A new skill score for ensemble flood maps: assessing spatial spread-skill with remote sensing observations
Abstract. An ensemble of forecast flood inundation maps has the potential to represent the uncertainty in the flood forecast and provide a location-specific, probabilistic likelihood of flooding. This gives valuable information to flood forecasters, flood risk managers and insurers and will ultimately benefit people living in flood-prone areas. Spatial verification of the ensemble flood map forecast against remotely observed flooding is important to understand both the skill of the ensemble forecast and the uncertainty represented in the variation or spread of the individual ensemble member flood maps. Previously, a scale-selective approach has been used to evaluate a convective precipitation ensemble forecast. This determines a skilful scale of ensemble performance. By extending this approach through a new application we evaluate the spatial predictability and the spatial spread-skill of an ensemble flood forecast across a domain of interest. The spatial spread-skill method computes an agreement scale at grid level between each unique pair of ensemble flood maps (ensemble spatial spread) and between each ensemble flood map and a SAR-derived flood map (ensemble spatial skill). By comparing these we can determine the spatial spread-skill performance. These methods are applied to an example flood event on the Brahmaputra River in the Assam region of India in August 2017. Both the spatial skill and the spread-skill relationship vary with location and can be related to physical characteristics of the flooding event. Routine validation and mapping of spatial predictability in an operational system would allow better quantification of model systematic biases and uncertainties. This would be particularly useful for ungauged catchments and would enable targeted model improvements to be made across different parts of the forecast chain.
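As a rough illustration of the spread-skill comparison described in the abstract, the following Python sketch computes a per-cell agreement scale between two binary flood maps, then averages it over member pairs (spread) and over member-observation pairs (skill). It is not the authors' implementation. The function names, the neighbourhood-fraction distance, the linearly relaxing threshold, the parameters `s_max` and `alpha`, and the choice to treat two empty neighbourhoods as agreeing are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def neighbourhood_fraction(binary_map, s):
    """Flooded fraction in the (2s+1) x (2s+1) window around each cell.

    Cells outside the domain are counted as dry (zero padding).
    """
    field = np.asarray(binary_map, dtype=float)
    if s == 0:
        return field
    k = 2 * s + 1
    padded = np.pad(field, s, mode="constant")
    # Summed-area table with a leading row/column of zeros.
    table = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    window_sum = table[k:, k:] - table[:-k, k:] - table[k:, :-k] + table[:-k, :-k]
    return window_sum / k**2


def agreement_scale(map_a, map_b, s_max=5, alpha=0.5):
    """Smallest neighbourhood half-width at which two maps agree, per cell.

    Assumed (illustrative) distance: (fa - fb)^2 / (fa^2 + fb^2), in [0, 1].
    Assumed threshold: alpha + (1 - alpha) * s / s_max, reaching 1 at s_max.
    """
    shape = np.shape(map_a)
    scale = np.full(shape, s_max, dtype=int)
    unresolved = np.ones(shape, dtype=bool)
    for s in range(s_max + 1):
        fa = neighbourhood_fraction(map_a, s)
        fb = neighbourhood_fraction(map_b, s)
        denom = fa**2 + fb**2
        safe = np.where(denom > 0, denom, 1.0)
        # Both neighbourhoods dry: treated as agreement here (an assumption).
        dist = np.where(denom > 0, (fa - fb) ** 2 / safe, 0.0)
        threshold = alpha + (1 - alpha) * s / s_max
        agree = unresolved & (dist <= threshold)
        scale[agree] = s
        unresolved &= ~agree
    return scale


def spread_and_skill(members, observed, **kwargs):
    """Mean agreement scale over member pairs and over member-observation pairs."""
    spread = np.mean(
        [agreement_scale(a, b, **kwargs) for a, b in combinations(members, 2)],
        axis=0,
    )
    skill = np.mean(
        [agreement_scale(m, observed, **kwargs) for m in members], axis=0
    )
    return spread, skill
```

Under these assumptions, cells where the mean member-pair scale is close to the mean member-observation scale would suggest that ensemble spread matches ensemble skill at that location; a much smaller spread than skill would suggest an over-confident ensemble there.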
Helen Hooker et al.
Status: final response (author comments only)
RC1: 'Comment on nhess-2022-188', Seth Bryant, 04 Aug 2022
- AC1: 'Reply on RC1', Helen Hooker, 28 Nov 2022
RC2: 'Comment on nhess-2022-188', Anonymous Referee #2, 19 Oct 2022
- AC2: 'Reply on RC2', Helen Hooker, 28 Nov 2022
The authors present a method for evaluating the accuracy of ensemble flood forecasts which may advance our ability to predict inundation more accurately. This is a welcome advance on the recently published Hooker et al. (2022) (doi:10.1016/j.jhydrol.2022.128170). However, the manuscript is difficult to read, impeding a full evaluation of the work. I would be grateful if the authors would consider the following:
Two pages are copied verbatim from Hooker et al. (2022). Instead, these should be summarized, and the reader directed to this other publication.
There are numerous grammatical issues, redundant sentences/phrases, imprecise/inaccurate vocabulary, and a confusing overall sequence/structure which make the manuscript difficult to follow. The authors should consider the perspective of the reader, striving to be as concise and logical as possible.
While I’m unfamiliar with the details of flood forecasts, I can imagine and appreciate the motivation for such a metric. However, I’m skeptical that the proposed method is appropriate for application to a simulation library like Flood Foresight. For example, if each inundation raster within the library is monotonically nested (i.e., cells become progressively more flooded), a neighborhood approach seems unnecessary. More information on the Flood Foresight simulation implemented in this study is needed to evaluate this properly.
Similarly, additional details of the application of the permanent water body layer (in both the SAR-derived layer and the Flood Foresight layer) are necessary to evaluate the utility of the proposed method to the case study. For example, if the same source layer pre-filter is implemented in both the ‘observed’ and the ‘simulation’ data, rewarding the simulation for accuracy in these cells seems inappropriate.
To demonstrate the utility of the metric, the authors should consider comparing it against an alternative. When is the proposed, more sophisticated two-phase method more appropriate than existing simple methods?
Additional synthesis of the results would be helpful to demonstrate the utility of the proposed method. For example, the authors suggest the metric can provide some ‘link to physical processes’, but no discussion of this is provided for the case study. How can the metric help us understand the role of dynamic morphology and levee performance in ensemble accuracy? How should we use Figure 9? For emergency response?
The accuracy of the derived SAR layer should be evaluated carefully, and its quality demonstrated to the reader. If this ‘observed’ layer is poor, the case study is not useful.
More specific and detailed comments are provided in the attached pdf.
I thank the authors for their contribution, and I look forward to the revised manuscript.