This study presents applications and evaluations of a pre-existing flood inundation modeling tool, AutoRoute, at seven sites. The authors claim two specific contributions: (1) whereas previous work focused mainly on high-flow events, this work evaluates low- and medium-flow events; and (2) the post-processing modules of the AutoRoute model were improved in terms of efficiency and accuracy (as claimed in the title). While the work is interesting, I have major concerns that these contributions may not be significant enough to warrant a stand-alone paper.
First, the authors have not presented any accuracy comparison between the old post-processing procedure and the updated post-processing scripts. The only evidence offered is a single mention of a reduction in computational cost (from 20 min to 17.5 min, i.e., about 12.5%, L476). The "improved accuracy" claim in the title is therefore not supported. I suggest that the authors include detailed comparisons of the F statistics obtained with the original and the updated post-processing scripts, to document their contribution more objectively; this would also benefit AutoRoute users. Otherwise, I would regard ARPP as a pre-existing tool and have little sense of what the authors' contribution is. ARPP also seems overly simplified: does it handle only the weight calculation? Why not present a tool that incorporates the whole AutoRoute post-processing workflow, rather than a script created specifically for the weight calculation? For example, users still need to go back and forth between model runs and ArcGIS processing (L307), which is quite laborious. Why not automate all of these steps and include them in the ARPP scripts?
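To illustrate the kind of before/after comparison I have in mind, a minimal Python sketch follows. It assumes that the F statistic is the commonly used fit measure F = A / (A + B + C) computed on co-registered binary flood grids; the function name, variable names, and toy grids are hypothetical and are not part of ARPP or AutoRoute.

```python
from typing import Sequence


def fit_statistic(modeled: Sequence[Sequence[int]],
                  observed: Sequence[Sequence[int]]) -> float:
    """Return F = A / (A + B + C) for two co-registered binary flood grids.

    A: cells flooded in both grids (correctly predicted flooding)
    B: cells flooded only in the model (overprediction)
    C: cells flooded only in the observation (underprediction)
    Cells that are dry in both grids do not enter the statistic.
    """
    if len(modeled) != len(observed):
        raise ValueError("grids must have the same number of rows")
    a = b = c = 0
    for row_m, row_o in zip(modeled, observed):
        if len(row_m) != len(row_o):
            raise ValueError("grids must have the same number of columns")
        for m, o in zip(row_m, row_o):
            if m and o:
                a += 1
            elif m:
                b += 1
            elif o:
                c += 1
    total = a + b + c
    # With no flooded cells in either grid, F is undefined.
    return a / total if total else float("nan")


# Hypothetical 2x3 grids (1 = flooded), for illustration only.
observed = [[1, 1, 0],
            [1, 0, 0]]
old_pp = [[1, 0, 0],   # stands in for output of the original post-processing
          [1, 1, 1]]
new_pp = [[1, 1, 0],   # stands in for output of the updated scripts
          [1, 1, 0]]

print(fit_statistic(old_pp, observed))  # 0.4
print(fit_statistic(new_pp, observed))  # 0.75
```

Reporting F for each site and flow condition with both versions of the scripts (ideally alongside the over- and underprediction counts B and C) would show directly whether the updated post-processing changes accuracy.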
Second, the presented results appear to contain uncertainties arising from several subjective decisions. For example, the authors mention a user-defined alpha parameter (L319), which can limit the maximum modeled flood extent. In addition, the input discharge values are themselves uncertain (L362). The n_i values (L370) follow exactly those used in Follum et al. (2017), without any examination of whether they are valid for these sites. Given the moderate statistics presented (although comparable to previous studies), I strongly recommend a more in-depth examination of the parameter settings to improve AutoRoute's performance (if it can be improved). Can the authors provide sensitivity analyses showing how model performance varies with these parameters? Some arguments are given at L372, but plots would inform readers more effectively.
Third, have the authors considered using even higher-resolution DEM data in Section 4.2? The lack of improvement may be partly because the resolution is still not high enough. Another possibility, related to my second comment, is that model performance is highly sensitive to the parameters used, in which case the results cannot be attributed to the DEM source. I therefore have some reservations about the authors' conclusions unless they test a higher-resolution DEM or present a more comprehensive parameter sensitivity analysis.
Lastly, was the computational cost reduction (described in Section 4.3) purely the result of using GDAL I/O functions (L477)? If so, this seems too simple a fix to count as a meaningful computational improvement to the post-processing tool, and in my view it does not advance the tool to a new version. Please justify.
L186: should be millions instead of thousands
L188: I do not think the citations to "Streamflow Prediction Tool (SPT) (Snow et al. 2016; Wahl 2016)" are relevant or appropriate here. The authors are discussing only the NWM, so these citations are misplaced and should be removed.
L204: It is unclear what "AutoRoute is operated in an ad-hoc basis" means; please clarify this statement. Also, why are SPT implementations available for only 70% of the world?
L345: A figure showing the geographic locations of the seven sites would be helpful to readers. In addition, the statistics in Section 4.1 should be accompanied by more descriptive text: currently the authors simply present the statistics, and I suggest they discuss the differences among sites in more depth. Is elevation the only difference that explains the differing model performance?