This work is distributed under the Creative Commons Attribution 4.0 License.
Improving Pluvial Flood Simulations with Multi-source DEM Super-Resolution
Abstract. Due to the limited availability of high-resolution topographic data, accurate flood simulation remains a significant challenge in many flood-prone regions, particularly in developing countries and in urban domains. While publicly available Digital Elevation Model (DEM) datasets are increasingly accessible, their spatial resolution is often insufficient to reflect fine-scale elevation details, which hinders the ability to effectively simulate pluvial floods in built environments. To address this issue, we implemented a deep learning-based method that efficiently enhances the spatial resolution of DEM data, and quantified the resulting improvement in flood simulation. The method employs a tailored multi-source input module, enabling it to effectively integrate and learn from diverse data sources. Using openly available global datasets, namely low-resolution DEMs (such as the 30 m SRTM) in conjunction with high-resolution multispectral imagery (e.g., Sentinel-2A), our approach produces a super-resolution DEM that outperforms conventional methods in reconstructing 10 m DEM data from 30 m DEM data and 10 m multispectral satellite images. When applied to pluvial flood simulations, this superior performance translates into significantly more accurate predictions of floodwater depth and inundation area compared to existing alternatives. This study underscores the practical value of machine-learning techniques that leverage publicly available global datasets to generate DEMs that enhance flood simulations.
Status: final response (author comments only)
RC1: 'Comment on nhess-2024-207', Anonymous Referee #1, 19 Nov 2024
The paper introduces a deep-learning method that combines a low-resolution DEM and multi-spectral images to obtain a high-resolution DEM that is ultimately used for running a pluvial flood simulation. The authors also compare this approach with other deep-learning and non-deep-learning methods, showing its improved efficacy.
The manuscript is well written, clear, concise, and informative. As such, I recommend publication with just a few minor details that might further improve the quality of the paper.
Minor comments:
In the results/discussion section, I would emphasize that the difference between the RCAN and the RCAN-MS is mainly in the inputs used (if I followed everything correctly), thus further proving your point that the extra information coming from multi-spectral images is beneficial, since so far it seemed "just" a difference in method as you have with VDSR, for example.
Could you explain why the results in terms of flood simulations look more consistent in Dataset II than in Dataset I, at least visually?
For example, in Figure 6, all interpolation methods seem to produce some sort of accumulation ponds at the bifurcations of the rivers, and the bicubic approximation results in a noisy pattern. However, that does not seem to be the case for Figure 8 with Dataset II. Do you have any clue why? I think you could also comment further on why the IoU is very low (despite the proportional increase) for high thresholds of water depths.
In terms of metrics you could also consider adding a different metric such as the critical success index (CSI), which has been used in several flood studies. While most figures are of high quality, I think Figures 7 and 9 can be better, despite already being informative. Consider changing their style.
Citation: https://doi.org/10.5194/nhess-2024-207-RC1
AC1: 'Reply on RC1', Yue Zhu, 14 Feb 2025
We sincerely appreciate the time and effort the reviewer invested in evaluating our manuscript and providing insightful comments. We have carefully considered each comment and provide detailed point-by-point responses below.
- In the results/discussion section, I would emphasize that the difference between the RCAN and the RCAN-MS is mainly in the inputs used (if I followed everything correctly), thus further proving your point that the extra information coming from multi-spectral images is beneficial, since so far it seemed "just" a difference in method as you have with VDSR, for example.
Thank you for this suggestion. We agree and will emphasise in the manuscript that the primary distinction between RCAN and RCAN-MS lies in the inputs used, namely the tailored input layers for processing multi-source inputs. This added information supports the advantage of using multi-spectral images and strengthens the argument that they contribute to improved model performance. We can revise the discussion section to clarify this as follows:
(Line 138) “This study adopts a multi-source method for DEM super-resolution, utilizing the RCAN as the backbone structure. The proposed method, referred to as RCAN-Multispectral (RCAN-MS), incorporates a tailored multi-source and multi-scale input module, which is the key distinction from the original RCAN.”
(Line 147) “The tailored multi-source input module is integrated into the model structure before the first layer of the RCAN backbone structure (Fig. 1).”
(Line 400) “The improved performance of RCAN-MS in flood simulation, compared to its backbone method RCAN, underscores the value of incorporating multispectral data. The additional information provided by the multispectral images enhances terrain representation and reduces noise in the super-resolution DEM, thus leading to more accurate flood simulation results.”
- Could you explain why the results in terms of flood simulations look more consistent in Dataset II than in Dataset I, at least visually? For example, in Figure 6, all interpolation methods seem to produce some sort of accumulation ponds at the bifurcations of the rivers, and the bicubic approximation results in a noisy pattern. However, that does not seem to be the case for Figure 8 with Dataset II. Do you have any clue why?
Thank you for raising this point. A potential explanation for the difference in flood simulation results between the two datasets may stem from the terrain characteristics of the study areas. As shown in Figures 4 and 5, the test area in Dataset 1 is relatively flat, while the second test area in Dataset 2 has a hillier terrain. In Dataset 1, the flatter landscape leads to a more diffuse distribution of floodwater, which can result in less distinct patterns and variability in the simulation results. In contrast, the hilly terrain of Dataset 2, even with bicubic interpolation, naturally facilitates more concentrated floodwater accumulation in certain areas, resulting in relatively more consistent simulation outcomes across different methods.
We can include this discussion in the manuscript as follows (line 360): “In addition, the terrain characteristics can influence the effectiveness of interpolation and super-resolution methods in flood simulation. Specifically, the improvement in flood simulation maps achieved by RCAN-MS is more pronounced in Dataset 1 than in Dataset 2. A key factor contributing to this discrepancy is the difference in terrain between the two datasets. As shown in Fig. 5 and Fig. 6, Dataset 1 features a relatively flat landscape, while Dataset 2 is characterized by hillier topography. In the flatter terrain of Dataset 1, floodwater tends to be more diffusely distributed, resulting in less distinct patterns and greater noise in the simulation results generated by baseline methods (e.g., bicubic interpolation). In contrast, the hilly terrain of Dataset 2 naturally promotes more concentrated water accumulation in specific areas, leading to more visually coherent flood patterns across different methods, even with bicubic interpolation. Therefore, the improvement provided by the proposed super-resolution method tends to be more significant in flatter regions, where its effects are more pronounced.”
- I think you could also comment further on why the IoU is very low (despite the proportional increase) for high thresholds of water depths.
Thank you for your question. The low IoU for high water depth thresholds, despite the proportional increase, can likely be attributed to the much smaller extent of deep floodwater areas. At higher thresholds, the areas of flooding become more concentrated in specific regions with much smaller spatial coverage, which may not align well with the predicted flood areas. In this case, at higher depth thresholds, even small misalignments between the predicted and actual flood zones can result in a significant decrease in IoU. While the proportional increase suggests that the model is correctly identifying more flood-prone areas as the water depth threshold rises, the precision and spatial accuracy required to match the predicted and actual flood extents become more challenging.
We can make the corresponding revision in the manuscript as follows: (line 325) “It can be observed in Fig. 8 and Fig. 10 that, although the proportional increase in IoU indicates that the proposed methods are correctly identifying more flood-prone areas compared to baseline methods, the IoU for high water depth thresholds is much lower than for lower water depth thresholds. This can be attributed to the significantly smaller spatial extent of deep floodwater areas. At higher depth thresholds, even small misalignments between the predicted and actual flood zones can result in a substantial reduction in IoU. While it becomes more challenging to simulate flood extents at higher depth thresholds, flood simulation based on RCAN-MS still achieved the best performance in simulating deep floodwater areas compared to all baseline methods in both datasets.”
- In terms of metrics you could also consider adding a different metric such as the critical success index (CSI), which has been used in several flood studies.
Thank you for this suggestion. We incorporated Intersection over Union (IoU) as one of the metrics in our analysis. The formulas for IoU and the Critical Success Index (CSI) are mathematically identical in the context of this study. Both metrics measure the overlap between the predicted and actual positive areas (True Positives, TP) relative to the total area covered by both predicted positives (TP + FP) and actual positives (TP + FN), which can be expressed as: CSI = IoU = TP / (TP + FP + FN). We believe this provides an adequate measure of overlap and performance in our flood simulation results.
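To make the equivalence concrete, the quoted formula can be sketched in a few lines of Python; the binary masks below are made-up toy data for illustration, not results from the study:

```python
# Illustrative only: IoU (equivalently CSI) for two flattened binary flood
# masks, where 1 = flooded and 0 = dry. Implements TP / (TP + FP + FN),
# the formula quoted in the reply above.

def iou_csi(pred, truth):
    """Return IoU / CSI for two equal-length binary sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # both masks empty: perfect agreement

pred  = [1, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
# TP = 2, FP = 1, FN = 1  ->  IoU = CSI = 2 / 4
print(iou_csi(pred, truth))  # 0.5
```

Note that True Negatives (dry cells correctly predicted dry) do not appear in the formula, which is why the metric is well suited to sparse flood extents.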
- While most figures are of high quality, I think Figure 7 and 9 can be better, despite already being informative. Consider changing their style.
To improve Figs. 7 and 9, we changed the figure style to bar charts (as shown in the supplement).
RC2: 'Comment on nhess-2024-207', Seth Bryant, 10 Dec 2024
The authors extend the practice of super-resolution (SR) for DEM by including multi-spectral image inputs. By enhancing the resolution of SRTM using this Sentinel-2A data, they demonstrate the performance improvement provided by their method for 2 case studies. The study is extended by comparing the performance of the resulting DEMs in a pluvial flood simulation.
From what I can tell, this is a well-done study, advances SR-DEM, and should be published. However, I'm not sure the work fits within NHESS... rather than journals more focused on ML or hydrodynamic modelling, like those where the referenced studies are published. i.e., there is only one NHESS reference in the bib (and this ref is intro fluff not related to the work). But maybe this is more of an editorial decision.
Further, the authors could consider the following suggestions:
- I appreciate the use of pluvial flood simulations to generate an additional performance metric for the SR, however I think this work should be de-emphasized and moved to the supplement... leaving the manuscript more focused on the SR architecture and experiment (which should be better described). i.e., while the pluvial sections take up roughly half the current manuscript, this work does not really influence the conclusion or abstract. You could instead focus on how traditional raster metrics (e.g., MAE) are inadequate for flood simulations... this would be an interesting paper... but a different paper.
- Code and data should be made public. e.g., there is no way to properly review such a manuscript without access to the code and data.
- The comparison against 4 baseline methods is a great idea and communicates the benefits of the proposed method well. However, these baseline methods (like the proposed method) need to be described adequately for reproducibility. i.e., the training and hyperparameters used. The authors should take care to provide as 'fair' a comparison as possible.
- The transferability of the method to different regions should be better explored and discussed (does the model need to be retrained for each region?)
Attached are additional comments. Thank you for the nice paper, it was a pleasure to read.
AC3: 'Reply on RC2', Yue Zhu, 14 Feb 2025
We sincerely appreciate the time and effort the reviewer invested in evaluating our manuscript and providing insightful comments. We have carefully considered each comment and provide detailed point-by-point responses below.
- From what I can tell, this is a well-done study, advances SR-DEM, and should be published. However, I'm not sure the work fits within NHESS... rather than journals more focused on ML or hydrodynamic modelling, like those where the referenced studies are published. i.e., there is only one NHESS reference in the bib (and this ref is intro fluff not related to the work). But maybe this is more of an editorial decision.
Thank you for raising this point. We believe that, as NHESS is an interdisciplinary journal publishing research on various aspects of natural hazards, studies investigating input data quality to enable enhanced hazard mapping fit the journal’s scope. We also found other published NHESS papers in this field. For instance, Blöschl et al. (2024) investigated hyper-resolution flood hazard mapping, which involves enhanced DEM data for improved flood simulation. Miller et al. (2022) tested the impact of different spatial resolutions of DEM data on snow avalanche modelling. Löwe & Arnbjerg-Nielsen (2020) explored the effect of data resolution on urban pluvial flood risk assessment.
Blöschl, G., Buttinger-Kreuzhuber, A., Cornel, D., Eisl, J., Hofer, M., Hollaus, M., … Stiefelmeyer, H. (2024). Hyper-resolution flood hazard mapping at the national scale. Natural Hazards and Earth System Sciences, 24(6), 2071–2091. doi: 10.5194/nhess-24-2071-2024
Miller, A., Sirguey, P., Morris, S., Bartelt, P., Cullen, N., Redpath, T., … Bühler, Y. (2022). The impact of terrain model source and resolution on snow avalanche modeling. Natural Hazards and Earth System Sciences, 22(8), 2673–2701. doi: 10.5194/nhess-22-2673-2022
Löwe, R., & Arnbjerg-Nielsen, K. (2020). Urban pluvial flood risk assessment – data resolution and spatial scale when developing screening approaches on the microscale. Natural Hazards and Earth System Sciences, 20(4), 981–997. doi: 10.5194/nhess-20-981-2020
We can add the related studies as references in the manuscript as follows:
“Methods to enhance the spatial resolution of DEM data have been widely adopted across various geospatial applications to improve risk estimation. These advancements have significantly enhanced the accuracy and reliability of natural hazard mapping, including flood prediction (Löwe & Arnbjerg-Nielsen, 2020; Tan et al., 2024), landslide modelling (Brock et al., 2020), volcanic flow assessment (Deng et al., 2019), and snow avalanche forecasting (Miller et al., 2022).”
(line 75) “The benefits of integrating multi-source inputs in remote sensing applications have been increasingly recognised, as the combination of complementary data sources enhances the robustness and reliability of model performance (J. Li et al., 2022). ... Blöschl et al. (2024) integrated additional bathymetric information into the DEM to enhance national-scale flood hazard mapping.”
- I appreciate the use of pluvial flood simulations to generate an additional performance metric for the SR, however I think this work should be de-emphasized and moved to the supplement... leaving the manuscript more focused on the SR architecture and experiment (which should be better described). i.e., while the pluvial sections take up roughly half the current manuscript, this work does not really influence the conclusion or abstract. You could instead focus on how traditional raster metrics (e.g., MAE) are inadequate for flood simulations... this would be an interesting paper... but a different paper.
Thank you for this comment. We would like to argue that the section on pluvial flood simulation evaluation should remain in the main manuscript. This is because the study aims to improve pluvial flood simulation by enhancing DEM data through super-resolution techniques, addressing the critical issue of the lack of publicly available high-resolution DEMs for flood mapping. Examining the effect of super-resolution DEM data on flood simulation is essential to the objectives of this study. Notably, quantifying the extent to which the proposed method improves flood simulation provides valuable insights for other researchers and practitioners considering whether to adopt this approach in their studies and work. Furthermore, the experimental results in the flood simulation section demonstrate that better performance in DEM super-resolution, as measured by traditional metrics, does not necessarily lead to improved performance in flood simulation. Therefore, to validate the effectiveness of the proposed super-resolution approach in enhancing flood simulation, it is crucial to include the flood simulation analysis as an integral part of this study.
- Code and data should be made public. e.g., there is no way to properly review such a manuscript without access to the code and data.
Thank you for raising this point. We agree to openly share the code and data of this study, except for the high-resolution data for Dataset 2, which is from TanDEM-X. This is because TanDEM-X data requires proposal submission and approval for data acquisition. This information can be added in the manuscript as follows: “Except for the data from TanDEM-X, which requires a proposal submission and approval for data acquisition, all other data and codes are openly accessible here: https://zenodo.org/records/14868516”
- The comparison against 4 baseline methods is a great idea and communicates the benefits of the proposed method well. However, these baseline methods (like the proposed method) need to be described adequately for reproducibility. i.e., the training and hyperparameters used. The authors should take care to provide as 'fair' a comparison as possible.
Thank you for this suggestion. We will revise the manuscript to provide more details on the training strategies and hyperparameters as follows:
“The DEM super-resolution models were implemented and trained with PyTorch on two NVIDIA GeForce RTX 4090 GPUs on high-performance computing (HPC) clusters. All baseline models were implemented using the default parameter settings for hidden layers as specified in their original papers. The input and output layer configurations were adapted to the task of DEM super-resolution. All baseline methods used single-band input and output layers, except for RCAN-MS, which was configured with five input bands. These five bands comprised a single-band low-resolution DEM concatenated with four-band multispectral satellite images, enabling RCAN-MS to leverage additional spectral information. All test methods adopted the same training strategy: they were trained with a batch size of 8 and an initial learning rate of 1 × 10⁻⁴. With an adaptive learning rate scheduler, the learning rate is multiplied by 0.8 when the validation loss stops decreasing for 50 epochs. The optimizer adopted for all methods is Adam with default momentum parameters, and the loss function is the Mean Absolute Error (MAE). Regarding the stopping criterion for model performance evaluation, all models were trained for 200 epochs on the training set, after which the epoch yielding the smallest MAE on the validation set was selected for further performance evaluation on the test sets.”
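As a cross-check of the scheduler description, the adaptive learning-rate rule can be sketched in plain, framework-independent Python (a minimal illustration, loosely mirroring PyTorch's ReduceLROnPlateau with factor=0.8 and patience=50; the loss trace below is hypothetical, not from the study):

```python
# Illustrative sketch of the adaptive learning-rate rule described above:
# multiply the learning rate by 0.8 whenever the validation loss has not
# improved for 50 consecutive epochs. The values here are hypothetical.

def schedule_lr(val_losses, lr=1e-4, factor=0.8, patience=50):
    """Return the final learning rate after stepping through val_losses."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for loss in val_losses:
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:  # plateau detected: shrink the step size
                lr *= factor
                stale = 0
    return lr

# One initial "improvement" followed by 60 flat epochs triggers exactly one
# reduction, leaving the rate at 0.8 * 1e-4:
print(schedule_lr([1.0] * 61))
```

A steadily improving loss trace would leave the learning rate untouched, while a long plateau triggers repeated 0.8 reductions.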
- The transferability of the method to different regions should be better explored and discussed (does the model need to be retrained for each region?)
Thank you for highlighting this important aspect. Our model is trained and evaluated separately on data from specific study areas, and while the results demonstrate its effectiveness in this context, its transferability to other geographic areas has not been thoroughly tested; some retraining or parameter tuning is expected to be required to achieve good performance in other regions. Therefore, we addressed this point as a limitation in the manuscript as follows:
“In addition, one should note that the model was trained and evaluated on specific geographic areas. Thus, its direct transferability without minor adjustments (e.g., fine-tuning of the parameterisation) cannot be taken for granted in other regions, particularly those with differing terrain characteristics. However, retraining or fine-tuning the model is expected to allow, in general, an effective implementation in different regions.”
In addition, regarding the additional comments in the attachment, we have provided our response in the supplement. Thank you very much for your time and effort in reviewing this manuscript.
RC3: 'Comment on nhess-2024-207', Anonymous Referee #3, 22 Dec 2024
This paper presents a new deep learning method to downscale coarse, satellite-derived terrain data to 10m resolution by exploiting higher resolution multispectral image data. The results of the method are validated for two case areas, both through direct comparison against high resolution terrain data, and by comparing pluvial flood simulations with varying terrain inputs. Several benchmark downscaling methods are included in the comparison.
I think this is a very good paper. I very much appreciate the investigation of the effects of downscaling methods on the final application, i.e. pluvial flood simulation, and I think it is well placed within the scope of the journal. I have only some very minor comments that are mentioned below and don't require further review. I suggest accepting the paper.
Comments:
- Language - please perform a proofread, there are several typos distributed throughout the paper
- Units - please include units in the results figures, e.g., Figs. 7 and 9. Similarly, the scores in Table 2 require units. I believe that the test in Hong Kong does not have an average error of 8 m, but how should we interpret an MSE of 66?
- Figure 1 - please include resolutions in the figure. The entire residual in residual block operates in 30 m resolution. In addition, the upscaling module is not described. I suppose this is another 2D convolution. Does it receive a skip connection with high resolution as input?
Citation: https://doi.org/10.5194/nhess-2024-207-RC3
AC2: 'Reply on RC3', Yue Zhu, 14 Feb 2025
We sincerely appreciate the time and effort the reviewer invested in evaluating our manuscript and providing insightful comments. We have carefully considered each comment and provide detailed point-by-point responses below.
1. Language - please perform a proofread, there are several typos distributed throughout the paper
Thank you for your comments. We performed a careful proofreading and corrected typos in this manuscript.
2. Units - please include units in the results figures, e.g., Figs. 7 and 9. Similarly, the scores in Table 2 require units. I believe that the test in Hong Kong does not have an average error of 8 m, but how should we interpret an MSE of 66?
Thank you for pointing this out. The metrics in Table 2 indeed require proper units. PSNR and SSIM are unitless. MAE is measured in metres (m), while MSE is measured in square metres (m²) and reflects the mean of the squared elevation differences, making it more sensitive to outliers. The MSE of 66.6251 m² corresponds to an RMSE of about 8.16 m, which exceeds the MAE of 5.8181 m precisely because squaring gives large errors more weight. As such, MSE provides a measure of error variability in which large errors dominate, highlighting the importance of interpreting MAE and MSE together for a complete understanding of model performance. We can include these units in the revised table to clarify the interpretations (as shown in the supplement).
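The gap between the two metrics can be illustrated with made-up elevation errors (not values from the study), where a single outlier inflates MSE far more than MAE:

```python
# Toy illustration of MAE vs MSE/RMSE sensitivity to outliers.
# The error values are invented for demonstration purposes only.
import math

errors = [1.0, 2.0, 1.5, 2.5, 20.0]  # metres; one large outlier

mae  = sum(abs(e) for e in errors) / len(errors)   # linear in each error
mse  = sum(e * e for e in errors) / len(errors)    # outlier dominates
rmse = math.sqrt(mse)                              # back in metres

print(f"MAE  = {mae:.2f} m")    # MAE  = 5.40 m
print(f"MSE  = {mse:.2f} m^2")  # MSE  = 82.70 m^2
print(f"RMSE = {rmse:.2f} m")   # RMSE = 9.09 m
```

Here a single 20 m outlier pushes the RMSE well above the MAE, mirroring the relationship between the MAE of 5.8181 m and the MSE of 66.6251 m² in Table 2.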
3. Figure 1 - please include resolutions in the figure. The entire residual in residual block operates in 30m resolution. In addition, the upscaling module is not described. I suppose this is another 2D convolution. Does it receive a skip connection with high resolution as input?
Thank you for this comment. Regarding the resolutions in Figure 1, we can include resolution information for the low-resolution DEM data, the high-resolution multi-spectral satellite images, and the super-resolution DEM output. However, since the data passing through the convolutional layers are tensors that do not carry explicit physical units, we argue it is more appropriate to annotate the spatial dimensions (height, width) of these intermediate tensors instead. Therefore, we have added the spatial dimensions (height, width) in this figure (as shown in the supplement).
Regarding the upscaling module, we can add a description of the upscaling layer as follows: “After that, the concatenated multi-source input is passed through the RCAN backbone structure, which consists of RIR blocks and includes a 2D convolutional layer at the end of the model structure to upscale the data flow to the size of the high-resolution DEM map.” We did not add a skip connection between the input and the high-resolution output, because there is already a long skip connection running from the end of the input module to just before the upscaling layer, which is only a few layers away from the final output.