the Creative Commons Attribution 4.0 License.
Application of machine learning for integrated flood risk assessment: Case study of Hurricane Harvey in Houston, Texas
Abstract. Flood risk, encompassing hazard, exposure, and vulnerability, is defined in terms of potential losses. Machine learning techniques have gained traction among researchers to address the complexities of multi-variable flood risk assessment models and overcome issues associated with non-linear relationships. However, the focus has primarily been on flood hazard prediction rather than comprehensive risk assessment and damage estimation. Therefore, there is a need for experiments that combine risk elements using such methods. To address this need, this study used the Random Forest algorithm to analyze the correlations between the physical flood damage caused by Hurricane Harvey in 2017 in Houston, Texas, and selected hazard, exposure, and vulnerability-related variables. The study identified poorly drained soils as the primary contributor to the losses, followed by population density and the ratio of developed lands with medium intensity. The study also explored the reasons for the unexpectedly low importance of social vulnerability factors compared with what the environmental justice concept would suggest. These findings and conclusions can provide insights to planners and stakeholders, enhancing their understanding of the underlying causes contributing to flood risk. Future research can expand upon this study's methodology and findings by incorporating additional factors related to climate change.
Status: closed
RC1: 'Comment on nhess-2023-113', Anonymous Referee #1, 19 Sep 2023
This study explores the application of machine learning (ML) techniques in the context of multi-variable flood loss estimation, with the inclusion of environmental and socio-economic factors. The approach is tested for the event of Hurricane Harvey in Houston, Texas.
I would like to begin my review report with a general concern that has been increasingly relevant in studies like this employing ML approaches. Indeed, while ML has great capabilities in analyzing complex problems characterized by non-linearities between the different variables at hand, it is becoming increasingly common to present studies based on these approaches that actually do not significantly contribute to the advancement of knowledge. In this case, the research objective appeared quite interesting, i.e., to assess the feature importance of environmental and socio-economic characteristics in shaping flood risk in urban areas.
However, the variables selected here and the scale of analysis (census tract) may not be entirely suitable for obtaining meaningful insights. As evidenced by the reported results on variable importance (page 21), the outcomes appear rather obvious, with variables like the percentage of drainage soils, population density and the percentage of medium-density developed areas emerging as the most crucial factors. This somewhat diminishes the discussion’s focus on the significance of socio-economic and environmental aspects. This issue is also evident in the somewhat shallow and unsubstantial discussion section of the manuscript, which fails to provide significant insights into the obtained results, leaving the research questions posed in the introduction largely unanswered.
In my view, while ML is undoubtedly a valuable tool, its application should ideally lead to new findings that significantly contribute to the advancement of a specific field, rather than merely representing an application or exercise that ultimately yields "some results". In the case of the present paper, this concern is further compounded by the strong similarities with prior studies (e.g., Knighton et al. (2020)), both in terms of methodology and objectives, raising additional questions about the novelty and originality of the work.
From the methodological point of view, the corresponding section effectively outlines the division of data into training and validation sets, but it would be beneficial if the Authors could provide more details on whether cross-validation techniques were employed to ensure the robustness of the model. Furthermore, the definition and brief description of the evaluation metrics used to assess model performance should be included in the methodological section, rather than appearing abruptly in the results section.
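The cross-validation and evaluation metrics requested here can be illustrated with a minimal sketch. It is not the manuscript's pipeline: the data are synthetic, the function names are hypothetical, and a simple k-nearest-neighbours regressor stands in for the Random Forest model so that the example needs only the Python standard library. It shows k-fold splitting and per-fold RMSE, MAE, and R².

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(X_train, y_train, row, k=3):
    """Stand-in regressor (assumption): mean target of the k nearest training rows."""
    nearest = sorted((math.dist(x, row), t) for x, t in zip(X_train, y_train))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cross_validate(X, y, k_folds=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    folds = kfold_indices(len(X), k_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_tr = [X[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        truth = [y[j] for j in test_idx]
        preds = [knn_predict(X_tr, y_tr, X[j]) for j in test_idx]
        scores.append({"rmse": rmse(truth, preds),
                       "mae": mae(truth, preds),
                       "r2": r2(truth, preds)})
    return scores

# Synthetic data (assumption): two predictors and a damage-ratio-like target.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [0.6 * a + 0.2 * b + rng.gauss(0, 0.02) for a, b in X]

scores = cross_validate(X, y)
mean_r2 = sum(s["r2"] for s in scores) / len(scores)
for fold_no, s in enumerate(scores, start=1):
    print(f"fold {fold_no}: RMSE={s['rmse']:.3f} MAE={s['mae']:.3f} R2={s['r2']:.3f}")
print(f"mean R2 across folds: {mean_r2:.3f}")
```

Reporting the spread of the per-fold scores, not only their mean, is one way to show how sensitive the model is to the particular train/validation split.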
Finally, there is also room for improving the overall quality of the presentation and writing, by avoiding the repetition of certain concepts and information multiple times throughout the manuscript (e.g., explanations of what machine learning algorithms are and their capabilities) and by combining the several maps from Figures 4 to 10 into fewer figures.
Citation: https://doi.org/10.5194/nhess-2023-113-RC1
AC1: 'Reply on RC1', Behrang Bidadian, 28 Sep 2023
Dear Referee,
We appreciate your thoughtful review of our manuscript and the time you invested in providing valuable feedback. We have carefully considered your comments and would like to respond respectfully, addressing some of your concerns.
Contributions to the Advancement of Knowledge: We firmly believe that our paper significantly contributes to the advancement of knowledge in the field of flood risk assessment. By employing machine learning techniques to analyze the complex interplay of environmental and socio-economic factors in flood loss estimation, we expanded upon previous studies by providing a more comprehensive approach and presenting/analyzing the results.
Scale of Analysis and Variable Selection: We acknowledge your concern about the scale of analysis. The choice of a census tract level for analysis was primarily driven by the availability of detailed socioeconomic data from the Census Bureau. Data limitations compelled us to work at this level, and we have noted this constraint in our discussion section.
Regarding the variables selected, at the time of manuscript submission, our approach represented the most comprehensive and suitable set of variables at least for our study area. Scientific research often builds on existing knowledge, and our study aimed to refine and enhance the understanding of flood risk by considering these variables in a location-specific context.
Outcomes: While some of our results may seem intuitive or common-sensical, it is important to remember that scientific experiments are essential to validate and quantify these ideas, especially in a location- and time-specific context. Additionally, what might appear "obvious" can vary depending on the audience, and it is our duty as researchers to provide empirical evidence to support our findings. We also believe that our study sheds light on the less “obvious” role of development pattern, which may not be immediately apparent without rigorous analysis.
Discussion Section: You commented on the discussion section labeling it as "shallow and unsubstantial" without mentioning any specific reasons for that. We tried to discuss the process limitations and the potential reasons for obtaining results not aligned with the expectations of the environmental justice notion. We believe this approach is neither shallow nor unsubstantial.
Research Question: Our research question was: What role do environmental and socioeconomic characteristics play in shaping flood risk in urban areas? We believe this question was addressed in our study, as we identified the significant impact of soil type, development pattern type, and population density on flood risk in the context of Hurricane Harvey and the less significant role of the other factors.
Originality and Similarities with Prior Studies: We respectfully disagree with the assertion that our study lacks originality due to similarities with prior works such as Knighton et al. (2020). As indicated on page 7 of our manuscript, we explicitly state that our research expands upon previous studies like the one mentioned by incorporating a more comprehensive set of variables, including development patterns and additional demographic/socioeconomic factors. Furthermore, our study's application to a specific historical event, Hurricane Harvey, adds a unique dimension to the research landscape.
Presentation Quality: First, we appreciate your concern regarding the inclusion of multiple maps and will consider reducing them during the revision process. Second, it is important to note that while explaining fundamental concepts about machine learning algorithms and processes may appear unnecessary to experts in the field, our target journal does not exclusively focus on machine learning or AI. Therefore, these explanations are necessary to cater to a broader readership within the journal's audience.
In conclusion, we appreciate your feedback and will make the necessary revisions to improve the quality of our manuscript. We are confident that our study contributes to the field of flood risk assessment, and we hope that our responses address your concerns adequately. If you have any further comments or suggestions, please do not hesitate to share them with us. Your guidance is invaluable in helping us refine our research.
Citation: https://doi.org/10.5194/nhess-2023-113-AC1
RC2: 'Comment on nhess-2023-113', Anonymous Referee #2, 12 Dec 2023
This manuscript refers to the role of environmental and socio-economic factors in flood damage in the case study of Hurricane Harvey, Houston, Texas.
Environmental and Socioeconomic Characteristics in Shaping Flood Risk:
The manuscript addresses the critical question of the role played by environmental and socioeconomic characteristics in shaping flood risk in urban areas. While the study identifies poorly drained soils, population density, and the ratio of developed lands as significant contributors to flood losses, the analysis lacks depth in explaining the complex interplay of these factors. A more nuanced exploration of the relationships and their implications for flood risk would strengthen the manuscript.
Inclusion of Socioeconomic Variables in Prediction Model:
The manuscript emphasizes the importance of including socioeconomic variables in flood risk prediction models. However, it is unclear to what extent the inclusion of these variables improves the performance of the prediction model. A more detailed analysis, possibly through quantitative metrics or comparative studies, would provide a clearer understanding of the impact of socioeconomic variables on the predictive accuracy of the model.
Definition and Quantification of Poorly Drained Soil:
The term "poorly drained soils" is crucial to the study, yet its definition and quantification are not sufficiently addressed. Reference to Lin and Billa (2021) is made, but the manuscript lacks clarity on how drainage capability was defined and quantified in the context of predicting average building damage ratios. Providing a clear and concise definition, along with the methodology, would enhance the transparency of the study.
Clarity on Data Sources and Variable Definitions:
While the manuscript appropriately cites data sources, it falls short in providing clear definitions and methodologies for each variable. A more detailed exposition on how each variable was defined and calculated is essential for understanding the reliability of the analyses conducted. Improved transparency on data handling and preprocessing steps is warranted.
Placement of Data Description Section:
The figures (4 to 10) seem to be more descriptive of the data gathering and preprocessing rather than direct results of the main analysis. To enhance clarity, it is recommended to include these details in a dedicated data description section preceding the results section. This would allow readers to better comprehend the data and its preparation before delving into the subsequent analyses.
Limited Analysis in the Result/Discussion Section:
The result section, except data description, primarily focuses on the CPI analysis, providing only a quantitative aspect of the ML model's role in assessing the environmental and socioeconomic variables' impact. A more comprehensive evaluation of the model's reliability and performance in comparison to referenced researches would strengthen the manuscript. Additional quantitative metrics and a more expansive discussion of the ML model's outcomes are warranted. Additionally, the discussion section only offers general possible causes for result discrepancies from the hypothesis without further analysis to clarify the role of environmental and socioeconomic factors in flood damage.
Overall, the manuscript has potential, but addressing these points will significantly enhance its clarity, depth, and contribution to the field.
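The comparative analysis suggested above for the socioeconomic variables could take the form of an ablation: fit the same model with and without a feature group and compare held-out error. The following is a rough sketch under stated assumptions, not the manuscript's method: the data are synthetic, the column groupings (`ENV`, `SOCIO`) are hypothetical, and a k-nearest-neighbours regressor scored by leave-one-out RMSE stands in for the Random Forest model.

```python
import math
import random

def loo_rmse(X, y, columns, k=5):
    """Leave-one-out RMSE of a k-NN regressor restricted to the given columns."""
    rows = [[x[c] for c in columns] for x in X]
    sq_err = 0.0
    for i, row in enumerate(rows):
        # Rank every other row by distance to the held-out row.
        neighbours = sorted(
            (math.dist(other, row), y[j])
            for j, other in enumerate(rows) if j != i
        )[:k]
        pred = sum(t for _, t in neighbours) / len(neighbours)
        sq_err += (y[i] - pred) ** 2
    return math.sqrt(sq_err / len(rows))

# Hypothetical feature groups: columns 0-1 environmental, column 2 socioeconomic.
ENV = [0, 1]
SOCIO = [2]

# Synthetic data (assumption): the target depends strongly on the environmental
# columns and only weakly on the socioeconomic column.
rng = random.Random(7)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(120)]
y = [0.7 * a + 0.3 * b + 0.05 * c + rng.gauss(0, 0.02) for a, b, c in X]

rmse_full = loo_rmse(X, y, ENV + SOCIO)
rmse_env_only = loo_rmse(X, y, ENV)
rmse_socio_only = loo_rmse(X, y, SOCIO)

print(f"all features:       RMSE={rmse_full:.3f}")
print(f"environmental only: RMSE={rmse_env_only:.3f}")
print(f"socioeconomic only: RMSE={rmse_socio_only:.3f}")
print(f"change from adding socioeconomic columns: {rmse_full - rmse_env_only:+.3f}")
```

A negative change in the last line would indicate that the socioeconomic columns reduce held-out error for this stand-in model; a value near zero or positive would indicate that they add little or mainly add noise. With a real model, the same comparison would be repeated across cross-validation folds before drawing conclusions.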
Citation: https://doi.org/10.5194/nhess-2023-113-RC2
AC2: 'Reply on RC2', Behrang Bidadian, 27 Dec 2023
Dear Referee,
We appreciate your thorough review of our manuscript and the insightful comments you provided. Your feedback has been instrumental in identifying areas for improvement, and we are committed to addressing these concerns to enhance the clarity, depth, and contribution of our study.
- Environmental and Socioeconomic Characteristics in Shaping Flood Risk:
We acknowledge your point and plan to revise the manuscript by exploring the interrelationships among the identified variables, emphasizing both quantitative correlations (positive or negative) and spatial distribution patterns. This will provide a more nuanced understanding of their collective impact on flood risk in urban areas.
- Inclusion of Socioeconomic Variables in Prediction Model:
Your observation regarding the effectiveness of socioeconomic variables in our prediction model is duly noted. Our primary objective was to aid Houston in future planning for and management of flooding rather than introducing novel machine learning (ML) models or insights. We applied ML for that purpose as a tool to overcome the data complexities and answer questions about flood risk.
In this study, the socioeconomic variables appeared less effective than expected. We tried to analyze potential reasons for that as mentioned in the discussion section. Although we mentioned “incorporating demographic and socioeconomic data can potentially enhance the estimation accuracy of flood risk and damage models (Knighton et al., 2020)”, due to the above reason, it was not possible to measure the extent of improvement by these indicators in our model. Additionally, we analyzed the damage ratios and socioeconomic variables in only a snapshot in time. We agree that more detailed studies related to a more extended period can be conducted, as future studies, to measure the improvement.
- Definition and Quantification of Poorly Drained Soil:
We agree with your suggestion and will update the manuscript with more comprehensive descriptions of the soil drainage classes, especially the Poorly Drained Soils class, based on the Natural Resources Conservation Service (NRCS) definitions.
- Clarity on Data Sources and Variable Definitions:
Your comment on the need for more precise definitions of variables is well taken. We tried to be concise about the variables. However, based on your feedback, we will update the manuscript with more descriptive metadata, ensuring a transparent presentation of data sources and variable definitions.
- Placement of Data Description Section:
Considering your comment, we will reorganize the presentation of maps, focusing on those that better illustrate the final results.
- Limited Analysis in the Result/Discussion Section:
As described earlier, the main focus of this article is not machine learning modeling; therefore, it has not been submitted to an ML journal. Considering the main purpose, we believe the results of this study should be presented in a way that highlights the most significant factors, so that the city can better prepare for future flood events and implement risk mitigation efforts more effectively. In addition, we discuss the potential reasons for the misalignment of the results with the expectations of the environmental justice theory.
In conclusion, we sincerely appreciate your constructive feedback. Your guidance is invaluable, and we are eager to refine our research based on this discussion. We believe our study contributes to the field of flood risk assessment and management, particularly in the context of Houston as a highly flood-prone city.
We look forward to the opportunity to revise the manuscript in line with this discussion and to see it published in the journal.
Citation: https://doi.org/10.5194/nhess-2023-113-AC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
731 | 232 | 116 | 1,079 | 30 | 41 |