This work is distributed under the Creative Commons Attribution 4.0 License.
Testing machine learning models for heuristic building damage assessment applied to the Italian Database of Observed Damage (DaDO)
Subash Ghimire
Philippe Guéguen
Adrien Pothon
Danijel Schorlemmer
Abstract. Assessing or forecasting seismic damage to buildings is an essential issue for earthquake disaster management. In this study, we explore the efficacy of several machine learning models for damage characterization, trained and tested on the database of damage observed after Italian earthquakes (DaDO). Six machine learning models were considered: regression- and classification-based variants of random forest, gradient boosting and extreme gradient boosting. The structural features considered were divided into two groups: all structural features provided by DaDO, or only those considered the most reliable and easiest to collect (age, number of storeys, floor area, building height). Macroseismic intensity was also included as an input feature. The seismic damage per building was determined according to the EMS-98 scale observed after seven significant earthquakes that occurred in several Italian regions. The results showed that extreme gradient boosting classification is statistically the most efficient method, particularly when considering the basic structural features and grouping the damage according to the traffic-light-based system used, for example, during the post-disaster period (green, yellow and red). The results obtained by the machine learning-based heuristic model for damage assessment are of the same order of accuracy as those obtained by the traditional Risk-UE method. Finally, the machine learning analysis found that the importance of structural features with respect to damage was conditioned by the level of damage considered.
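The six-model setup described in the abstract can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' actual pipeline: the extreme gradient boosting variants would come from the separate xgboost package (shown as comments), and the feature and label arrays are toy placeholders, not DaDO data.

```python
import numpy as np
from sklearn.ensemble import (
    RandomForestRegressor, RandomForestClassifier,
    GradientBoostingRegressor, GradientBoostingClassifier,
)
# from xgboost import XGBRegressor, XGBClassifier  # would complete the six models

rng = np.random.default_rng(0)
# Toy stand-ins for the basic structural features (age, number of storeys,
# floor area, building height) plus macroseismic intensity as inputs,
# and EMS-98 damage grades DG0..DG5 as the target.
X = rng.random((200, 5))
y_grade = rng.integers(0, 6, size=200)   # classification target (DG0..DG5)
y_cont = y_grade.astype(float)           # regression treats the grade as continuous

models = {
    "RFR": RandomForestRegressor(n_estimators=50, random_state=0),
    "RFC": RandomForestClassifier(n_estimators=50, random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "GBC": GradientBoostingClassifier(random_state=0),
    # "XGBR": XGBRegressor(), "XGBC": XGBClassifier(),
}
for name, model in models.items():
    target = y_cont if name.endswith("R") else y_grade
    model.fit(X, target)
```

The regression variants predict a continuous damage value that must then be rounded or thresholded back onto the discrete grades, whereas the classifiers predict a grade directly.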
Status: final response (author comments only)
RC1: 'Comment on nhess-2023-7', Zoran Stojadinovic, 26 Feb 2023
- The overall quality of the preprint (general comments)
The overall quality of the preprint is good. The topic of testing machine learning models for heuristic building damage assessment is significant for the science community. The research is well structured and explained. The authors made an effort to test a large number of different experiment scenarios. There are minor deficiencies in the paper, mostly related to the nature of the dataset, not the method.
Not so long ago, this paper would probably have deserved to be published as is (with some technical corrections). However, since the research shows similar methods and results compared to previously published papers, this reviewer believes that the authors should make an additional effort to demonstrate the added value of this research to the body of knowledge. There is not much more to do regarding experiments, but adding a Discussion chapter is an opportunity to improve the paper.
- Individual scientific questions/issues (specific comments)
It appears that most ML-based articles have trouble with inconsistent datasets and imbalanced recorded damage distributions, and obtain similar results which are scientifically acceptable but not impressive. The Discussion chapter cannot solve ML-related issues in earthquake damage assessment, but the authors can present their views on limitations, opportunities, advances, future work or the way forward based on their findings.
Here are some topics which the authors could discuss in the additional chapter:
- Highlight the differences and stated advances - the use of oversampling and the conditioned importance of structural features related to damage states. What is the implication? The abstract and conclusion lack some numerical research highlights.
- How do the authors explain obtaining similar results for significantly different combinations of methods, set sizes, features and target classes? Why do we all get similar results? What is the way forward? Should we dismiss machine learning, or should we improve something? Are we possibly missing a key feature to include in post-earthquake surveys? Which one?
- The authors had the unique advantage of using multiple earthquake datasets from the same region. That is maybe the key difference compared to other papers – the dataset covered the whole MSI range while single earthquakes provide only a fraction. And yet, similar results were obtained? Why?
- How do the authors evaluate the usefulness of the research and model implementation for new earthquakes? How to implement the model without the class of undamaged buildings? Without them, what will the model tell local authorities - that all buildings are damaged? Why should they use machine learning and not the traditional Risk-UE method? What are the benefits?
- How about transferability? The authors should explain the sentence addressing transferability (lines 599-601) in more detail. Obviously, there are differences between regions regarding code implementation or the human impact on construction quality (Turkey earthquake).
- Technical corrections
There is a need for some technical corrections, highlighted in the attached file. The authors should carefully check the paper for unnecessary long phrases, missing articles or spelling. For example:
- Line 6 needs better wording regarding “six models” (possibly - six models were considered: regression- and classification-based machine learning models, each using random forest, gradient boosting and extreme gradient boosting).
- There seems to be some redundant text in Lines 173-179.
RC2: 'Comment on nhess-2023-7', Marta Faravelli, 28 Feb 2023
General comments
The article describes the application of machine learning models to a heuristic method for post-earthquake damage assessment, applied to observed damage data after seismic events that affected Italy. The topic is interesting and worthy of investigation, but in my opinion there are some points that need to be clarified before its publication. The tables and figures are clear and complete. The writing is fluent. The bibliography is extensive; there are only a few corrections to be made, mentioned later.
Specific comments
The most significant problem, in my opinion, is not taking into account the uninspected buildings, which would increase the number of buildings reaching the DG0 damage level. Without these buildings, the damage distribution is unrealistic, and this is amplified by the application of the "random oversampling" method. It is also not clear to me how this oversampling method is applied; it should be explained in more detail. Is it artificially "adding" buildings that reach the damage levels above DG0 in order to have the same number of buildings reaching the different damage levels? If so, the information regarding the percentage of buildings of a typology that reach a specific damage level is lost. This point needs to be clarified.
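For reference, random oversampling in the sense the reviewer describes duplicates minority-class samples until every class matches the majority-class count, which does indeed alter the empirical class proportions in the training set. A minimal sketch of the technique (a generic implementation, not the authors' code; the damage-grade labels are illustrative, not DaDO data):

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate samples of minority classes (with replacement) until
    every class matches the majority-class count: random oversampling."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        # keep the originals, then resample with replacement to fill the gap
        extra = rng.choice(members, size=n_max - members.size, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Illustrative imbalanced labels: many DG1 buildings, few DG4 buildings
X = np.arange(20).reshape(10, 2)
y = np.array([1, 1, 1, 1, 1, 1, 1, 4, 4, 1])
X_res, y_res = random_oversample(X, y)
```

After resampling, both classes are equally represented, so the original 8:2 proportion between damage grades is no longer visible to the model, which is exactly the information loss the comment raises.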
Other observations for consideration are given below:
1) line 45-47: it says that the damage is given by the combination of seismic hazard, exposure and vulnerability/fragility but it does not explain what these three elements are
2) line 48-49: the phrase from "For" to "scenario" is useless unless explained
3) vulnerability/fragility is mentioned but the difference between these two elements is not explained
4) line 60-61: "superior computational efficiency, easy handling of complex problems, and the incorporation of uncertainties" regarding the use of artificial intelligence applied to seismic risk assessment. These are strong statements that should be justified or reported in the conclusions with appropriate explanation
5) line 71: DaDO database is cited without saying it is Italian data (it is said later but it would be appropriate to say it here too)
6) line 82-82: sentence from "By" to "assessment" is unclear, explain better
7) line 91: it says that DaDO has observed damage data for major earthquakes in Italy. Specify the time range of data collected as there have been other earthquakes in Italy for which there are no data in DaDO for different reasons (very old earthquakes for which data were not being collected and more recent such as the 2016-2017 earthquake for which data are being processed)
8) line 103: specify that the scale from DG0 to DG5 is EMS98
9) line 127-129: it is mentioned that the database contains mostly damaged buildings, but it is not explained how this element is taken into account, i.e., the fact that the buildings in the database are not all those in the municipalities considered; there are also buildings that were not inspected, which we can assume have zero damage
10) line 142-144: it is said that DaDO contains MSI values provided by the USGS ShakeMap tool, but the intensity values in DaDO are MCS and were calculated by INGV (National Institute of Geophysics and Volcanology). Also, are the intensities coming from the "ShakeDaDO" database being considered? If yes, cite this database with the correct reference. If no, it might be useful to consider it (Faenza, L., Michelini, A., Crowley, H., Borzi, B., and Faravelli, M. (2020): ShakeDaDO: A data collection combining earthquake building damage and ShakeMap parameters for Italy, Artificial Intelligence in Geosciences, 1, 36-51, https://doi.org/10.1016/j.aiig.2021.01.002.)
11) line 297: for completeness, I suggest mentioning the value of ADG for DG3
12) lines 327-335: explain more about these 4 methods
13) lines 371-375: the sentence from "Notes" to "areas" is not clear
14) traffic-light system: the introduction of this method for comparison is, in my opinion, not very significant. It is said that it was used during post-earthquake emergency situations, but in my experience in Italy it is not. Instead of this traffic-light system, I suggest considering the damage levels as they are directly recorded in the AeDES forms, i.e. DG0, DG1, DG2+DG3, DG4+DG5, in addition to the five damage grades of the EMS98
15) Explain better how this machine learning method can be used in the post-event phase. Could a recorded value such as PGA be used for hazard instead of macroseismic intensity?
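The two coarser damage groupings discussed in comment 14 above can be written down explicitly. A sketch; the grade-to-colour assignment for the traffic-light scheme is an assumption for illustration (the manuscript's exact mapping is not given here), while the second mapping follows the AeDES grouping named in the comment:

```python
# Assumed traffic-light grouping of EMS-98 grades DG0..DG5 (illustrative only)
TRAFFIC_LIGHT = {0: "green", 1: "green", 2: "yellow",
                 3: "yellow", 4: "red", 5: "red"}

# Grouping suggested in comment 14, matching the AeDES forms:
# DG0, DG1, DG2+DG3, DG4+DG5
AEDES_GROUPS = {0: "DG0", 1: "DG1", 2: "DG2+DG3", 3: "DG2+DG3",
                4: "DG4+DG5", 5: "DG4+DG5"}

def regroup(grades, mapping):
    """Map per-building EMS-98 damage grades onto a coarser scheme."""
    return [mapping[g] for g in grades]
```

Training the classifiers directly on either coarse target, rather than on the five grades, is what changes the class balance and hence the reported scores.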
Technical corrections
Below my observations:
1) line 70: in the text there are these abbreviations not found in the bibliography: MINVU, 2021; MTPTC, 2010; NPC, 2015
2) line 287: AT for GBR is 0.50 but in the text it is listed as 0.49
3) line 288 and 294: Fig. 2 is mentioned instead it is Fig. 3
4) Reference: there are 3 papers absent in the text: Ghimire et al. 2021, Riedel and Gueguen 2018, Seo et al. 2012
Citation: https://doi.org/10.5194/nhess-2023-7-RC2
RC3: 'Comment on nhess-2023-7', Petros Kalakonas, 20 Mar 2023
General comments:
The article explores the use of advanced machine learning algorithms for post-earthquake heuristic damage assessment of buildings using a subset of the Italian DaDO database. The topic is very interesting and, in my opinion, quite important for the improvement of existing methodologies in earthquake scenario simulations and seismic risk analysis of building portfolios. The authors considered an extensive literature and, overall, the research is well presented and the writing is good, although I think some parts are unnecessarily long.
Undoubtedly, the manuscript includes a significant amount of work related to the training and evaluation of the ML models, using an innovative approach to tackle known issues in the development of ML models for damage assessment. However, I believe that discussion is missing on several key topics and the authors should consider a few additional aspects.
Specific comments:
1. Even though the topic of the research is clearly defined, the objectives are not sufficiently explained. Why should we explore ML models for damage assessment of building portfolios? What are the limitations of traditional/existing methodologies (e.g., Risk-UE)? Lines 54-58 mention the challenges in developing exposure models, which are true regardless of the damage assessment methodology. Finally, is the purpose of the manuscript to only demonstrate the benefits of ML models in this field or to use the developed heuristic model in other regions and future seismic events as well?
2. Lines 93-94: Why did the authors consider damage data from seven earthquakes and not the entire DaDO database? Typically, ML models benefit from the use of large datasets.
3. The input parameter Building location in terms of latitude and longitude is irrelevant given that the latitude and longitude of the epicentre of the earthquake are not used. Why did the authors not use the epicentral/hypocentral or source-to-site distance instead? As a consequence, the importance of Lat and Long in Figure 5 is misleading.
4. Observing the data distribution of Figure 2, it is clear that the vast majority of the buildings (85%) are one-storey. Therefore, the input parameters Height of building, Number of storeys and Regularity in terms of elevation are not so relevant for the training of the ML models. In general, these structural parameters are crucial for the seismic response and vulnerability of buildings, so I believe the authors should address this issue.
5. Considering the above observation, did the authors test the employment of the recorded/median PGA instead of/along with MSI? Potentially, the performance of the heuristic model could be improved and outperform traditional approaches.
6. Lines 129-131: A justification is missing regarding why the inclusion of the Irpinia-1980 dataset in the training can lead to biased outcomes. Why is it only relevant for testing the models?
7. Lines 142-143: Did the authors test the importance of other parameters provided by USGS, such as Mw and hypocentral depth of the main events?
8. Lines 175-176: It is not clear how the DG is converted into a continuous variable for the regression ML models.
9. Lines 184-185: This is not true. It entirely depends on the ML algorithm and the training process. For example, in the case of artificial neural networks, the eigenvalues of the training dataset used by some common optimization algorithms have a considerable impact.
10. Lines 195-196: Were the reported metrics throughout the manuscript obtained from the training or testing datasets? I believe it is important to clarify this.
11. Essential information is missing from the manuscript regarding the optimization of the hyperparameters presented in Table 3. How did the authors fine-tune the models? How was under- and overfitting prevented? In particular, random forest and XGBC models are prone to overfitting.
12. Chapter 4.3.1: A very long discussion of the results is included, which the reader can interpret by observing the figures. However, the fact that a large number of predictions are underestimated is only mentioned in line 503 and it is not discussed. This finding needs to be elaborated and explained, as it may be related to comment 11.
13. Lines 555-557 & 597-599: Based on the results and this conclusion, there is no benefit of employing XGBC over the traditional Risk-UE approach. The authors should provide justification for this important finding and elaborate on the potential benefits of ML models over Risk-UE.
14. Lines 580-581: The XGBC model is not optimal; it just performed slightly better than the other models.
15. Lines 599-600: From which results did the authors draw this conclusion? Do similar building portfolios refer to primarily one-storey buildings?
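The fine-tuning and overfitting question raised in comment 11 above is usually addressed with cross-validated hyperparameter search, comparing scores on training and held-out data. A generic sketch with scikit-learn, not the authors' actual procedure; the hyperparameter ranges and toy data are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 5))
y = rng.integers(0, 3, size=300)  # toy 3-class damage labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Limiting tree depth and requiring a minimum leaf size are the usual
# guards against overfitting in tree ensembles.
grid = {"max_depth": [3, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    grid, cv=3,
)
search.fit(X_tr, y_tr)

# A large gap between training and test score exposes overfitting.
train_score = search.score(X_tr, y_tr)
test_score = search.score(X_te, y_te)
```

Reporting which search strategy, folds and parameter grids were used (and the train/test score gap) would answer the reviewer's question directly.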
Minor edits:
1. Line 39: A reference is missing for this interesting information.
2. Line 46: What do the authors mean by necessary damage?
3. Line 127: This sentence does not read well; I suggest rephrasing it.
4. Line 132: Replace “methods” with models and “earthquake’s” with earthquakes’.
5. Lines 177-178: This sentence is a repetition of the one in lines 173-175. I suggest just mentioning that the same ML algorithms were used for regression and classification.
6. Lines 208-209: MAE and MSE are acronyms. Replace the word "average" with "mean".
7. Line 299: Replace “Summary of optimized input parameters” with Summary of optimized hyperparameters. The term input parameters refers to the input variables (e.g., MSI, Building age, etc.).
8. Line 407 & 591: Replace “machine” with “machine learning model”.
Citation: https://doi.org/10.5194/nhess-2023-7-RC3