This work is distributed under the Creative Commons Attribution 4.0 License.
Development of a seismic loss prediction model for residential buildings using machine learning – Ōtautahi / Christchurch, New Zealand
Quincy Ma
Pavan Chigullapally
Joerg Wicker
Liam Wotherspoon
RC1: 'Comment on nhess-2022-227', Zoran Stojadinovic, 22 Sep 2022
1. The overall quality of the preprint (general comments)
The overall quality of the preprint is good. The topic of mapping a building representation directly to monetary compensation is exciting and significant for the science community. The research is well structured and explained.
However, since the dataset is large and the prediction task relatively easy (3 broad classes), this reviewer expected somewhat better results in the confusion matrix. It therefore seems reasonable that the model could be improved with some adjustments.
2. Individual scientific questions/issues (specific comments)
Here are some suggestions to improve the model and preprint:
a) Dataset.
For better prediction results, the authors should preserve (and demonstrate) the original data distribution from the initial dataset when merging and filtering instances. For the same reasons, this reviewer believes that it is necessary to include the class of undamaged buildings in the dataset (with NZ$0 compensation), which is unavoidable when mapping damage states.
b) Building representation.
For a better presentation of the mapping problem, the authors should show the data distribution for more features (like in Figure 6 for construction type). Surprisingly, the height of buildings isn’t included in the building representation even though it captures the dynamics of the building. A feature like “number of floors” could possibly be informative.
c) More discussion of the confusion matrix would be helpful. For example, how do the authors explain the accuracy difference between b) and c) in Figure 12? Does it have to do with the PGA ranges of the earthquakes? The worst prediction seems to be "Predicted Medium / Actual OverCap". How can this be explained?
d) It would be interesting to evaluate the prediction accuracy for the sum of compensations for all buildings. It is reasonable to expect good prediction accuracy for the total cost, since errors would cancel each other out. But this is difficult to do without precise “OverCap” values.
e) Finally, how do the authors evaluate the usefulness of the research and model implementation for new earthquakes? Namely, what about the changing value of money over time and frequent changes in market prices? How could the model be implemented without the class of undamaged buildings (this version would work only if damaged buildings were pre-selected)?
3. Technical corrections
A significant number of technical corrections are needed. Some examples are highlighted in the attached file. The authors should check the paper carefully.
AC1: 'Reply on RC1', Samuel Roeslin, 22 Nov 2022
1 a) Dataset.
For better prediction results, the authors should preserve (and demonstrate) the original data distribution from the initial dataset when merging and filtering instances.
For the same reasons, this reviewer believes that it is necessary to include the class of undamaged buildings in the dataset (with NZ$0 compensation), which is unavoidable when mapping damage states.
Thanks for the comment.
This model has been developed for insurance purposes, to help the Earthquake Commission (EQC) understand the possible distribution of losses across Christchurch in a future earthquake. EQC’s interest focuses on damaged buildings for which a claim might be lodged. Note that the data used to train the machine learning (ML) model pertain to buildings for which one or more claims were lodged during the Canterbury Earthquake Sequence (CES). Obtaining detailed information on undamaged buildings was outside the scope of this work, so extracting reliable data for the undamaged category is not straightforward. Simply assuming that a building was undamaged because no EQC claim was lodged is not satisfactory, as it cannot be confirmed that the remaining buildings had adequate insurance coverage. Moreover, the owners of some slightly damaged buildings may have chosen to pay for the repairs themselves to avoid paying the excess.
Nevertheless, following the suggestion, we tried including a fourth category for undamaged buildings and explored its influence on the performance of the ML model. The EQC data include a few instances with zero compensation (BuildingPaid = NZ$0). Figure i shows the number of instances for 4 Sep 2010 and 22 Feb 2011. At 7% of the instances, this category is very small compared to the “low” and “medium” categories. Despite this, a new machine learning model was trained, accounting for class imbalance. Figure ii shows the confusion matrix for the Random Forest algorithm trained on four categories. The overall accuracy dropped, and the model struggles to make predictions for the zero-damage category. It was thus decided to keep only three categories (low, medium, and overcap).
Figure i: Number of instances in each BuildingPaid category in the filtered dataset, including the category with zero compensation: (a) 4 Sep 2010, (b) 22 Feb 2011
Figure ii: Confusion matrix for the Random Forest algorithm including the category with BuildingPaid = 0
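The reply above does not specify how class imbalance was handled. One common option is inverse-frequency weighting, sketched here under that assumption. Each class gets the weight n_samples / (n_classes * n_class), which is the formula behind scikit-learn's class_weight="balanced" setting. The labels below are hypothetical toy counts, not the EQC data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class).

    This mirrors the formula behind scikit-learn's class_weight="balanced":
    rarer classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical toy labels (NOT the EQC data): the zero-compensation class is rare.
labels = ["zero"] * 7 + ["low"] * 40 + ["medium"] * 40 + ["overcap"] * 13
weights = balanced_class_weights(labels)
for cls, w in sorted(weights.items()):
    print(f"{cls:8s} {w:.3f}")
```

With these weights, every class contributes the same total weight (n_samples / n_classes), so the rare class is not swamped during training. Reweighting alone does not guarantee good predictions for a small class, however.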
2 b) Building representation.
For a better presentation of the mapping problem, the authors should show the data distribution for more features (like in Figure 6 for construction type).
Surprisingly, the height of buildings isn’t included in the building representation even though it captures the dynamics of the building. A feature like “number of floors” could possibly be informative.
Re: data distribution
Thanks for the suggestion. A new table has been added showing the nine attributes used in the model. The table gives information about the type and distribution of each attribute.
Re: height of the buildings
Thanks for the remark. The authors are aware that building height (or number of storeys) has been used as an attribute in ML models in similar studies (Ghimire et al., 2022; Harirchian et al., 2021; Mangalathu et al., 2020; Stojadinović et al., 2022). Building height was not included in this study because of the limited availability of this information in the dataset, not as a deliberate choice. While the original EQC data has an attribute for the number of storeys, many values are missing. Figure iii shows the number of instances available for each storey category: storey information is missing for 58% of the instances for 4 Sep 2010 and for 93% of the instances for 22 Feb 2011. The information available on building height is thus very limited. Where available, the data for 4 Sep 2010 show that 10,37 buildings have one storey, 2,677 two storeys, and 140 three storeys. Similarly, for 22 Feb 2011, 1,522 buildings have one storey, 727 two storeys, and 91 three storeys. Selecting only instances where the number of storeys is known would have severely limited the data available for training the ML model. Note also that EQC cover is limited to residential dwellings. This study is thus limited to residential buildings, which in Christchurch are mostly single-storey houses.
Accurate information on the number of storeys could not be obtained from the RiskScape database either. The attribute ‘Storeys’ is reported as a floating-point value that appears to have been calculated by dividing the building floor area ('BLDGFL_1') by the building footprint ('FOOTP_1').
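The observation that ‘Storeys’ appears to be BLDGFL_1 / FOOTP_1 can be checked quickly. A ratio of floor area to footprint equals a whole number of storeys only when every floor matches the footprint. The minimal sketch below uses made-up floor-area and footprint values: the attribute names follow RiskScape, but the records and the 0.05 tolerance are assumptions.

```python
def derived_storeys(floor_area_m2, footprint_m2):
    """Ratio of total floor area to footprint (what RiskScape 'Storeys' appears to be)."""
    if footprint_m2 <= 0:
        raise ValueError("footprint must be positive")
    return floor_area_m2 / footprint_m2


def looks_like_whole_storeys(ratio, tol=0.05):
    """True if the ratio is within `tol` of a positive integer (tol is an arbitrary assumption)."""
    nearest = round(ratio)
    return nearest >= 1 and abs(ratio - nearest) <= tol


# Hypothetical records: (BLDGFL_1 floor area, FOOTP_1 footprint), in m^2.
records = [(150.0, 150.0), (240.0, 120.0), (180.0, 130.0)]
for floor_area, footprint in records:
    r = derived_storeys(floor_area, footprint)
    print(f"{r:.2f}", looks_like_whole_storeys(r))
```

A ratio such as 1.38 could come from a partial upper storey or from inconsistent area definitions. This is one reason a derived ratio is a weak substitute for a recorded storey count.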
To retain high data accuracy, we decided not to include the number of storeys from the EQC dataset in this study. This has been clarified in Section 3.2 of the paper.
Figure iii: Number of instances for each storey category (EQC data): (a) 4 Sep 2010, (b) 22 Feb 2011
3 c) More discussion of the confusion matrix would be helpful. For example, how do the authors explain the accuracy difference between b) and c) in Figure 12? Does it have to do with the PGA ranges of the earthquakes? The worst prediction seems to be “Predicted Medium / Actual OverCap”. How can this be explained?
Thanks for the comment.
We are still retraining the ML model to incorporate the suggestions of both reviewers. Section 8 of the paper will be updated accordingly.
4 d) It would be interesting to evaluate the prediction accuracy for the sum of compensations for all buildings. It is reasonable to expect good prediction accuracy for the total cost, since errors would cancel each other out. But this is difficult to do without precise “OverCap” values.
Thanks for the suggestion.
Building claims larger than NZ$100,000 (+GST) were handled by private insurers, who were unfortunately not willing to make their data available for this research. As mentioned in Section 5.2, the current data are thus soft-capped at NZ$100,000 (+GST), which makes an analysis of total costs impossible with the currently available data.
5 e) Finally, how do the authors evaluate the usefulness of the research and model implementation for new earthquakes? Namely, what about the changing value of money over time and frequent changes in market prices?
How could the model be implemented without the class of undamaged buildings (this version would work only if damaged buildings were pre-selected)?
Re: model implementation for new earthquakes
Thanks for the comment. The ML model presented in this paper was designed to be easily retrainable, so new instances can be added to the training set after a new earthquake. However, one challenge is that claim amounts are not readily available after an earthquake, because on-site assessment of building damage can extend over a long period. To circumvent this issue, building damage should be assessed on a representative set of buildings. Once the extent of damage is known for this subset, it can be added to the training set and the ML model retrained. The retrained model can then be used to make predictions for the entire building portfolio. This approach is shown schematically in a new version of Fig. 10.
The representative training set can be selected either after an event, based on the observed effects of the earthquake, or before the event, using a predetermined representative subset of the residential buildings in Christchurch. Special care should be taken to ensure that the selected buildings can produce a satisfactory seismic loss assessment at the city level. Recent discussions with experts highlighted the uniqueness of the Canterbury region. They noted that analysis of damage observations across the CES showed that the main earthquake events affected different areas of Christchurch. They therefore suggested that the selection of a representative set of buildings should account for geographical characteristics, the liquefaction setting, and building characteristics.
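One way to select a representative set that accounts for geography, liquefaction setting, and building characteristics is stratified sampling. The sketch below is a simple illustration of that idea, not the authors' method. It groups a hypothetical portfolio by (area, liquefaction, construction type) and draws a fixed fraction from each group, with at least one building per group. The field names, category values, and the 10% fraction are all assumptions.

```python
import math
import random
from collections import defaultdict


def stratified_sample(buildings, keys, fraction, seed=0):
    """Draw `fraction` of the buildings from every stratum defined by `keys`.

    Each non-empty stratum contributes at least one building, so rare
    combinations (e.g. a small area with severe liquefaction) are not dropped.
    """
    strata = defaultdict(list)
    for b in buildings:
        strata[tuple(b[k] for k in keys)].append(b)
    rng = random.Random(seed)
    sample = []
    for stratum_key in sorted(strata):  # sorted for reproducibility
        members = strata[stratum_key]
        n = max(1, math.ceil(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical portfolio (NOT RiskScape or EQC data).
gen = random.Random(42)
portfolio = [
    {
        "id": i,
        "area": gen.choice(["east", "west", "central"]),
        "liquefaction": gen.choice(["none", "minor", "severe"]),
        "construction": gen.choice(["timber", "masonry"]),
    }
    for i in range(1000)
]
keys = ["area", "liquefaction", "construction"]
subset = stratified_sample(portfolio, keys, 0.10)
print(len(subset))
```

In the workflow described above, the sampled buildings would be the ones assessed on site. Their outcomes would be appended to the training data, the model retrained, and predictions made for the remaining buildings.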
Where expert opinion is not available, similar studies have shown that ML can also be used to select a representative set of buildings (Mangalathu & Jeon, 2020).
The actual selection of a representative set of buildings for Christchurch is beyond the scope of this study.
Re: Change in value of money
Thanks for the remark. The authors agree that the evolution of the market over time must be considered. Here again, it is worth highlighting how easily the ML model can be retrained when new or updated training data become available. The authors are aware of the step changes in the value of the EQC cap over time (e.g., since 1 Oct 2022 the cap has been NZ$300,000 + GST; Earthquake Commission (EQC), 2022). Nevertheless, Fig. 8 shows that for the 4 Sep 2010 earthquake most claims fell in the ‘low’ and ‘medium’ categories. Even for the 22 Feb 2011 earthquake, which caused unprecedented damage in the Canterbury region, many claims still fell in the ‘low’ and ‘medium’ categories. The value of the model is thus believed to lie in its ability to make predictions for these categories: ‘low’ reflects the limit for initial cash-settlement consideration; ‘medium’ covers buildings with more damage whose claims are still handled by EQC alone; and ‘overcap’ covers higher levels of damage, where private insurers become involved.
Re: no undamaged buildings class
Thanks again for the comment. Please see our reply to comment 1 a). This ML model was developed for use in an insurance setting. The focus is thus on predicting the possible extent of damage and loss for buildings with EQC claims in future earthquakes.
6 Technical corrections
A significant number of technical corrections are needed. Some examples are highlighted in the attached file. The authors should check the paper carefully.
Thanks for highlighting those typos. The errors have been corrected.
Earthquake Commission (EQC). (2022). EQC Insurance Overview.
Ghimire, S., Guéguen, P., Giffard-Roisin, S., & Schorlemmer, D. (2022). Testing machine learning models for seismic damage prediction at a regional scale using building-damage dataset compiled after the 2015 Gorkha Nepal earthquake. Earthquake Spectra.
Harirchian, E., Kumari, V., Jadhav, K., Rasulzade, S., Lahmer, T., & Raj Das, R. (2021). A Synthesized Study Based on Machine Learning Approaches for Rapid Classifying Earthquake Damage Grades to RC Buildings. Applied Sciences, 11(16), 7540.
Mangalathu, S., & Jeon, J.-S. (2020). Regional Seismic Risk Assessment of Infrastructure Systems through Machine Learning: Active Learning Approach. Journal of Structural Engineering, 146(12).
Mangalathu, S., Sun, H., Nweke, C. C., Yi, Z., & Burton, H. V. (2020). Classifying earthquake damage to buildings using machine learning. Earthquake Spectra, 36(1), 183–208.
RiskScape. (2015). Asset Module Metadata. initially hosted on
Stojadinović, Z., Kovačević, M., Marinković, D., & Stojadinović, B. (2022). Rapid earthquake loss assessment based on machine learning and representative sampling. Earthquake Spectra, 38(1), 152–177.
RC2: 'Comment on nhess-2022-227', Anonymous Referee #2, 27 Sep 2022
The authors have presented a novel ML-based approach to estimating the loss to buildings after an earthquake in 3 categories: low, medium, and high. As part of model training, the authors have spatially merged 3 datasets and selected only the subset of buildings with the least chance of erroneous data attributes. The authors have also focused on the 4 earthquakes in NZ around 2011 with the largest number of data points. The paper is well structured and fairly easy to follow.
However, I found some of the key information about the ML approach missing or confusing in the paper, and have highlighted it below. I believe that the paper would be further improved substantially by adding or clarifying the ML approach. I have listed both my major and minor concerns below.
- The selection of the test set is unclear in the paper. It also appears that the test set has been erroneously used as a validation set. If that is the case, then it is difficult to assess the generalizability of the authors’ conclusions. It would be helpful to clarify how the test set was selected and used in this study. Additional comments regarding test set are also included below with specific line references.
- While it is a suitable approach to only select the 4 events with the highest number of claims during model training, the other events with fewer claims could be used for testing purposes. This would not only ensure that no data leakage occurred between the training and test sets, but also enable the authors to validate the generalizability of their models more effectively.
- It would improve the paper if the authors added their thoughts on some of the potential use cases of this research. While the authors’ conclusions indicate a promise for using ML within this domain, it was unclear how this model and approach could be used in the future. For example, if training data is needed each time an earthquake occurs, then is one of the use cases to manually collect a subset of ground truth data for building losses, train a model, and then apply it widely to the rest of the buildings?
- Further discussion of model metrics such as recall and precision would be helpful. For example, a recall of only 20% for the overcap category and 49% for the low loss category indicates that 80% and 51% of these losses, respectively, would be missed when implementing this model. Depending on the model’s use cases, this could have a significant impact on the model’s utility. Further discussion of the most appropriate metric (or combination of metrics), given the model’s use cases, would also improve the paper. For example, why was accuracy selected as the primary evaluation metric for choosing the best-performing model?
- I appreciated that the authors listed the distribution imbalance of different features, such as construction type. However, the paper could be further improved by adding the model performance in those different feature categories. This would enable the reader to understand in which categories the model performs better than others.
- Given the relatively low performance of the ML model (as highlighted above for recall), adding a section on error analysis would substantially improve the paper. In error analysis for ML, the objective is to identify the cases in which the model does not perform well. This error analysis is often used in ML modeling to improve model performance and generalizability.
- Figure 13 is missing, and appears to be a repeat of Figure 12. Hence, Section 9 - Insights - could not be reviewed.
- It would further improve the paper if the authors added some information about their hyperparameter tuning methodology, and which search strategy they used.
- Line 50 - While the authors are completely correct in the paragraph at line 50, this paper deals with ML for structured data, for which the goal is often to surpass human performance since humans are generally unable to identify all patterns in millions of data points with hundreds of features, often found in these problems. Hence the paragraph does not apply to the ML scope of this paper. It may be suitable to remove the paragraph within the scope of this paper, or change “human-level performance” to “baseline model”, which would be a more suitable term in this case.
- Line 65 - latter -> later
- Line 73 - Suggest adding reference/url for the source of the data.
- Line 83 - It would be helpful to further describe Figure 2. Why is there a difference in the number of claims and buildings?
- Line 95 - I was curious about the accuracy of the Riskscape dataset. For example, are the building characteristics determined statistically from Census data similar to HAZUS in the US, or was it based on collecting data from building records so that it is expected to be fairly accurate? If possible, it would be helpful in the paper to include some information describing Riskscape’s data collection methodology and comment on its expected accuracy.
- Line 115 - Although a reference is provided to the authors’ previous work, it would be helpful to summarize the major reasons for incorrect merging using direct spatial joins within this paper to help understand the issue without having to read the previous work.
- Line 122 - It would be helpful if the authors added the percentage of addresses in each of the 3 categories - 1-1 match with titles, 0-1 match, and many-1 match.
- Line 131 - It would be helpful if the authors added the percentage of RiskScape data that was discarded.
- Line 132 - I was unable to understand the intent described in this paragraph, especially the first and the last sentences.
- Table 1 - The table is very helpful. However, the action taken for 2 points LINZ and 1 point Riskscape was unclear. The above mentioned percentages of data could also be added to Table 1 instead.
- Line 150 - It would be helpful if the authors added the methodology used to merge soil conditions, and liquefaction occurrence with street address. Did they use the same inverse distance weighted interpolation as seismic demand?
- Line 172 - The reason for discarding claims with maximum value lower than or higher than $115,000 is unclear. Is it because this wasn’t possible and hence the data is erroneous?
- Line 240 - It is unclear which event was selected as the test set. From my understanding of line 243, one of the 4 events was selected as test set, and the other 3 events as training+validation sets. However, it also appears from the sentence that in different instances of the model, a different event was selected as a test set so as to determine the most generalizable model. If that is the case, the test set was erroneously used as a validation set, since the model cannot be changed at any point after evaluating its performance on the test set. It would be helpful to clarify the selection of the test set, and ensure that it was only used once at the end to evaluate the performance of the final developed model.
- Line 254 - It would be helpful if the authors added how the min-max scaling was implemented with respect to training, validation, and test sets.
- Line 286 - It is unclear which limitations related to random forest model the authors are referring to.
- Figure 11 - The SVM model does not appear to have been modeled correctly as its output prediction is always the medium category, hence it has been reduced to a trivial model.
- Line 295 - It appears that the model was selected based on the best performing model on the test set. This indicates that the test set was not used correctly, as the model selection can only be done using validation sets. The test set must only be used to show the performance of an already selected model on it.
- Line 326 - While the authors raise an accurate point about the lack of claims information exceeding $115,000, it is not clear how that data could have benefitted this study since the claims have been bucketed and all those claims greater than $115,000 are already expected to be included in the over-cap category.
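Several comments above (Line 240, Line 254, Line 295) concern how the test set was chosen and how min-max scaling was applied. The sketch below shows one common arrangement, using hypothetical records tagged by event (labels A to D) with a single illustrative feature. Whole events are held out, so no event contributes to both training and testing. The scaler's minimum and maximum are computed on the training events only, and the held-out test event is scored once at the end. MinMaxScaler1D is a hypothetical helper; scikit-learn's MinMaxScaler follows the same fit-on-training, transform-everything pattern.

```python
def split_by_event(records, test_event, val_event):
    """Hold out whole events: one for testing, one for validation, the rest for training."""
    train = [r for r in records if r["event"] not in (test_event, val_event)]
    val = [r for r in records if r["event"] == val_event]
    test = [r for r in records if r["event"] == test_event]
    return train, val, test


class MinMaxScaler1D:
    """Min-max scaling whose parameters are learned from training data only."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo
        if span == 0:
            return [0.0 for _ in values]
        # Values outside the training range map outside [0, 1]; they are not clipped.
        return [(v - self.lo) / span for v in values]


# Hypothetical records: one illustrative feature (e.g. PGA in g) per claim, tagged by event.
records = [
    {"event": "A", "pga": 0.20}, {"event": "A", "pga": 0.30},
    {"event": "B", "pga": 0.10}, {"event": "B", "pga": 0.40},
    {"event": "C", "pga": 0.25},
    {"event": "D", "pga": 0.60},
]
train, val, test = split_by_event(records, test_event="D", val_event="C")
scaler = MinMaxScaler1D().fit([r["pga"] for r in train])  # fit on training events only
print(scaler.transform([r["pga"] for r in test]))
```

Model and hyperparameter choices would be made on `val` (or by cross-validation within the training events). `test` is evaluated only once, after the final model has been fixed.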
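The recall values quoted above and the remark that the SVM always predicts the medium category can both be read off a confusion matrix. The minimal sketch below uses a hypothetical 3×3 matrix (rows = actual class, columns = predicted class). Its counts are invented and are not taken from Figure 11 or 12; they were chosen only so that the low and overcap recalls equal the 49% and 20% mentioned in the comment.

```python
CLASSES = ["low", "medium", "overcap"]


def per_class_metrics(cm):
    """Precision and recall per class from a confusion matrix.

    cm[i][j] = number of instances whose actual class is i and predicted class is j.
    """
    n = len(cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        actual_k = sum(cm[k])                          # row sum: all actual class-k instances
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: all class-k predictions
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        metrics[CLASSES[k]] = (precision, recall)
    return metrics


def predicts_single_class(cm):
    """True if every prediction falls in one column (a trivial, single-class model)."""
    n = len(cm)
    nonzero_cols = [j for j in range(n) if any(cm[i][j] for i in range(n))]
    return len(nonzero_cols) <= 1


# Hypothetical confusion matrices (rows: actual, columns: predicted).
cm_model = [[49, 45, 6], [20, 70, 10], [10, 30, 10]]
cm_trivial = [[0, 100, 0], [0, 100, 0], [0, 50, 0]]

for cls, (p, r) in per_class_metrics(cm_model).items():
    print(f"{cls:8s} precision={p:.2f} recall={r:.2f}")
print(predicts_single_class(cm_trivial))
```

A single-class model can still reach a deceptively reasonable accuracy when one class dominates. This is why the comments above ask for recall and precision to be reported alongside accuracy.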
