the Creative Commons Attribution 4.0 License.
Using machine learning algorithms to identify predictors of social vulnerability in the event of a hazard: Istanbul case study
Oya Kalaycıoğlu
Serhat Emre Akhanlı
Emin Yahya Menteşe
Mehmet Kalaycıoğlu
Sibel Kalaycıoğlu
Download
- Final revised paper (published on 15 Jun 2023)
- Supplement to the final revised paper
- Preprint (discussion started on 20 Jul 2022)
Interactive discussion
Status: closed
RC1: 'Comment on nhess-2022-198', Yi (Victor) Wang, 08 Aug 2022
As an open reviewer of this manuscript, I first thank the handling editor of this special issue of NHESS, for giving me the opportunity to conduct the review. Next, let me introduce myself to demonstrate my qualifications for this review. My name is Yi Victor Wang (https://dryvw.com/). I am currently serving as a Postdoctoral Fellow at the Institute for Earth, Computing, Human and Observing (ECHO) at Chapman University, Orange, California, USA. I have been authentically studying and researching in the scholarly field of science, engineering, and management of hazards and disaster risks for over a decade. I have a bachelor’s degree, Master’s degree, and a Ph.D. degree in this field. One of my major academic contributions so far is the proposal of an empirical predictive modeling approach to quantifying disaster vulnerability with consideration of social factors (a.k.a., social vulnerability). To facilitate communications regarding this review on the topics of social vulnerability to natural hazards, especially in the event of an earthquake, from my perspective, I recommend that the authors take a look at my first-authored peer-reviewed journal papers of Wang et al. 2019, 2020, and 2021 as well as Wang and Sebastian 2021 listed at the end of this review. In particular, Wang et al. 2021 is highly pertinent to what has been covered by the authors’ manuscript.
General Comments
In terms of the authors’ manuscript and research, I like the idea of applying machine learning (ML) methods to quantify social vulnerability and to identify predictors of social vulnerability. I also appreciate the technical prowess of the authors manifested in their statistical analyses. Having said these, however, I believe that the current version of the submission is far from the level of acceptance for publication. There are a number of major issues that render the manuscript highly dubious. The story line is also logically unsound and broken at several locations of the manuscript. The way the manuscript is laid out exposes the authors’ lack of knowledge, confidence, and familiarity in topics related to disaster vulnerability and natural hazards. The authors have spent a disproportionately large amount of effort in showing the technical details of a few selected sections of their research that are actually not highly important regarding the purposes of their research. The motivations, results, and discussions in the manuscript around the topics, that are supposed to be pertinent to the practices to improve earthquake disaster risk reductions, are presented in a highly superficial manner. In order to receive a green light from me, the authors need to solve the major and minor issues as listed below and conduct a thorough revision to their manuscript accordingly in the later stage of the review process.
Specific Comments
- L1: The uncountable noun of vulnerability in disaster research, especially for risk assessment for prediction of future loss, essentially means the propensity of an entity towards loss given a unit exposed value (such as life, economy, health, livelihood, infrastructural functionality, etc.) when the entity has experienced a certain level of hazard strength (such as ground shaking of an earthquake, wind gust of a tornado, inundation of a flood, etc.). In addition, vulnerability is usually also considered to be associated with the tendency towards a long-term suffering due to poor recovery by many, especially social scientists. To facilitate the management of disaster vulnerability before an unwanted event occurs, we may conceptualize disaster vulnerability as a combination of social vulnerability due to social factors, environmental vulnerability due to environmental factors, infrastructural vulnerability due to infrastructural factors, etc., as described in many classical literatures such as Cutter 1996 (https://doi.org/10.1177/030913259602000407). By the way, this Cutter 1996 is not the paper cited in the authors’ manuscript. In the early days without big data on reliable and sufficient historical records of disaster losses, practitioners and scholars needed some method to quickly estimate disaster vulnerability. When it came to social vulnerability, professionals found that using social factors to construct a social vulnerability index (SVI) seemed to be a good approach for measuring social vulnerability. However, SVI itself is not social vulnerability. It is an indicator/predictor of social vulnerability at most. In the title, the authors claim that their research was to identify predictors of social vulnerability. But according to the body of the manuscript, it is clear that what the authors actually did was to identify predictors of an SVI. 
This is equivalent to building models to establish the relationships between a set of social variables and another set of social variables. What is the point for doing this when the authors could simply add these so-called predictors directly into their SVI?
- Then, regarding the SVI in the authors’ research, I am not sure how the authors could resolve this second issue satisfactorily. As I have said in the previous comment, the original efforts to create SVIs were limited by a lack of sufficient historical records of event losses. Now, we are in year 2022 in the age of big data. We are having access to a gigantic amount of historical records of event losses to support empirical modeling of disaster vulnerability, socially, environmentally, infrastructurally, or in whatever manner. Why do we have to get stuck with the non-empirically derived SVIs to guide disaster risk reduction practices? For those SVIs that cannot be verified with historical data on losses, they are not reliable for offering any policy suggestions. For those SVIs that can potentially be verified with historical data on losses, it would be more appropriate to directly establish empirical models of disaster vulnerability with calibrations of models on the historical data on losses. Without empirical evidence that directly associates with the expected event losses or poor recovery processes, any SVI is merely a product of social construction based on amplified voices from a seemingly scholarly, but actually perhaps more political than academic, echo chamber that eventually result in the production of some form of emperor’s new clothes more or less.
- L2: The title emphasizes “social vulnerability in the event of an earthquake”. While reading the manuscript, however, I could hardly find anything to support the hypothesis that the manuscript is about vulnerability to an earthquake event. The input variables of the ML models have nothing to do with earthquakes. The authors have also failed to show why the output variables of the ML models are for an earthquake event. It seems that the data of the research is based on a survey by the Directorate of Earthquake and Ground Research of Istanbul Metropolitan Municipality. Although the name of this organization involves earthquake, the variables of the survey used by the authors seem to be totally unrelated to earthquake events. There are no measures of hazard strengths of earthquake events, such as local magnitude, moment magnitude, peak ground acceleration, peak ground velocity, peak ground displacement, peak spectral acceleration, modified Mercalli intensity, etc. The authors need to justify why their work is for an earthquake event or for earthquake events.
- L27: The term “social vulnerability risk” or “risk of social vulnerability”, which also appears later in the manuscript, is exceptionally confusing. As I have referred to the meaning of social vulnerability previously and the word “risk” also has its specific meanings, what is the meaning of this “social vulnerability risk”? For a summary of the meanings associated with the word “risk” in scholarly works, the authors may have a look at Möller 2012 (https://doi.org/10.1007/978-94-007-1433-5_3). It seems that, with their survey data, the authors created two categories, i.e., a high SVI and a low SVI, based on a cutoff score. So, why do the authors have to call these two categories “severe risk of social vulnerability” and “non-severe risk of social vulnerability”, instead of “high SVI” and “low SVI”?
- L152-153: The authors need to introduce more regarding their SVI score, as it is unclear how readers may access an English version of IMM 2018 and Menteşe et al. 2019 is just a conference abstract and presentation instead of a peer-reviewed journal publication or technical report. The authors need to transparently and concisely demonstrate why their SVI can effectively measure or indicate social vulnerability of a household in the event of an earthquake. Is their SVI related to an expected loss or loss ratio given a metric of earthquake hazard strength?
- L266-268: According to the title of the manuscript, the authors’ main work was to use ML algorithms to identify predictors of social vulnerability. First, the initial feature selection of input variables of ML models has nothing to do with ML algorithms, as the authors claim clearly on L163 that the “predictors chosen have been selected following extensive literature reviews” and “discussions with experts”. Then, it is still unclear what ML algorithms the authors have adopted for quantifying the importance of input variables in their predictive classification models. It seems that the main work of the authors was merely to calibrate some supervised ML classification models to map a set of already chosen input variables to their binary output variable of SVI score category. The authors need to at least explain more in a concise manner how they measured the importance of input variables of the ML models.
- Regarding the ML classification models, I am not convinced that the authors have the capability to properly compare the prediction results of the models that they have adopted. When dealing with statistical analysis, model validation, resampling, subsampling, etc., it seems that the authors have a lot to say. But when it comes to the ML models, there is almost nothing in the manuscript. For example, what is an SVM? What is an ANN? Are the authors using the multilayer perceptron, convolutional neural network, recurrent neural network, autoencoder network, or something else for their ANN modeling? What is the difference between a CART and an RF? Are the authors capable of explaining all the models they used in their study?
- The entire Introduction section needs to be thoroughly revised. The authors need to make sure that their introduction is concise, relevant to their research work, and following a story line that is logically sound. For example, on L34-35, the authors start their manuscript with a UN-qualified definition of disaster in terms of coping capacity. However, this definition is irrelevant to the vulnerability quantification at a household level.
- L3-37: The statement that the “evolution of an earthquake event into a disaster is typically studied through the lenses of geoscientists, civil engineers and earthquake engineers” is not true. There are many social scientists who have dedicated their research works to studying earthquake risks and disaster vulnerability to earthquakes (see, e.g., Stallings 1995 https://www.routledge.com/Promoting-Risk-Constructing-the-Earthquake-Threat/Stallings/p/book/9780202305455; Bolin and Stanford 1998 https://doi.org/10.4324/9780203028070).
- L37-39: The statement that “it is often forgotten or ignored that the human consequences of disasters are in part derived from the composition of the population and society prior to the event” is false. There are plenty of works looking at the social factors of disaster vulnerability even for quantitative and engineering modeling purposes within the context of earthquake hazard (see, e.g., Peduzzi et al. 2009 https://doi.org/10.5194/nhess-9-1149-2009; Lin et al. 2015 https://doi.org/10.5194/nhess-15-2173-2015; Wang et al. 2019, 2020, 2021; Chen and Zhang 2022 https://doi.org/10.1016/j.ress.2022.108645).
- L49-65: This paragraph is totally unacceptable. Many sentences in this paragraph do not follow a logical flow. They read more like an awkward assembly of incompatible spare parts with fake “made in” labels on them. For example, on L55-58, the capacity of an entity to anticipate, cope with, resist, and recover from the impact of an earthquake actually includes the ability to reduce casualties due to collapse of buildings in an earthquake event.
- L59-61: Following the previous comment, I find it extremely difficult to understand why the authors have to talk about something called “social risks”? Also, I highly doubt that Prof. Susan Cutter has ever mentioned the term “social risks” in her 1996 paper. Can the authors provide the page number for where Cutter mentioned “social risk”?
- L66-83: This paragraph also reads awkward. It is unclear what the main point is for the authors to compile such a paragraph. On L68, the authors even cited the wrong Cutter 1996 paper.
- L86: The authors list logistic regression (LR) as a traditional data analysis tool. How could the authors justify their using LR as an ML method later?
- L92: The statement involving using “ML methodology over regression techniques” is confusing. Supervised ML methodology consists of two basic groups of methods. One is classification and the other is regression. ML regression methodology is part of ML methodology.
- L95-111: This paragraph needs to be rewritten to be concise and professional. It needs to serve the purpose of pointing out the motivation of and rationale for the proposed research. The authors need to read more technical papers published in hazard and disaster journals to get more familiar with the flavor of the introduction sections of papers that can be accepted for publication in this journal and rewrite their introduction accordingly.
- I am not at all convinced how the authors could justify the identification of risk of job loss in the event of an earthquake as a vulnerability factor/predictor. To reduce disaster risk is to reduce the expectation of event losses, which include the loss of livelihoods, or job loss. It is totally pointless to tell practitioners that, to reduce disaster risk including risk of job loss given an earthquake event, we need to reduce the risk of job loss given an earthquake event.
- L116-118: What is this “broad conceptual model”? What are the other models that the authors have compared their model to for demonstrating “a better understanding”? How is the authors' model better?
- L157-160: The listed three reasons for treating social vulnerability as a binary output of ML models are not convincing at all. First, with regression approaches with a numerical output variable, one can also identify vulnerability factors/predictors quantitatively and empirically. Second, the accuracy of predictions does not depend on whether using a classification or regression method. Third, it may be actually easier to interpret the regression results, especially when the regression models are linear or close to being linear.
- L497-498: Without historical data on event losses and recovery processes involved in their modeling efforts, how can the authors make such a bold statement that, based on their research, they “have found that socially, economically, and environmentally vulnerable communities are more likely to suffer disproportionately from disasters”? Where is the actual evidence?
- L521-526: The authors claim that their research can support decision makers and local authorities to improve disaster risk reduction practices. However, I feel hardly confident to agree with this claim after reading the manuscript. ML methods are good at predicting output variable values based on the optimization of parameters of a mathematical model that empirically represents the relationship between the input and output variables based on the data for training. What the authors have achieved is using an index-based approach to create an SVI to indicate social vulnerability during their first phase. However, as I have mentioned previously, this indicator is not social vulnerability itself. It is an indicator of social vulnerability. Without consideration of empirical data on event losses, etc., this indicator itself is not a good indicator of disaster vulnerability. Then, in their second phase, which is what is mainly presented in the manuscript, the authors used ML methods to establish models of the relationships between a set of social variables as the input and their SVI as the output. With these models, the authors suggest that practitioners may identify pertinent social variables to improve disaster management. However, when targeting the identified social variables and changing their values, such an alteration of input variable values will only change the predicted model output value, while the changing of the model output value may have nothing to do with the actual reduction of social vulnerability. ML methodology does not identify causal relationships. I am simply wondering how, from the authors’ perspective, their modeling results can actually benefit local management of seismic disaster risks. Can the authors explain it more in detail? In addition, what potential issues should the practitioners pay attention to when the practitioners are encouraged to apply the authors’ models for guiding earthquake disaster risk reduction practices?
Technical Issues
- L24: Why are the words “Artificial”, “Neural”, and “Network” with their first letters capitalized while the ones on L21 are not? Plus, “(ANN)” should be following the “artificial neural network” on L21.
- L42-44: Why do the authors have to mention three return periods when the mentioning of 100-year return period alone would suffice in this sentence? Also, where is the evidence to support this statement?
- L53-54: What do the authors mean by the phrase “robust and concrete disaster risk reduction”? What does “robust” mean? What does “concrete” mean?
- Table 1: What is “Dept”?
List of My Publications Relevant to this Manuscript
Wang, Y. V., and Sebastian, A. (2021). Community flood vulnerability and risk assessment: An empirical predictive modeling approach. Journal of Flood Risk Management 14(3), e12739. https://doi.org/10.1111/jfr3.12739
Wang, Y. V., Gardoni, P., Murphy, C., and Guerrier, S. (2019). Predicting fatality rates due to earthquakes accounting for community vulnerability. Earthquake Spectra 35(2), 513–536. https://doi.org/10.1193/022618EQS046M
Wang, Y. V., Gardoni, P., Murphy, C., and Guerrier, S. (2020). Worldwide predictions of earthquake casualty rates with seismic intensity measure and socioeconomic data: A fragility-based formulation. Natural Hazards Review 21(2), 04020001. https://doi.org/10.1061/(ASCE)NH.1527-6996.0000356
Wang, Y. V., Gardoni, P., Murphy, C., and Guerrier, S. (2021). Empirical predictive modeling approach to quantifying social vulnerability to natural hazards. Annals of the American Association of Geographers 111(5), 1559–1583. https://doi.org/10.1080/24694452.2020.1823807
Citation: https://doi.org/10.5194/nhess-2022-198-RC1
AC1: 'Reply on RC1', Oya Kalaycioglu, 29 Aug 2022
Publisher’s note: this comment was edited on 31 August 2022. The following text is not identical to the original comment, but the adjustments were minor without effect on the scientific meaning.
Responses to Reviewer 1’s comments
Dear Dr. Wang, first, we appreciate the time and effort you dedicated to providing feedback on our manuscript. We thank you for your valuable and insightful comments. We will revise the manuscript carefully in line with the literature on social vulnerability to natural hazards. We hope that it will satisfy the requirements for successful publication. Please see below for a point-by-point response to your comments and concerns.
We thank you for praising the technical implementations used in our manuscript. However, we understand your concerns about the lack of in-depth discussion from the perspective of disaster vulnerability and natural hazards. We will revise the manuscript accordingly in the new version.
Before laying out our perspective on your valuable comments, we would like to provide a short background of the authors that may help address the reviewer’s subjective comment on the authors’ “lack of knowledge, confidence, and familiarity in topics related to disaster vulnerability and natural hazards”. Three authors of this paper (E.Y.M., M.K. and S.K.) are part of an ongoing project titled “Tomorrow’s Cities – Disaster Risk In Transition”, funded by United Kingdom Research and Innovation (UKRI) and coordinated by the University of Edinburgh, which focuses on facilitating the transition to risk-informed management and decision making in focus cities where low-income communities face multi-hazard risks (for publications about the project, see Galasso et al., 2021, https://doi.org/10.1016/j.ijdrr.2021.102158; Cremen et al., 2022, https://doi.org/10.1016/j.scitotenv.2021.152552; Cremen et al., 2022, https://doi.org/10.1029/2021EF002388). Two of the authors of this paper (M.K. and S.K.) have a manuscript titled “An Analysis of Social Vulnerability in a Multi-hazard Urban Context with a Focus on Disaster Risk Reduction Policies: The Case of Sancaktepe, Istanbul”, which is under review in the International Journal of Disaster Risk Reduction (IJDRR); E.Y.M. is a co-author of three other articles currently under review in the same journal (IJDRR) that focus on different aspects of multi-hazard risk-informed decision making. In addition, three of the authors (O.K., E.Y.M. and S.K.) are part of another ongoing project titled “Social Vulnerability Analysis and Development of a Social Vulnerability Scale in the Face of Earthquakes in the Marmara Region, Turkey”, which is funded by the Scientific and Technological Research Council of Turkey (TUBITAK).
The following previous papers by our authors on social vulnerability and disaster risk may be of interest.
Erkan BB, Karanci AN, Kalaycioglu S, Özden AT, Çalışkan I, Özakşehir G. From Emergency Response to Recovery: Multiple Impacts and Lessons Learned from the 2011 Van Earthquakes. Earthquake Spectra. 31, 527-540, 2015. https://doi.org/10.1193/060312EQS205M
Duzgun, H.S.B., Yucemen, M.S., Kalaycioglu, S., et al. An integrated earthquake vulnerability assessment framework for urban areas. Nat. Hazards, 59, 917–947 (2011). https://doi.org/10.1007/s11069-011-9808-6
Kalaycioglu, S., Rittersberger-Tılıç, H., Çelik, K., et al. Integrated Natural Disaster Risk Assessment: The Socio-Economic Dimension of Earthquake Risk in the Urban Area. Geohazards, Proceedings of the 2006 ECI Conference on Geohazards in Lillehammer, Norway, 2006. https://dc.engconfintl.org/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1013&context=geohazards
Specific Comments
- L1: The uncountable noun of vulnerability in disaster research, especially for risk assessment for prediction of future loss, essentially means the propensity of an entity towards loss given a unit exposed value (such as life, economy, health, livelihood, infrastructural functionality, etc.) when the entity has experienced a certain level of hazard strength (such as ground shaking of an earthquake, wind gust of a tornado, inundation of a flood, etc.). In addition, vulnerability is usually also considered to be associated with the tendency towards a long-term suffering due to poor recovery by many, especially social scientists. To facilitate the management of disaster vulnerability before an unwanted event occurs, we may conceptualize disaster vulnerability as a combination of social vulnerability due to social factors, environmental vulnerability due to environmental factors, infrastructural vulnerability due to infrastructural factors, etc., as described in many classical literatures such as Cutter 1996 (https://doi.org/10.1177/030913259602000407). By the way, this Cutter 1996 is not the paper cited in the authors’ manuscript. In the early days without big data on reliable and sufficient historical records of disaster losses, practitioners and scholars needed some method to quickly estimate disaster vulnerability. When it came to social vulnerability, professionals found that using social factors to construct a social vulnerability index (SVI) seemed to be a good approach for measuring social vulnerability. However, SVI itself is not social vulnerability. It is an indicator/predictor of social vulnerability at most. In the title, the authors claim that their research was to identify predictors of social vulnerability. But according to the body of the manuscript, it is clear that what the authors actually did was to identify predictors of an SVI. 
This is equivalent to building models to establish the relationships between a set of social variables and another set of social variables. What is the point for doing this when the authors could simply add these so-called predictors directly into their SVI?
Authors’ Response: Thank you for raising this issue and giving us the chance to clarify the scope of our paper. As mentioned in the manuscript (Line 119), the SVI of the households was derived from various social variables in previous research. The variables used to construct the SVI were selected through extensive literature reviews, expert opinions and factor analysis (please see the details in our response to your comment number 5). However, those variables covered household information as well as intrinsic characteristics of society, such as the perception of risk (in relation to earthquake hazard), the measures taken against earthquake risk, and the cultural values that determine actions to reduce risk. The predictors used to develop the SVI could only be obtained via survey research and were not directly available for the whole population of Istanbul.
In this study we used this previously obtained SVI of the N=41,093 households as the output. We then aimed to build a model, using ML algorithms, that can predict the SV status of these households from widely available predictors that can be obtained from various institutional databases. Because of the relatively large sample size (N=41,093), we used supervised ML algorithms to build a predictive model for SV status. With the ML methodology used in this paper, we aimed to:
- Find the ML algorithm and model with the best performance in correctly classifying households by their SV status (accuracy, AUC and sensitivity).
- Assess which predictors available in the databases are most influential in predicting the SV of the households.
With the proposed model, one can predict the SV status of households across the population using widely available data. These data are held in the databases of metropolitan and district municipalities, the city governorship, public authorities, etc.; hence our model has the potential to reduce the time and cost of conducting surveys to calculate the SVI and identify households with high SV. Our modelling approach could also be used by decision makers to identify and prioritize action towards target groups in the population in the interests of risk mitigation, by classifying the households with the highest SV.
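As an illustration only, the three performance measures named in the aims above (accuracy, AUC and sensitivity) can be sketched in a few lines of plain Python. This is not the code used in the study: the labels, scores and the 0.5 threshold below are made-up toy values, and the function names are hypothetical. Label 1 is assumed to mean a high-SV household and label 0 a low-SV household.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of households whose predicted class equals the SVI-based class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of truly high-SV households (label 1) that are predicted as high SV."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random high-SV household receives a higher score
    than a random low-SV household (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the survey): 1 = high SV, 0 = low SV.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical model scores, e.g. predicted probabilities of high SV.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
# Assumed classification threshold.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in scores]

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
auc_value = auc(y_true, scores)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} AUC={auc_value:.4f}")
```

Sensitivity matters here because a household that is truly high SV but is classified as low SV would be missed when prioritizing action, whereas AUC does not depend on the chosen threshold.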
We note here that calculating the physical, economic and social losses of different administrative units or regions after a disaster is outside the scope of our study. It is also worth highlighting that this study is based not on the “physical” components of risk but on the “social” ones. We will emphasize this in the new version to make our scope more explicit. In addition, although quantifying social vulnerability can benefit from historical data and measures of the hazard strength of earthquake events (Wang et al., 2021, https://doi.org/10.1080/24694452.2020.1823807), in the absence of empirical research and findings the SVI seems to be the only way to represent SV. In this regard we use the SVI as a proxy for SV. Therefore, we believe that, within the Istanbul context, it is reasonable for us to focus on the social rather than the physical characteristics of a given context.
- Then, regarding the SVI in the authors’ research, I am not sure how the authors could resolve this second issue satisfactorily. As I have said in the previous comment, the original efforts to create SVIs were limited by a lack of sufficient historical records of event losses. Now, we are in year 2022 in the age of big data. We are having access to a gigantic amount of historical records of event losses to support empirical modeling of disaster vulnerability, socially, environmentally, infrastructurally, or in whatever manner. Why do we have to get stuck with the non-empirically derived SVIs to guide disaster risk reduction practices? For those SVIs that cannot be verified with historical data on losses, they are not reliable for offering any policy suggestions. For those SVIs that can potentially be verified with historical data on losses, it would be more appropriate to directly establish empirical models of disaster vulnerability with calibrations of models on the historical data on losses. Without empirical evidence that directly associates with the expected event losses or poor recovery processes, any SVI is merely a product of social construction based on amplified voices from a seemingly scholarly, but actually perhaps more political than academic, echo chamber that eventually result in the production of some form of emperor’s new clothes more or less.
Authors’ response: Thank you for your comment. We agree with you in the sense that derivation of SVI would benefit from historical data on losses. However, Wang et al. (2022, https://doi.org/10.1038/s41598-022-17878-6) mentioned the two viewpoints for the definition of vulnerability to natural hazards risks:
Page 1, Paragraph 2:
“However, measuring vulnerability to natural hazard risk is challenging due to the complexity of the concept. Vulnerability to natural hazard risks has been defined by considering two viewpoints. The first perceives vulnerability as the system of being physically exposed to a hazard. Exposure is usually measured by the number or density of people and buildings in hazard-affected areas. The second viewpoint considers vulnerability as a more complex capacity of society and individuals to cope with hazard and damage. In this case, vulnerability often refers to social vulnerability that was quantified in the classic work by Cutter et al. proposed a factor analytic framework to construct the social vulnerability index of U.S. counties. This framework has been applied to many countries including Norway, Nepal, China, Bangladesh, Portugal, India, Brazil, Colombia, and Zimbabwe. A wide range of factors used to measure vulnerability include demographic and socioeconomic status, housing, development of facilities, and medical services. Those factors reflect social inequalities, shaping the susceptibility of various groups to hazards and governing their capacity to respond...”
Among these definitions, we adopted the second viewpoint, in which social vulnerability is considered a more complex capacity of society and individuals to anticipate, cope with, resist, and recover from the impact of an earthquake. Our understanding of social vulnerability in risk assessment goes beyond using social variables for loss estimation. We aim to illustrate how the standard variables for social vulnerability, such as socio-economic and demographic characteristics, provide only a partial picture of the lived experiences of people when a hazard unfolds into a disaster. Without a clearer picture of how vulnerabilities take shape in given circumstances with respect to gender and education differences, access to health, social security conditions and property ownership in a country, the decision-making process of policymakers can be significantly hampered, resulting in inefficient and ineffective policies.
As advised, we included the information related to housing, development facilities, and access to medical services as an indication of ability to reduce casualties due to collapse of buildings in an earthquake event. In addition, such measures of vulnerability that have been employed in the other countries as stated by Wang et al. (2022, https://doi.org/10.1038/s41598-022-17878-6) have not yet been used in Turkey, where earthquake hazard is classified as high. Therefore, to our knowledge it was the first study in Turkey in which SVI was constructed using the factor analysis framework, as proposed by Cutter et al. (2003).
In the revised manuscript, we will discuss these two viewpoints, explain in detail how the SVI was calculated, and revise the paragraph L49-65 as mentioned in your comment number 11.
In addition, we disagree with the reviewer’s comment on the “empirical” approach, as it is quite challenging to find quality empirical information on disaster-related topics, particularly in Turkey, as in many developing countries and the global south context. We are of course aware of the open data sources and statistics available, but none of them is sufficient for making decisions in a complex urban environment such as Istanbul. Such aggregated data sources include too many assumptions and uncertainties, which hinder risk-informed decisions at the local level. We agree with the value of empirical research, but there is a reason that we had to rely on surveys and literature reviews to identify social vulnerability predictors: the related information is mostly not in place, and even where it exists (gathered by the related institutions), it is not shared. Moreover, the literature that we rely on and the indicators that we evaluate are mostly generated from empirical information. We accept that this is not evident in our manuscript. Therefore, we will expand the methods section in the revised version to give more details on the “social vulnerability research” conducted in Istanbul previously.
- L2: The title emphasizes “social vulnerability in the event of an earthquake”. While reading the manuscript, however, I could hardly find anything to support the hypothesis that the manuscript is about vulnerability to an earthquake event. The input variables of the ML models have nothing to do with earthquakes. The authors have also failed to show why the output variables of the ML models are for an earthquake event. It seems that the data of the research is based on a survey by the Directorate of Earthquake and Ground Research of Istanbul Metropolitan Municipality. Although the name of this organization involves earthquake, the variables of the survey used by the authors seem to be totally unrelated to earthquake events. There are no measures of hazard strengths of earthquake events, such as local magnitude, moment magnitude, peak ground acceleration, peak ground velocity, peak ground displacement, peak spectral acceleration, modified Mercalli intensity, etc. The authors need to justify why their work is for an earthquake event or for earthquake events.
Authors’ response: The analysis presented in this research is based on survey data collected by the Directorate of Earthquake and Ground Research of Istanbul Metropolitan Municipality through face-to-face interviews with the selected households. The interview questions used to derive the SVI include questions on respondents’ preparedness for earthquakes, their perception of earthquake risk and their past experience of earthquakes. The title emphasizes earthquakes because, as you also mentioned, our data come from a previous study for the municipality’s earthquake department, and because our academic reference for variable definition is Cutter’s scheme, which was also related to earthquake disaster. These variables can clearly also be used for social vulnerability assessment for other disaster types, but we do not intend to claim that the selected variables are universal for all types of disasters. Such a claim would have to be justified scientifically, which can be a recommendation of our paper for future studies.
As stated above in the response to comment number 2, to provide a more explicit background, we are expanding the methods section to detail the “social vulnerability research” conducted in Istanbul. This previous study (referred to as phase 1 in the manuscript) is the most comprehensive data source available for constructing an SVI for Istanbul, which is used as the output in this manuscript. The expanded section will help readers better judge the performance of the ML algorithms, and it will highlight how the predictors were selected and assessed and how they led to the evaluation of social vulnerability in the Istanbul context.
It is also worth highlighting that this study is not based on “physical” components of risk but rather “social”. Therefore, we believe it is obvious and reasonable for us to focus on social characteristics of a given context rather than physical ones. We are going to emphasize this clearly in the new version to make our scope more explicit.
- L27: The term “social vulnerability risk” or “risk of social vulnerability”, which also appears later in the manuscript, is exceptionally confusing. As I have referred to the meaning of social vulnerability previously and the word “risk” also has its specific meanings, what is the meaning of this “social vulnerability risk”? For a summary of the meanings associated with the word “risk” in scholarly works, the authors may have a look at Möller 2012 (https://doi.org/10.1007/978-94-007-1433-5_3). It seems that, with their survey data, the authors created two categories, i.e., a high SVI and a low SVI, based on a cutoff score. So, why do the authors have to call these two categories “severe risk of social vulnerability” and “non-severe risk of social vulnerability”, instead of “high SVI” and “low SVI”?
Authors’ response: Thank you for your suggestion. We agree that “social vulnerability risk” is not the right terminology; what we intended to refer to was the “status of social vulnerability level”. We will revise the manuscript based on your suggestions on the definition of risk.
- L152-153: The authors need to introduce more regarding their SVI score, as it is unclear how readers may access an English version of IMM 2018 and Mentese et al. 2019 is just a conference abstract and presentation instead of a peer-reviewed journal publication or technical report. The authors need to transparently and concisely demonstrate why their SVI can effectively measure or indicate social vulnerability of a household in the event of an earthquake. Is their SVI related to an expected loss or loss ratio given a metric of earthquake hazard strength?
Authors’ response: Unfortunately, the previous study explaining the calculation of the SVI is available only in Turkish, as an institutional report by Istanbul Metropolitan Municipality (IMM, 2018), and it was presented at a conference (Mentese et al., 2019). We are aware that the literature uses a great variety of indicators to assess social vulnerability, as social vulnerability itself has a complex nature. As mentioned in our response to your comment number 2, the SVI was calculated from 53 indicators, which were reduced to 7 factors, following the factor analysis strategy proposed by Cutter et al. (2003, https://doi.org/10.1111/1540-6237.8402002). Thus, in the construction of the SVI, social vulnerability was considered as a capacity of society and individuals to cope with hazard and damage. The indicators used to calculate the SVI were selected following extensive literature reviews and discussions with experts. The residential units’ type and construction, and infrastructure, were also used in constructing the SVI, as they are potentially important in understanding social vulnerability and may relate to potential economic losses, injuries, and fatalities from natural hazards.
In the revised manuscript, we will give more details regarding the calculation of the SVI and how it relates to expected losses.
- L266-268: According to the title of the manuscript, the authors’ main work was to use ML algorithms to identify predictors of social vulnerability. First, the initial feature selection of input variables of ML models has nothing to do with ML algorithms, as the authors claim clearly on L163 that the “predictors chosen have been selected following extensive literature reviews” and “discussions with experts”. Then, it is still unclear what ML algorithms the authors have adopted for quantifying the importance of input variables in their predictive classification models. It seems that the main work of the authors was merely to calibrate some supervised ML classification models to map a set of already chosen input variables to their binary output variable of SVI score category. The authors need to at least explain more in a concise manner how they measured the importance of input variables of the ML models.
Authors’ response: The statements of “predictors chosen have been selected following extensive literature reviews, discussions with experts...” relates to the selection of variables for the previously conducted survey research and SVI calculation, as explained above in item 5. As you say, it does not relate to selection of input variables for ML models and it is in the wrong place. Before fitting the ML models, we used feature selection by identifying the predictors with near zero variance and also assessed whether there exists multicollinearity or linear dependency between input variables. We will revise section 2.3 accordingly.
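The screening step described in this response can be illustrated with a minimal sketch. It is not the authors' code: the thresholds (a frequency ratio of 19 and 10% unique values for near-zero variance, an absolute correlation of 0.9 for collinearity), the function names and the toy columns are all assumptions chosen for illustration.

```python
import math


def near_zero_variance(values, freq_ratio_cut=19.0, unique_pct_cut=10.0):
    """Return True when one value dominates the column (assumed thresholds)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ranked = sorted(counts.values(), reverse=True)
    if len(ranked) == 1:
        return True  # a constant column carries no information
    freq_ratio = ranked[0] / ranked[1]  # most common vs. second most common value
    unique_pct = 100.0 * len(ranked) / len(values)
    return freq_ratio > freq_ratio_cut and unique_pct < unique_pct_cut


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(columns, corr_cut=0.9):
    """Drop near-zero-variance columns, then list highly correlated pairs."""
    kept = {k: v for k, v in columns.items() if not near_zero_variance(v)}
    dropped_nzv = [k for k in columns if k not in kept]
    names = list(kept)
    high_pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(kept[names[i]], kept[names[j]])
            if abs(r) >= corr_cut:
                high_pairs.append((names[i], names[j], round(r, 3)))
    return dropped_nzv, high_pairs


# Hypothetical toy columns, not the survey variables.
n = 40
household_size = [1 + (i % 6) for i in range(n)]
columns = {
    "household_size": household_size,
    "rooms": [h + (i % 2) for i, h in enumerate(household_size)],
    "no_social_security": [1 if i % 3 == 0 else 0 for i in range(n)],
    "rare_flag": [1] + [0] * (n - 1),
}
dropped_nzv, high_pairs = screen_predictors(columns)
print(dropped_nzv)
print(high_pairs)
```

In this toy data the rare flag is removed for near-zero variance and the two household-size proxies are reported as a collinear pair; which member of such a pair to keep remains a judgment call.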
- Regarding the ML classification models, I am not convinced that the authors have the capability to properly compare the prediction results of the models that they have adopted. When dealing with statistical analysis, model validation, resampling, subsampling, etc., it seems that the authors have a lot to say. But when it comes to the ML models, there is almost nothing in the manuscript. For example, what is an SVM? What is an ANN? Are the authors using the multilayer perceptron, convolutional neural network, recurrent neural network, autoencoder network, or something else for their ANN modeling? What is the difference between a CART and an RF? Are the authors capable of explaining all the models they used in their study?
Authors’ Response: Thank you for your comment. We have now included the supplementary file, which we did not include in the initial submission. We had intended not to go into detail on the ML methods and packages that we used. We hope that you can now find the answers to these questions in the supplementary document attached to this response.
- The entire Introduction section needs to be thoroughly revised. The authors need to make sure that their introduction is concise, relevant to their research work, and following a story line that is logically sound. For example, on L34-35, the authors start their manuscript with a UN-qualified definition of disaster in terms of coping capacity. However, this definition is irrelevant to the vulnerability quantification at a household level.
Authors’ Response: After thorough consideration of all your comments regarding the Introduction section, the introduction will be revised thoroughly in the next version of the manuscript, and the definition of disaster will be rewritten in accordance with the vulnerability literature.
- L3-37: The statement that the “evolution of an earthquake event into a disaster is typically studied through the lenses of geoscientists, civil engineers and earthquake engineers” is not true. There are many social scientists who have dedicated their research works to studying earthquake risks and disaster vulnerability to earthquakes (see, e.g., Stallings 1995 https://www.routledge.com/Promoting-Risk-Constructing-the-Earthquake-Threat/Stallings/p/book/9780202305455; Bolin and Stanford 1998 https://doi.org/10.4324/9780203028070).
Authors’ Response: We agree with your comment; the sentence in L3-37 does not reflect what we meant to say. We will remove this sentence and replace it with a literature review of social scientists’ work in the area of earthquake risks and disaster vulnerability to earthquakes.
- L37-39: The statement that “it is often forgotten or ignored that the human consequences of disasters are in part derived from the composition of the population and society prior to the event” is false. There are plenty of works looking at the social factors of disaster vulnerability even for quantitative and engineering modeling purposes within the context of earthquake hazard (see, e.g., Peduzzi et al. 2009 https://doi.org/10.5194/nhess-9-1149-2009; Lin et al. 2015 https://doi.org/10.5194/nhess-15-2173-2015; Wang et al. 2019, 2020, 2021; Chen and Zhang 2022 https://doi.org/10.1016/j.ress.2022.108645).
Authors’ Response: After thorough consideration of your comment, we agree that the sentence in L37-39 should be removed.
- L49-65: This paragraph is totally unacceptable. Many sentences in this paragraph do not follow a logical flow. They read more like an awkward assembly of incompatible spare parts with fake “made in” labels on them. For example, on L55-58, the capacity of an entity to anticipate, cope with, resist, and recover from the impact of an earthquake actually includes the ability to reduce casualties due to collapse of buildings in an earthquake event.
Authors’ Response: There seems to be a misunderstanding of the paragraph; otherwise, the words in this comment go beyond the limits of criticism. It can be clearly understood from our manuscript that our intention in the paper is not loss assessment of disasters (also covering loss of some social elements); rather, we concentrate on the intrinsic characteristics of households, which also have a place in the academic literature on social vulnerability (please see our responses to your comment numbers 2 and 5). In the revised manuscript, we will explain the various viewpoints used to assess social vulnerability and how those viewpoints relate to the ability to reduce casualties due to collapse of buildings in an earthquake event. We again note that, in constructing the SVI, the residential units’ type and construction, and infrastructure, were also used, as they may relate to potential economic losses, injuries, and fatalities from natural hazards.
- L59-61: Following the previous comment, I find it extremely difficult to understand why the authors have to talk about something called “social risks”? Also, I highly doubt that Prof. Susan Cutter has ever mentioned the term “social risks” in her 1996 paper. Can the authors provide the page number for where Cutter mentioned “social risk”?
Authors’ Response: There is a typo in the sentence that references Cutter (1996): it should read “social vulnerability”, not “social risk”. The “social risk” terminology and the sentence will be fixed in the revised manuscript.
- L66-83: This paragraph also reads awkwardly. It is unclear what the main point is for the authors to compile such a paragraph. On L68, the authors even cited the wrong Cutter 1996 paper.
Authors’ Response: This paragraph is a continuation of literature as a summary of the context of different types of social vulnerability studies. We will update the introduction in the revised manuscript.
- L86: The authors list logistic regression (LR) as a traditional data analysis tool. How could the authors justify their using LR as an ML method later?
Authors’ Response: Logistic regression is a statistical technique used for binary classification problems. Owing to the large sample size in our study, we used a variety of supervised ML techniques for binary classification in addition to logistic regression. Therefore, we will revise the sentence in section 2.4, Line 180 as: “We developed models for classification of households in terms of their social vulnerability status in the event of an earthquake using supervised machine learning (ML) algorithms: Classification and Regression Tree (CART), Random Forest (RF), Artificial Neural Network (ANN), Support Vector Machine (SVM), Naïve Bayes (NB), and K-Nearest Neighbours (KNN). The predictive performances of these ML models are compared to that of logistic regression model, which is a standard statistical technique used for binary classification…”
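The comparison described in the proposed sentence, an ML classifier evaluated against a logistic regression baseline on held-out data, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, only one of the six listed algorithms (KNN) is shown, both models are hand-written rather than taken from the packages in the supplement, and all function names and hyperparameters are hypothetical.

```python
import math
import random


def make_toy_households(n, seed=0):
    """Synthetic two-feature data; label 1 stands in for the vulnerable class."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label == 1 else -2.0
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(train, epochs=200, lr=0.1):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in train:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / len(train)
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
    return w, b


def predict_logistic(model, x):
    w, b = model
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


def predict_knn(train, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)


data = make_toy_households(200)
train, test = data[:150], data[150:]  # simple hold-out split
lr_model = fit_logistic(train)
acc_lr = accuracy(lambda x: predict_logistic(lr_model, x), test)
acc_knn = accuracy(lambda x: predict_knn(train, x), test)
print(acc_lr, acc_knn)
```

Both models are scored on the same held-out rows, so the two accuracies are directly comparable. A real comparison would use resampling and more than one metric, as the manuscript's validation sections suggest.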
- L92: The statement involving using “ML methodology over regression techniques” is confusing. Supervised ML methodology consists of two basic groups of methods. One is classification and the other is regression. ML regression methodology is part of ML methodology.
Authors’ Response: We will revise this sentence as “A relatively small number of researchers have opted to use ML methodology over traditional statistical techniques in vulnerability research”.
- L95-111: This paragraph needs to be rewritten to be concise and professional. It needs to serve the purpose of pointing out the motivation of and rationale for the proposed research. The authors need to read more technical papers published in hazard and disaster journals to get more familiar with the flavor of the introduction sections of papers that can be accepted for publication in this journal and rewrite their introduction accordingly.
Authors’ Response: After thoroughly considering your comments about Introduction section (comments number 8 to 16), we will rewrite Introduction in the revised manuscript.
- I am not at all convinced how the authors could justify the identification of risk of job loss in the event of an earthquake as a vulnerability factor/predictor. To reduce disaster risk is to reduce the expectation of event losses, which include the loss of livelihoods, or job loss. It is totally pointless to tell practitioners that, to reduce disaster risk including risk of job loss given an earthquake event, we need to reduce the risk of job loss given an earthquake event.
Authors’ Response: By job loss, we meant to refer to employment loss following a disaster. Various authors have discussed how economic losses and an increase in the number of unemployed in a community lower coping capacities and contribute to a slower recovery from the disaster (Cutter et al., 2003, https://doi.org/10.1111/1540-6237.8402002; Chen et al., 2013, https://doi.org/10.1007/s13753-013-0018-6; Llorente-Marrón et al., 2020, https://doi.org/10.3390/su12093574). In accordance with the literature, we will change the terminology to “Employment Loss” and describe it in relation to the social vulnerability context.
- L116-118: What is this “broad conceptual model”? What are the other models that the authors have compared their model to for demonstrating “a better understanding”? How is the authors' model better?
Authors’ Response: We will revise this sentence as: “We posit that the application of ML methods has a potential to lead to a better understanding of households that would be socially vulnerable in the event of an earthquake in the studies with large datasets”.
- L157-160: The listed three reasons for treating social vulnerability as a binary output of ML models are not convincing at all. First, with regression approaches with a numerical output variable, one can also identify vulnerability factors/predictors quantitatively and empirically. Second, the accuracy of predictions does not depend on whether using a classification or regression method. Third, it may be actually easier to interpret the regression results, especially when the regression models are linear or close to being linear.
Authors’ response: There has been a misunderstanding regarding the construction of the SVI. In the previous work, the households’ SVIs were already classified into four categories: Very low SVI, Low SVI, High SVI and Very High SVI. In the dataset used here, instead of using the four SVI categories as an outcome, we used a binary SVI (by combining high and low groups) to discriminate the households that require the most urgent action from all others. Within the scope of our study, we aimed to identify and predict the households with the highest SVI, which require urgent action. Also, statistically speaking, the available performance metrics for a multi-class confusion matrix are limited compared to those for a binary classification problem (Markoulidakis et al., 2021, https://doi.org/10.3390/technologies9040081). Therefore, in accordance with our motivation and for the sake of interpretability and ease of application, we used a binary outcome.
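As a rough illustration of the binary recoding and the confusion-matrix metrics mentioned here, the sketch below collapses the four SVI categories into one positive class and computes a few standard measures. The response does not state exactly which categories were merged, so treating only “Very High SVI” as positive is an assumption, and the category list and predictions are invented toy values.

```python
def to_binary(category, positive=("Very High SVI",)):
    """Map a four-level SVI category to 1 (positive) or 0 (assumed grouping)."""
    return 1 if category in positive else 0


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Standard binary metrics; assumes both classes occur and something is predicted positive."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Invented toy categories and model predictions, for illustration only.
categories = [
    "Very low SVI", "Low SVI", "High SVI", "Very High SVI",
    "Very High SVI", "Low SVI", "High SVI", "Very High SVI",
]
y_true = [to_binary(c) for c in categories]
y_pred = [0, 0, 1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(y_true)
print(metrics)
```

With a single positive class, sensitivity and specificity have an unambiguous meaning, which is the interpretability argument made in the response. The price is that the ordering among the three merged categories is discarded.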
- L497-498: Without historical data on event losses and recovery processes involved in their modeling efforts, how can the authors make such a bold statement that, based on their research, they “have found that socially, economically, and environmentally vulnerable communities are more likely to suffer disproportionately from disasters”? Where is the actual evidence?
Authors’ response: We agree that this is a strong statement. We meant to state that “We have found that households with no social security, potential loss of employment after an earthquake, poor housing conditions and lower education levels are more vulnerable and have less capacity to cope with and recover from earthquakes.”
- L521-526: The authors claim that their research can support decision makers and local authorities to improve disaster risk reduction practices. However, I feel hardly confident to agree with this claim after reading the manuscript. ML methods are good at predicting output variable values based on the optimization of parameters of a mathematical model that empirically represents the relationship between the input and output variables based on the data for training. What the authors have achieved is using an index-based approach to create an SVI to indicate social vulnerability during their first phase. However, as I have mentioned previously, this indicator is not social vulnerability itself. It is an indicator of social vulnerability. Without consideration of empirical data on event losses, etc., this indicator itself is not a good indicator of disaster vulnerability. Then, in their second phase, which is what is mainly presented in the manuscript, the authors used ML methods to establish models of the relationships between a set of social variables as the input and their SVI as the output. With these models, the authors suggest that practitioners may identify pertinent social variables to improve disaster management. However, when targeting the identified social variables and changing their values, such an alteration of input variable values will only change the predicted model output value, while the changing of the model output value may have nothing to do with the actual reduction of social vulnerability. ML methodology does not identify causal relationships. I am simply wondering how, from the authors’ perspective, their modeling results can actually benefit local management of seismic disaster risks. Can the authors explain it more in detail? In addition, what potential issues should the practitioners pay attention to when the practitioners are encouraged to apply the authors’ models for guiding earthquake disaster risk reduction practices?
Authors’ Response: As mentioned in response to earlier comment numbers 1 and 5, the SVI was calculated following the factor analysis strategy proposed by Cutter et al. (2003, https://doi.org/10.1111/1540-6237.8402002). That said, in the revised manuscript we will explain the construction of the SVI in more detail. Our motivation was to assess the variables that contribute the most to the social vulnerability of the households, given a pre-developed SVI as an outcome. We agree that our results are not directed towards defining disaster risk reduction policies together with a physical / social loss estimation. Rather, they are aimed at identifying the characteristics of the households that are more vulnerable to disasters, using variables that are readily available in the population-wide databases of various institutions. Decision makers may therefore prioritize the groups with the characteristics that make them more vulnerable, according to their needs, in order to develop new social assistance schemes specifically targeted at disaster vulnerability. Such targeted assistance is missing from the local and national disaster risk reduction policies in Turkey, though it is a part of the Sendai Framework.
We would also like to emphasize that where empirical data are not in place, we find it reasonable to use an index-based methodology to represent social vulnerability. We are aware that empirical findings represent it better, but we used the index results because empirical findings are not available for the Istanbul case. We also think that our index-based approach can be an alternative for countries where data sources are limited.
Technical Issues
- L24: Why are the words “Artificial”, “Neural”, and “Network” with their first letters capitalized while the ones on L21 are not? Plus, “(ANN)” should be following the “artificial neural network” on L21.
Authors’ Response: We will correct the typo in the revised manuscript.
- L42-44: Why do the authors have to mention three return periods when the mentioning of 100-year return period alone would suffice in this sentence? Also, where is the evidence to support this statement?
Authors’ Response: We will give literature evidence in the revised manuscript and correct the return periods.
- L53-54: What do the authors mean by the phrase “robust and concrete disaster risk reduction”? What does “robust” mean? What does “concrete” mean?
Authors’ Response: In this sentence, we wanted to emphasize the importance of considering social aspects as well as physical ones when making robust and concrete recommendations for disaster risk reduction measures, not “making robust and concrete DRR”. We will fix this typo. “Robust” refers to the strength of our model, and “concrete” refers to the data-driven model that we used in this study.
- Table 1: What is “Dept”?
Authors’ Response: It is a typo. We will correct this typo as “Debt” which refers to households with debt.
-
RC2: 'Reviewer Comment on nhess-2022-198', Jocelyn West, 01 Sep 2022
Overall Comments:
I am very grateful for the opportunity to review this manuscript and learn from this research. This manuscript uses various machine learning (ML) models to classify and analyze households in Istanbul based on social vulnerability, with particular concern for earthquake hazards. The social vulnerability index (SVI) measure used as the outcome variable is based on a novel survey dataset of more than 41,000 households in Istanbul. I believe the analysis of social vulnerability at the household level is a valuable contribution to the fields of vulnerability studies and risk assessment because data availability typically limits vulnerability analysis to a larger geographic unit, such as neighborhoods or municipalities. This study is also among a small-yet-growing number of social vulnerability analyses that incorporate machine learning models, for which I commend the research team. The study uses ML to assess the contributions of individual vulnerability indicators from the SVI to a household’s likelihood of being among the top 20% most vulnerable. In doing so, it has the potential to reveal which indicators matter most for the highest levels of vulnerability in this context. Finally, I appreciate the figures and data visualizations provided in the manuscript and associated web application to aid in the understanding and use of these results.
I have several specific suggestions that I would like the authors to address to help improve the manuscript before publication.
Specific Suggestions:
- An overarching question I would like the authors to address is whether, and how, the ML analyses might be used to improve upon the original SVI measure for Istanbul. What do the ML models add to the understanding of social vulnerability in Istanbul that was not clear previously? I think the manuscript will be strengthened if the authors can better connect these dots for readers.
- I recommend building upon the discussion of the scale at which data were analyzed as a strength of this research. Social vulnerability is not often able to be analyzed at the household level, so that is a significant potential contribution of this research worth discussing.
- The current description of the data and methods used to construct the Istanbul SVI is not yet sufficiently complete and accurate to allow their reproduction by fellow scientists. As I understand, there is a vulnerability index for Istanbul that was created as a Phase 1 of this study. However, its description is not available in English (the language of this journal), and there seems to be no peer-reviewed publication associated with the SVI. In light of this, the authors need to describe the construction of the index in detail before using it as the outcome variable in the ML models. Thus, I recommend adding a section to the manuscript describing the construction of the social vulnerability index. I encourage the authors to acknowledge the limitations of various approaches to index construction, including those raised by Spielman et al. (2020) and others. I include some recommended publications on this topic at the end.
- Building upon the explanation of the SVI construction, please also explain why the decision was made to evaluate the vulnerability index as a binary variable that refers to the top 20% of the vulnerability index. For instance, why not use the continuous index as the outcome variable? Why not use the top 25%? Finally, discuss any potential strengths, weaknesses, or consequences of defining vulnerability in this way.
- It was also my impression that the variables in the Istanbul vulnerability index are not specific to earthquake vulnerability in particular. Instead, they seem to refer to social vulnerability more generally and not in relation to any single hazard. This is not necessarily a problem with the data. However, the claim to understand earthquake vulnerability in particular needs to be more fully substantiated, or removed, because the SVI data used in this study do not appear to refer to earthquake-related vulnerability, even if the original household survey did focus on earthquakes.
- I am curious to know how the “risk of job loss” was assessed, and I would be interested to see more explanation of why job loss would be specific to a post-earthquake context, or whether this is a general measure of job insecurity. If possible, please describe briefly how that question was asked on the original survey and whether it was a self-assessment of potential job loss. This will hopefully help readers better understand that variable.
- The last sentence of the abstract suggests that “The machine learning methodology and the findings that we present in this paper can serve as a guidance for decision makers…” I would like to know more about how specifically the machine learning methodology could be used by decision makers. I do understand how the SVI data can be a tool for decision makers, but it is not yet clear to me how the ML component could practically be used by decision makers to reduce vulnerability. Would you argue that this ML analysis can be used to improve the SVI as a decision making tool? If so, explaining how and providing an example of a use case might be helpful.
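The cutoff question raised in the suggestions above (top 20% versus top 25% of a continuous index) can be made concrete with a small sketch. The scores below are invented, and the function name and tie-handling rule are assumptions rather than a description of how the Istanbul SVI was dichotomized.

```python
def label_top_share(scores, share=0.20):
    """Label the top `share` of scores as 1; ties at the cutoff are all labelled 1."""
    n_top = max(1, round(share * len(scores)))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [1 if s >= cutoff else 0 for s in scores], cutoff


# Invented index values for twenty hypothetical households.
scores = list(range(1, 21))
labels20, cutoff20 = label_top_share(scores, share=0.20)
labels25, cutoff25 = label_top_share(scores, share=0.25)
print(cutoff20, sum(labels20))
print(cutoff25, sum(labels25))
```

Moving the share from 20% to 25% changes which households count as the positive class, so any classifier metric reported on the binary outcome depends on this choice.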
Grammatical/proofreading:
- In the abstract, please define what the outcome variable is when mentioning it.
- Change “dept” to “debt” in the last line of Table 1
- I recommend avoiding the term “natural disasters” and instead simply using “disasters” or being more specific. (lines 63, 103, 453, 529)
- Page 4, Line 121: Avoid using the word “intrinsic” to describe social vulnerability because social science research specifies that vulnerability is not intrinsic to people; it is instead a condition borne by some people under certain conditions. People are also not vulnerable at all times or in all contexts. In other words, social vulnerability does not emerge from the characteristics in the SVI. Rather these variables can sometimes indicate or signal who is more likely to bear vulnerability created by structural forces and power imbalances. It is important that the language used is clear about this.
- Page 25, Line 497: I recommend rephrasing the sentence, “We have found that socially, economically, and environmentally vulnerable communities are more likely to suffer disproportionately from disasters,” because this is not a finding of the current study as it does not evaluate impacts of disasters. Instead, this is a finding of many previous studies that you could perhaps cite here. Simply removing the words “we have found that…” may be sufficient.
- I find the phrase “social vulnerability risk” to be a bit confusing and inconsistent with most literature on this topic. Social vulnerability is typically considered a sub-component of risk, so these phrases should not be combined. Instead, use the phrase “social vulnerability” without the word “risk.” For similar reasons, the phrase “social risk” should be replaced with either “social vulnerability” or “disaster risk,” as appropriate.
Suggested readings on social vulnerability measurement and validation:
- Szczyrba, L., Zhang, Y., Pamukcu, D., Eroglu, D. I., & Weiss, R. (2021). Quantifying the Role of Vulnerability in Hurricane Damage via a Machine Learning Case Study. Natural Hazards Review, 22(3), 04021028. https://doi.org/10.1061/(ASCE)NH.1527-6996.0000460
- Bakkensen, L. A., Fox-Lent, C., Read, L. K., & Linkov, I. (2017). Validating Resilience and Vulnerability Indices in the Context of Natural Disasters. Risk Analysis, 37(5), 982–1004. https://doi.org/10.1111/risa.12677
- Rufat, S., Tate, E., Emrich, C. T., & Antolini, F. (2019). How Valid Are Social Vulnerability Models? Annals of the American Association of Geographers, 109(4), 1131–1153. https://doi.org/10.1080/24694452.2018.1535887
- Spielman, S. E., Tuccillo, J., Folch, D. C., Schweikert, A., Davies, R., Wood, N., & Tate, E. (2020). Evaluating social vulnerability indicators: Criteria and their application to the Social Vulnerability Index. Natural Hazards, 100(1), 417–436. https://doi.org/10.1007/s11069-019-03820-z
- Tate, E. (2012). Social vulnerability indices: A comparative assessment using uncertainty and sensitivity analysis. Natural Hazards, 63(2), 325–347. https://doi.org/10.1007/s11069-012-0152-2
Citation: https://doi.org/10.5194/nhess-2022-198-RC2
AC2: 'Reply on RC2', Oya Kalaycioglu, 08 Oct 2022
Responses to Reviewer 2’s comments
Authors’ response to the overall comments of Reviewer 2: Dear Dr. West, we first thank you for your valuable and constructive comments, which will improve our manuscript. Thank you also for appreciating our use of household-level survey data, ML methodology and data visualization. We will revise the manuscript carefully in line with your comments. Please see below for a point-by-point response to your comments and concerns.
Specific suggestions of Reviewer 2:
1. An overarching question I would like the authors to address is whether, and how, the ML analyses might be used to improve upon the original SVI measure for Istanbul. What do the ML models add to the understanding of social vulnerability in Istanbul that was not clear previously? I think the manuscript will be strengthened if the authors can better connect these dots for readers.
Authors’ response: We thank the reviewer for this comment, which gives us the chance to elaborate on this topic. In this study, we used the previously obtained SVI score (based on a survey of 41,093 households) as the output, which the reviewer refers to as the original SVI measure. In that study (which we refer to as Phase 1), the SVI score and SV category of the households were calculated from pre-determined variables identified through a literature review. These variables relate to the social, economic and demographic properties of the households as well as cultural characteristics of the community, such as the perception of and preparedness for earthquake risk. The information required to represent these predictors was obtained via survey research, since such representative data on the population of Istanbul were not previously available from any institutional source. Therefore, in this paper, we aimed to build an ML-based model that can predict the SV status of these households using publicly available databases, without requiring a household-based survey. This is valuable because it reduces the time and cost of conducting surveys to calculate SVI and assess households with high SV. Our modeling approach could also be used by decision makers to identify and prioritize action towards target groups in the population in the interests of risk mitigation, by classifying households with respect to their SV level. In this regard, we believe such an ML approach to identifying SV helps to interpret the social context effectively and practically in the face of disasters.
We also believe that such practicality will make it possible to understand SV better, as it will enable researchers to adapt this study to different spatial contexts with different variables. Eventually, this will lead to a more comprehensive understanding of the phenomenon. In the revised version of this paper, we will highlight this message more explicitly and with proper justification.
2. I recommend building upon the discussion of the scale at which data were analyzed as a strength of this research. Social vulnerability is not often able to be analyzed at the household level, so that is a significant potential contribution of this research worth discussing.
Authors’ response: We thank the reviewer for drawing attention to the importance of measuring social vulnerability at the household level, which is lacking in the literature due to limited data availability. In that sense, our study is one of the few based on large-scale household survey data to assess social vulnerability predictors. In the revised manuscript, we will emphasize this major contribution of our study by providing details on the methodology of the survey and its outputs.
3. The current description of the data and methods used to construct the Istanbul SVI are not yet sufficiently complete and accurate to allow their reproduction by fellow scientists. As I understand, there is a vulnerability index for Istanbul that was created as a Phase 1 of this study. However, its description is not available in English (the language of this journal), and there seems to be no peer reviewed publication associated with the SVI. In light of this, the authors need to describe the construction of the index in detail before using it as the outcome variable in the ML models. Thus, I recommend adding a section to the manuscript describing the construction of the social vulnerability index. I encourage the authors to acknowledge the limitations of various approaches to index construction, including those raised by Spielman et al. (2020) and others. I include some recommended publications on this topic at the end.
Authors’ response: Unfortunately, the previous study explaining the calculation of the SVI is available only in Turkish, as an institutional report by Istanbul Metropolitan Municipality (IMM, 2018), and it was presented at a conference (Mentese et al., 2019). As the reviewer suggests, the literature offers various approaches and indicators for constructing a social vulnerability index. In Phase 1 of this study, the SVI was calculated within a factor analysis framework from 53 indicators, which were reduced to 7 factors, following the strategy proposed by Cutter et al. (2003, https://doi.org/10.1111/1540-6237.8402002). The indicators were selected following extensive literature reviews and discussions with experts. In the revised manuscript, we will add a section devoted to describing the social vulnerability index, including a discussion of the various approaches in the literature. We are also grateful for the publication suggestions, which we believe will improve our article.
4. Building upon the explanation of the SVI construction, please also explain why the decision was made to evaluate the vulnerability index as a binary variable that refers to the top 20% of the vulnerability index. For instance, why not use the continuous index as the outcome variable? Why not use the top 25%? Finally, discuss any potential strengths, weaknesses, or consequences of defining vulnerability in this way.
Authors’ response: In the previous work (Phase 1), the SVI scores of the households were classified into four categories: Very Low SVI, Low SVI, High SVI and Very High SVI. In the dataset used here, we used a binary outcome instead of the four SVI categories. Within the scope of our study, we aimed to identify and predict the households with the highest SVI, which require urgent action. Thus, we compared the Very High SVI group with all others; this group corresponds to 20% of the households. Also, statistically speaking, the available performance metrics for a multi-class confusion matrix are limited compared with those for a binary classification problem (Markoulidakis et al., 2021, https://doi.org/10.3390/technologies9040081). Therefore, in accordance with our motivation and for the sake of interpretability and ease of application, we used a binary outcome. We will clarify this in the revised manuscript, in the new section on the construction of the SVI.
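To make the dichotomization discussed in this response concrete, the following is a minimal sketch and not the authors' code. It assumes a continuous SVI score per household, labels the highest-scoring 20% as "very high" (1) and all others as 0, and then computes the usual binary confusion-matrix metrics. The function names, the synthetic scores and the noisy stand-in "classifier" are all hypothetical and serve only as illustration.

```python
import random


def dichotomize_top_fraction(scores, fraction=0.20):
    """Label the highest-scoring `fraction` of households 1, all others 0."""
    n_top = round(len(scores) * fraction)
    # Rank household indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(scores)
    for i in ranked[:n_top]:
        labels[i] = 1
    return labels


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision and F1 from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Synthetic example (assumption): 1000 households with made-up SVI scores.
random.seed(0)
svi_scores = [random.gauss(0.0, 1.0) for _ in range(1000)]
y_true = dichotomize_top_fraction(svi_scores)

# Hypothetical stand-in for a classifier: noisy scores cut at the same fraction.
noisy_scores = [s + random.gauss(0.0, 0.5) for s in svi_scores]
y_pred = dichotomize_top_fraction(noisy_scores)

print(sum(y_true), binary_metrics(y_true, y_pred))
```

Changing `fraction` to 0.25 shows how the reviewer's alternative cut-off would alter the class balance; a continuous outcome would instead call for regression metrics rather than a confusion matrix.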
5. It was also my impression that the variables in the Istanbul vulnerability index are not specific to earthquake vulnerability in particular. Instead, they seem to refer to social vulnerability more generally and not in relation to any single hazard. This is not necessarily a problem with the data. However, the claim to understand earthquake vulnerability in particular needs to be more fully substantiated, or removed, because the SVI data used in this study do not appear to refer to earthquake-related vulnerability, even if the original household survey did focus on earthquakes.
Authors’ response: The outcome of the study, the dichotomized SVI, was derived following Cutter’s scheme of social vulnerability in relation to earthquake hazards. The interview questions used to derive the SVI include questions on respondents’ preparedness for earthquakes, their perception of earthquake risk and their experience with earthquakes. We agree that the predictors assessed with ML in our current study can also be used for social vulnerability assessment for other disaster types. However, we do not intend to claim that the selected variables are universal for all types of disasters, as our outcome is based on earthquake-related vulnerability. We believe that the new section explaining the derivation of the SVI will clarify how the outcome SVI relates to earthquakes.
6. I am curious to know how the “risk of job loss” was assessed, and I would be interested to see more explanation of why job loss would be specific to a post-earthquake context, or whether this is a general measure of job insecurity. If possible, please describe briefly how that question was asked on the original survey and whether it was a self-assessment of potential job loss. This will hopefully help readers better understand that variable.
Authors’ response: By risk of job loss, we meant the risk of employment loss following an earthquake. In the original survey, the participants were asked to assess their own and other household members’ risk of losing their jobs following an earthquake. As an SV predictor, we considered risk of job loss to be mainly related to informal employment, which may take the form of casual or seasonal employment or self-employment, where social security and social insurance registrations are not provided by employers. Various authors have discussed how economic losses and an increase in the number of unemployed people in a community lower coping capacities and contribute to a slower recovery from disaster (Cutter et al., 2003, https://doi.org/10.1111/1540-6237.8402002; Chen et al., 2013, https://doi.org/10.1007/s13753-013-0018-6; Llorente-Marrón et al., 2020, https://doi.org/10.3390/su12093574). In accordance with the literature, we will change the terminology to “Employment Loss” and describe it in relation to the social vulnerability context.
7. The last sentence of the abstract suggests that “The machine learning methodology and the findings that we present in this paper can serve as a guidance for decision makers…” I would like to know more about how specifically the machine learning methodology could be used by decision makers. I do understand how the SVI data can be a tool for decision makers, but it is not yet clear to me how the ML component could practically be used by decision makers to reduce vulnerability. Would you argue that this ML analysis can be used to improve the SVI as a decision making tool? If so, explaining how and providing an example of a use case might be helpful.
Authors’ response: We thank the reviewer for giving us a chance to clarify this important aspect of our study. Our motivation was to assess the variables that contribute most to the social vulnerability of households, given a pre-developed SVI as the outcome. One can input these variables, which are available in the databases of metropolitan and district municipalities, the city governorship, public authorities, etc., into our proposed final ANN model and obtain the social vulnerability status of the household. Hence, our proposed model has the potential to reduce the time and cost of conducting surveys to calculate household-level SVI and assess households with high SV. We agree that our results are not directed towards defining disaster risk reduction policies; however, based on the social vulnerability characterization, these households can be prioritized by decision makers when developing new social policies targeted at disaster vulnerability. Such targeted policies are particularly lacking in the Turkish context, as well as in many low- and middle-income countries. In the revised manuscript, we will better clarify the contribution of the ML methodology to predicting SV.
Grammatical/proofreading suggestions of Reviewer 2:
1. In the abstract, please define what the outcome variable is when mentioning it.
Authors’ response: We will define the outcome in the abstract in the revised manuscript.
2. Change “dept” to “debt” in the last line of Table 1
Authors’ response: This is a typo. We will correct it to “Debt”.
3. I recommend avoiding the term “natural disasters” and instead simply using “disasters” or being more specific. (line 63, 103, 453, 529)
Authors’ response: We will use “disasters” in the revised manuscript instead of “natural disasters”.
4. Page 4, Line 121: Avoid using the word “intrinsic” to describe social vulnerability because social science research specifies that vulnerability is not intrinsic to people; it is instead a condition borne by some people under certain conditions. People are also not vulnerable at all times or in all contexts. In other words, social vulnerability does not emerge from the characteristics in the SVI. Rather these variables can sometimes indicate or signal who is more likely to bear vulnerability created by structural forces and power imbalances. It is important that the language used is clear about this.
Authors’ response: We fully agree, and we will replace “intrinsic” with “specific dynamics leading to social vulnerability/ies”.
5. Page 25, Line 497: I recommend rephrasing the sentence, “We have found that socially, economically, and environmentally vulnerable communities are more likely to suffer disproportionately from disasters,” because this is not a finding of the current study as it does not evaluate impacts of disasters. Instead, this is a finding of many previous studies that you could perhaps cite here. Simply removing the words “we have found that…” may be sufficient.
Authors’ response: We agree that we made a strong statement here. We will revise this sentence accordingly.
6. I find the phrase “social vulnerability risk” to be a bit confusing and inconsistent with most literature on this topic. Social vulnerability is typically considered a sub-component of risk, so these phrases should not be combined. Instead, use the phrase “social vulnerability” without the word “risk.” For similar reasons, the phrase “social risk” should be replaced with either “social vulnerability” or “disaster risk,” as appropriate.
Authors’ response: We thank the reviewer for this suggestion. By “social vulnerability risk” we intended to refer to the “status of social vulnerability”. We will revise the phrase “social vulnerability risk” accordingly. Also, there is a typo in the sentence on “social risk” that cites Cutter (1996): it should read “social vulnerability”, as the reviewer suggests, not “social risk”. Both the terminology and the sentence will be fixed in the revised manuscript.
Citation: https://doi.org/10.5194/nhess-2022-198-AC2