Exploring the utility of social media data for urban flood impact assessment in data scarce cities

Guo, Kaihua; Guan, Mingfu; Yan, Haochen; Chan, Faith Ka Shun

doi:https://doi.org/10.5194/nhess-2022-109

08 Apr 2022

Status: this preprint was under review for the journal NHESS but the revision was not accepted.

Abstract. The growing volume of social media data is a valuable and rapidly available source of information to inform flood response and recovery. In this study, a workflow framework is developed to assess urban flood impacts by extracting and analysing social media data and identifying areas of intensive public response, using the 2020 rainstorm-induced flooding in Chengdu, China, as a case study. A crawler algorithm is applied to extract and filter social media data from two commonly used social platforms, Weibo (static data) and TikTok (dynamic data). Based on spatiotemporal analysis and 232 identified flood sites with geographical locations, the study shows that social media activity and precipitation have a significant positive temporal correlation. Analysing the temporal evolution of social media topics reveals how the flooding progressed, enabling severely affected areas to be identified quickly. Spatially, social media data provide information on flood locations, and social media activity is generally associated with the demographic distribution of users. A flood simulation shows that the framework can generate a reliable source of urban flood data from social media, which can enhance flood risk modelling with the aid of a hydrodynamic model. This study demonstrates the utility of social media data for urban flood assessment.
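The abstract does not spell out how the crawler filters posts. As a hedged illustration of the general idea only (keyword inclusion, plus exclusion of reposts, posts without a location, and general discussion that is not an on-site report), the Python sketch below uses a hypothetical `Post` record, hypothetical keyword lists and synthetic posts; it is not the authors' crawler or filtering rules.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword lists for illustration only. A real crawler for
# Weibo or TikTok would need Chinese terms tuned to the event.
FLOOD_TERMS = ("flood", "waterlogging", "submerged", "inundated")
EXCLUDE_TERMS = ("forecast", "warning", "prayers", "donate")


@dataclass
class Post:
    text: str
    location: Optional[str]  # place tag, if the platform exposes one
    is_repost: bool


def is_flood_report(post: Post) -> bool:
    """Keep original, located posts that mention flooding and no excluded term."""
    if post.is_repost or post.location is None:
        return False
    text = post.text.lower()
    if not any(term in text for term in FLOOD_TERMS):
        return False
    # Drop general discussion (forecasts, appeals) that is not an on-site report.
    return not any(term in text for term in EXCLUDE_TERMS)


def filter_posts(posts: List[Post]) -> List[Post]:
    """Return only the posts that pass the flood-report filter."""
    return [p for p in posts if is_flood_report(p)]


# Synthetic posts, not study data.
sample = [
    Post("Road outside the station is flooded, water up to my knees", "District A", False),
    Post("Flood warning issued, heavy rain forecast tonight", "District B", False),
    Post("Street submerged outside my building", None, False),
    Post("Flood photos from this morning", "District C", True),
]

kept = filter_posts(sample)
print([p.location for p in kept])
```

Only the first synthetic post survives: the second is excluded as a forecast/warning, the third has no location and the fourth is a repost. Plain substring matching is a deliberate simplification; Chinese text would need word segmentation first.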

Received: 29 Mar 2022 – Discussion started: 08 Apr 2022

Status: closed

RC1:
'Comment on nhess-2022-109', Anonymous Referee #1, 02 Jun 2022

The manuscript deals with a very relevant topic and is generally well written. It is the result of an ambitious and impressive effort to explore the utility of social media data for urban flood impact assessment in data-scarce cities. The method presented here promises in itself to be very valuable and applicable in future flood impact studies. However, my one major concern is the flood modeling and its validation. Could you provide more information on the methodology used (the modeling process) and some discussion of it?

Citation: https://doi.org/10.5194/nhess-2022-109-RC1
- AC1: 'Reply on RC1', Kaihua Guo, 13 Jul 2022
  
  Response: We thank the reviewer for the positive comments. For the flood modeling and validation part, we will give a more detailed description of the input data, the model, and the model parameter setup in Section 4.1 of the revised manuscript. The model we use is a commonly used model based on the shallow water equations, which has been well studied over the past decade. The extent to which social media data support the simulation results will also be discussed further in this section, to explore the applicability of social media data to flood impact assessment in data-scarce cities. Please note: as indicated in the manuscript, its main aim is to develop a replicable and optimized social media data processing method that can better filter and identify valuable flood data from large volumes of ordinary, unstructured public social media content full of intensive, unknown interference. Therefore, the flood modelling part mainly serves to demonstrate the utility of the gathered data.
  
  Citation: https://doi.org/10.5194/nhess-2022-109-AC1
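The reply refers to a model based on the shallow water equations without stating its form. For reference, the standard two-dimensional depth-averaged equations in conservative form are given below, with water depth $h$, depth-averaged velocities $u$ and $v$, bed elevation $z_b$ and gravitational acceleration $g$. The rainfall source term $R$ and the Manning friction closure with roughness coefficient $n$ are common choices assumed here for illustration, not details confirmed by the authors.

```latex
\begin{aligned}
\frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x} + \frac{\partial (hv)}{\partial y} &= R \\
\frac{\partial (hu)}{\partial t} + \frac{\partial}{\partial x}\left(hu^2 + \tfrac{1}{2} g h^2\right) + \frac{\partial (huv)}{\partial y} &= -g h \frac{\partial z_b}{\partial x} - C_f\, u \sqrt{u^2 + v^2} \\
\frac{\partial (hv)}{\partial t} + \frac{\partial (huv)}{\partial x} + \frac{\partial}{\partial y}\left(hv^2 + \tfrac{1}{2} g h^2\right) &= -g h \frac{\partial z_b}{\partial y} - C_f\, v \sqrt{u^2 + v^2}
\end{aligned}
\qquad C_f = \frac{g n^2}{h^{1/3}}
```

The first line conserves mass (with net rainfall as a source); the other two conserve momentum in $x$ and $y$, balancing bed slope and bed friction.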
RC2:
'Comment on nhess-2022-109', Zhang Hongping, 12 Jun 2022
Urban flooding is a pressing issue worldwide. Urban flood emergency response still faces many challenges, one of which is collecting timely flood information. This research demonstrates a technical workflow for obtaining flood information from social media and mining valuable information from it in depth. It is undoubtedly a worthwhile attempt, although there is still a long way to go before it can be put into practical use. My specific comments and suggestions are as follows.

About the introduction section:

The logic of the introduction seems somewhat weak, and the section needs better organization. For example:

(1) The content of the first paragraph seems somewhat distant from the research topic. I would introduce the topic directly with urban floods worldwide, and then explain why data from social media are needed for urban flood emergency management and what the difficulties are.

(2) At the beginning of the second paragraph, it would be better to describe the urban flood situation from a worldwide perspective, not only that of China, because urban flooding is a worldwide issue. China can be used as one example.

(3) In Line 57, traditional observation methods are described as “costly, delay and time-consuming”. However, the discussion of traditional observation methods seems insufficient. It would be better to discuss, before Line 57, what kinds of traditional observation methods exist and how costly, delayed and time-consuming they are.

(4) Line 45 seems to indicate that existing research uses data in a unified format. If unified-format data exist, why do we need to use unformatted data from Weibo and TikTok? Please explain this further. It would also be better to give more details on existing research using social media data for flood emergency management: what has it achieved, and what are its limitations?

(5) Two research questions are listed from Line 57 onwards. There might be one more: how are the social media data validated before they are used, or what uncertainties arise when they are used? Please say a little more about this.

About Table 2:

(1) Table 2 is somewhat confusing. Usually the first row is a header defining the content of each column, and the contents of each column match that header. In this table, row 5 looks like a new header, and row 7 does not seem to match either header. It would be better to reorganize the table.

(2) More explanation of the time lag in Table 2 would help. Does it mean the Pearson correlation coefficient between hourly precipitation and hourly tweet counts at a certain time lag? If so, please state explicitly which hours of precipitation and which hours of tweets were used to calculate the Pearson correlation coefficient.

(3) Line 212 describes the filtering for Storm 1 as excluding relevant discussions. Please explain in more detail how these discussions are identified among the huge number of tweets.
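Comment (2) asks whether the time lag in Table 2 denotes a lagged Pearson correlation between hourly precipitation and hourly post counts. The Python sketch below implements that interpretation under stated assumptions: rainfall in hour t is paired with posts in hour t + lag over the overlapping hours only, and the series are synthetic. It is not the authors' code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


def lagged_pearson(precip, posts, lag):
    """Correlate rainfall in hour t with post counts in hour t + lag.

    Only overlapping hours are used: precip[0 : n - lag] is paired with
    posts[lag : n], so a positive lag means posts follow the rainfall.
    """
    n = len(precip)
    if n != len(posts):
        raise ValueError("both series must cover the same hours")
    if not 0 <= lag <= n - 2:
        raise ValueError("lag must leave at least two overlapping hours")
    return pearson(precip[: n - lag], posts[lag:])


def best_lag(precip, posts, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the highest r."""
    candidates = [(k, lagged_pearson(precip, posts, k)) for k in range(max_lag + 1)]
    return max(candidates, key=lambda pair: pair[1])


# Synthetic hourly series (not study data): posts echo rainfall 2 h later.
precip = [0, 1, 5, 12, 8, 3, 1, 0, 0, 0]    # mm per hour
posts = [2, 2, 2, 7, 27, 62, 42, 17, 7, 2]  # posts per hour

lag, r = best_lag(precip, posts, max_lag=4)
print(f"best lag = {lag} h, r = {r:.3f}")
```

With these synthetic series the best lag is 2 h, while the same-hour (lag 0) correlation is close to zero. Note that each extra hour of lag shortens the paired sample by one hour, which is one reason to report exactly which hours enter each coefficient, as the comment requests.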

About Table 4:

It might be clearer to transpose this table, placing the check-in data and flood-point data for each district in one row to allow a better comparison.

About Section 4:

Some content in Section 4 seems only weakly related to the results of this research. For example:

(1) In Section 4.1 the authors seem to discuss the value or role of social media for various stakeholders. This is a fact, but not a result of this research. It might be better to use this content in the introduction to establish why this research is necessary.

(2) Section 4.3 is titled “comparison with other platforms and countries”. I would limit the comparison to other studies and de-emphasize the idea of countries. Furthermore, the content of this section reads like a review of progress in similar research. Could it be moved to the introduction as a summary of existing research?

(3) Section 4.2 gives a good example of mining social media data in more depth. It might be better to focus on this example and give more details about it in Section 4.

Others

Some other minor questions and suggestions are noted in the manuscript.
Citation: https://doi.org/10.5194/nhess-2022-109-RC2
- AC2: 'Reply on RC2', Kaihua Guo, 13 Jul 2022
  
  We thank the reviewer for the suggestions. Our point-by-point response is provided in the supplement.
  
  Citation: https://doi.org/10.5194/nhess-2022-109-AC2

Supplement

https://doi.org/10.5194/nhess-2022-109-supplement

Viewed

Total article views: 2,041 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,303	674	64	2,041	104	131	132

Views and downloads (calculated since 08 Apr 2022)

Month	HTML	PDF	XML	Total
Apr 2022	171	61	9	241
May 2022	39	37	0	76
Jun 2022	48	39	5	92
Jul 2022	36	34	5	75
Aug 2022	14	26	0	40
Sep 2022	12	20	1	33
Oct 2022	12	14	1	27
Nov 2022	17	12	0	29
Dec 2022	21	3	0	24
Jan 2023	12	8	0	20
Feb 2023	25	11	0	36
Mar 2023	17	10	0	27
Apr 2023	21	14	1	36
May 2023	19	9	0	28
Jun 2023	18	9	0	27
Jul 2023	20	14	4	38
Aug 2023	9	6	1	16
Sep 2023	24	17	0	41
Oct 2023	14	16	1	31
Nov 2023	14	9	0	23
Dec 2023	9	12	1	22
Jan 2024	26	14	0	40
Feb 2024	15	16	2	33
Mar 2024	11	15	5	31
Apr 2024	24	10	7	41
May 2024	26	17	4	47
Jun 2024	14	9	2	25
Jul 2024	20	10	2	32
Aug 2024	13	11	3	27
Sep 2024	16	24	2	42
Oct 2024	9	10	0	19
Nov 2024	7	9	0	16
Dec 2024	10	7	0	17
Jan 2025	15	14	0	29
Feb 2025	17	7	1	25
Mar 2025	19	12	1	32
Apr 2025	24	8	1	33
May 2025	21	17	2	40
Jun 2025	20	18	0	38
Jul 2025	29	16	0	45
Aug 2025	60	24	1	85
Sep 2025	309	15	1	325
Oct 2025	26	10	1	37

Viewed (geographical distribution)

Total article views: 1,948 (including HTML, PDF, and XML), of which 1,948 have a defined geography and 0 are of unknown origin.

Cited

Latest update: 30 Oct 2025. 2 citations as recorded by Crossref.

Short summary

This study investigated the utility of social media for urban flood assessment using the 2020 flooding in Chengdu, China, as a case study. We presented an efficient workflow to collect, process and identify unstructured flood-related data in near real time during a storm event. Based on the resulting social media database and 232 identified flood sites, this study shows that social media data can provide valuable spatial and timely information for urban flood emergency management.