Articles | Volume 25, issue 5
https://doi.org/10.5194/nhess-25-1681-2025
© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
Social sensing a volcanic eruption: application to Kīlauea, 2018
Download
- Final revised paper (published on 12 May 2025)
- Preprint (discussion started on 10 Jan 2024)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on nhess-2024-3', Robert Goldman, 12 May 2024
- AC1: 'Reply on RC1', James Hickey, 26 Jul 2024
- RC2: 'Comment on Hickey et al - Social sensing a volcanic eruption: application to Kilauea 2018', Anonymous Referee #2, 22 May 2024
- AC2: 'Reply on RC2', James Hickey, 26 Jul 2024
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to minor revisions (review by editor) (01 Aug 2024) by Amy Donovan
ED: Reconsider after major revisions (further review by editor and referees) (05 Sep 2024) by Giovanni Macedonio
AR by James Hickey on behalf of the Authors (05 Sep 2024)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (23 Sep 2024) by Giovanni Macedonio
RR by Robert Goldman (25 Nov 2024)
RR by Brianna Corsa (24 Jan 2025)
ED: Publish subject to minor revisions (review by editor) (28 Jan 2025) by Giovanni Macedonio
AR by James Hickey on behalf of the Authors (03 Feb 2025)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (15 Feb 2025) by Giovanni Macedonio
AR by James Hickey on behalf of the Authors (24 Feb 2025)
This manuscript applies social sensing as a novel approach to understanding broad patterns in public reactions to Twitter posts ("tweets") related to and published during the 2018 eruption of Hawai'i's Kīlauea volcano. Specifically, this paper investigates temporal trends in topics of tweets published within Hawai'i and compares these to temporal patterns in user sentiment obtained through the VADER sentiment analysis program. The stated aim of this paper is to test whether social sensing can track and quantify changes in societal actions and emotional responses during an eruptive crisis, and whether those changes are coincident with different stages of the eruption. Another, broader goal--based on the abstract--is to identify and explain how the observed temporal trends in syn-eruptive tweet content and tweet sentiment scores reflect patterns in volcanic activity, civil protection actions, and socioeconomic pressures, and (possibly) to identify a correlation between the posting of tweets containing warning or risk information and the resulting actions taken and/or sentiments felt by members of local communities in Hawai'i dealing with the eruption.
As a scientist who has conducted and published smaller-scale qualitative and mixed-methods analyses of social media and other communications during the 2018 Kīlauea eruption, I share the goals of the current manuscript's authors, and am excited to see how these authors employed a Twitter API and VADER to analyze and interpret the content and sentiment of over 100,000 tweets. I also commend the authors for presenting the results of this large dataset in concise and easily understandable figures, and for their choice of reader-friendly sequential color scales in several of these figures.
However, some of the main inferences and conclusions need to be explained or illustrated more clearly before the scientific quality of this manuscript is sufficient for publication. Although I consider these to be "minor" revisions, they are important enough that I strongly encourage the authors to incorporate them before publication. These changes are summarized as follows (and explained in more detail in the Specific Comments):
Finally, I have one major stylistic recommendation that I repeat in each of the relevant figures: changing some of the timeseries plots to be colorblind friendly.
Specific Comments (for Abstract and Conclusions, without line numbers)
Changes regarding statements made in the Abstract:
Changes regarding statements made in the Conclusion:
Specific Comments (with line numbers--also included in annotated pdf)
Line 25: Consider stating what distance range is defined as "near" a volcano.
Line 32: Include reference(s) on emotional state or reaction of affected populations.
Line 47: "from inaccessible locations"--explain how exactly these locations are inaccessible: physically, geographically, technologically? Presumably these are locations that allow individuals to access their social media accounts. Would be good to elaborate on this.
Lines 53-54: "strong positive correlation between social media activity and damage losses"--Does this mean higher social media activity with greater damage losses?
Lines 54-55: Does "negative correlation" mean more negative sentiment with higher damage losses?
Line 71: Consider adding a parenthetical definition for "laze" if your target audience is not limited to volcanologists.
Line 80: "driven by rock fall into the lowering lava lake"--this was an interesting phenomenon that is worth citing a source or two for! (Especially because the original hypothesized explanation--lava falling below the water table--was later disproven)
Line 83: "significant additional lower magnitude seismicity"-- What defines "significant" seismicity that is lower magnitude? I ask because this phrase may read more easily if you define it, e.g., "additional lower magnitude seismicity (M_ to M_)," . . . Or, if you still want to convey significant but un-felt seismicity, you might consider rephrasing: "collapses were associated with felt ~M5 earthquakes and additional unfelt but significant seismicity (M_ to M_),"
Lines 97-98: "increasing two-way dialogue and the speed and reach of official communications"-- I would encourage you to also cite Goldman et al. (2024), since two-way dialogues and reach of USGS Volcanoes' social media are significant components of this study, which were not captured in the 2023 paper. Full citation below:
Goldman, R.T., McBride, S.K., Stovall, W.K., & Damby, D.E. (2024). USGS and social media user dialogue and sentiment during the 2018 eruption of Kīlauea Volcano, Hawai’i. Frontiers in Communication, 9:986974. https://doi.org/10.3389/fcomm.2024.986974
Line 121: 'Kilauea'--Does this include lowercase kilauea and spelling with kahakō (Kīlauea), if applicable? Would be good to clarify either way.
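To illustrate the clarification I am requesting, a hypothetical case-insensitive keyword filter covering both spellings might look like the following (this is only an illustration of the ambiguity, not the authors' actual query):

```python
import re

# Hypothetical filter: matches "Kilauea" case-insensitively,
# with or without the kahakō (macron) on the first "i".
KILAUEA = re.compile(r"k[iī]lauea", re.IGNORECASE)

tweets = [
    "Kīlauea is erupting",
    "lava from kilauea",
    "KILAUEA update",
    "Mauna Loa quiet",
]
matches = [t for t in tweets if KILAUEA.search(t)]
print(matches)  # the first three tweets match; "Mauna Loa quiet" does not
```

Stating explicitly whether the search term behaved this way (or matched only the exact string 'Kilauea') would remove the ambiguity.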
Line 138: "Source removal"-- Perhaps rename this as "External source removal" or "Removal based on Source" since it doesn't seem like you're removing the source attribute itself.
Line 142: "Username removal"-- Consider rephrasing to better describe the process. For example, "Removal based on Username".
Line 155: "F1 Score" in Table 1-- I would recommend you define F1 score, perhaps in your description of the Machine Learning Relevance Filter.
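For readers outside machine learning, it may be worth stating that F1 is the harmonic mean of precision and recall; a minimal sketch with hypothetical relevance-filter counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 true positives, 10 false positives,
# 20 false negatives (not the authors' actual confusion matrix).
print(round(f1_score(80, 10, 20), 3))  # 0.842
```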
Lines 168-169: I would encourage you to cite Goldman et al. (2024)--full citation provided in an earlier comment--and any other studies that have used VADER for short-form social media sentiment analysis.
Line 170: I would also encourage you to cite the original study on VADER:
Hutto, C., and Gilbert, E. (2014). VADER: a parsimonious rule-based model for sentiment analysis of social media text. ICWSM 8, 216–225. doi: 10.1609/icwsm.v8i1.14550)
Line 172: "including the use of intensifiers, negations, and punctuation"-- You should also mention emoticons and slang, and cite Hutto and Gilbert (2014) here.
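To show readers concretely what such rules do, a toy scorer in the spirit of Hutto and Gilbert's heuristics could be sketched as below. To be clear, this is NOT VADER, and every lexicon value and constant here is invented for illustration only:

```python
# Toy rule-based scorer illustrating VADER-style heuristics:
# intensifiers boost magnitude, negations flip and dampen sign,
# exclamation marks add emphasis. All values are made up.
LEXICON = {"good": 1.9, "bad": -2.5, "great": 3.1}
INTENSIFIERS = {"very": 0.3, "extremely": 0.4}

def toy_score(text: str) -> float:
    tokens = text.lower().rstrip("!").split()
    score, boost, negate = 0.0, 0.0, False
    for tok in tokens:
        if tok in INTENSIFIERS:
            boost = INTENSIFIERS[tok]
        elif tok in ("not", "never"):
            negate = True
        elif tok in LEXICON:
            s = LEXICON[tok]
            s += boost if s > 0 else -boost  # intensifier boosts magnitude
            if negate:
                s *= -0.74  # negation flips and dampens
            score += s
            boost, negate = 0.0, False
    # trailing exclamation marks amplify the prevailing sentiment
    score += 0.29 * min(text.count("!"), 3) * (1 if score > 0 else -1)
    return score

print(round(toy_score("not good"), 2))  # negation flips the sign
```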
Lines 185-186: Provide citations for the process of inter-coder reliability checks.
Lines 186-188: Provide citation(s) that explain Fleiss Kappa agreement score and support your judgement that the score range was sufficient to progress.
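For transparency, the statistic can also be computed directly from the coder count table; a minimal sketch (the rating counts below are invented, not the authors' data):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of shape (n_items, n_categories),
    where each cell counts the raters assigning that category."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    total = n_items * n_raters
    n_cats = len(ratings[0])
    # proportion of all assignments falling in each category
    p_j = [sum(row[j] for row in ratings) / total for j in range(n_cats)]
    # per-item observed agreement
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    P_bar = sum(P_i) / n_items            # mean observed agreement
    P_e = sum(p * p for p in p_j)         # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 3 coders, 5 tweets, 2 topic categories
table = [[3, 0], [3, 0], [2, 1], [0, 3], [1, 2]]
print(round(fleiss_kappa(table), 3))  # 0.444
```

Citing a standard benchmark (e.g., Landis and Koch's agreement bands) alongside the computed value would support your judgement that the score range was sufficient.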
Line 220: This is a particularly strong paragraph because it cites other sources and explains the significance of, and possible reasons for, the contrast between the high percentage of relevant volcano tweets here and the low relevance of posts in other social sensing studies of natural hazards. Use this as guidance for adding citations in the other portions of your main text as indicated in my comments.
Lines 234-235: cite Hutto & Gilbert (2014).
Line 241: I recommend citing sources that discuss one or both of these explanations.
Line 243: Consider citing a source or two that also captures these sentiments ("personal shock and upset")
Lines 243-244: Are you able to cite examples of this increased media attention and circulation of news articles on Twitter? You do this further down when describing dramatized/sensationalized accounts of the eruption, so it would be good to see some citations up here, as well!
Line 260: Define whether "Hawai'i" is the State of Hawaii or just the Big Island.
Lines 263-264: In addition to Calabrò et al. (2020), I recommend also citing Goldman et al. (2023), since interview participants from that study consistently stated that news media outlets provided sensationalized eruption coverage.
Line 281: Consider adding a parenthetical definition of paroxysm if your target audience is not exclusively volcanologists. I would also recommend you cite source(s) for the occurrence of this relatively significant event.
Line 288: Are you able to cite the news article reporting on the destruction of homes?
Line 303: I would be careful in how you define correlation here. To me, the shape of the observation, support & concern, and damage & disruption curves are much more linear than either field-based damage assessment curve, but particularly more linear than the "number of buildings in contact with lava" curve. Moreover, the increases are staggered in time, with the tweet curves increasing roughly two weeks before the damage assessment curves do. I'm not confident there is an actual correlation here.
Lines 306-307: I would cite Neal et al. (2019), Science, and any other relevant publications describing this event and the ensuing change in lava flow impacts.
Lines 308-309: As implied in my earlier comment, I am not personally convinced there are correlations, or at least those strong enough for you to consider them favorable. I would urge you to either provide a stronger argument and evidence that this is the case, or soften your claim from "favorable" correlations to "weak" or "minor" correlations.
For example, can you point to other studies that compare cumulative count curves and clearly distinguish between (strongly) correlated data and uncorrelated (or weakly) correlated data? Is there a correlation coefficient or other metric you can provide to quantify the strength of your correlation? I do think your inclusion of the field-based damage assessments are informative and worth presenting, but would suggest you strengthen your argument, or otherwise soften your claim.
Line 329: "socioeconomic pressures"-- This needs to be explained more in the Results sections. I can infer that there are socioeconomic pressures from the word clouds in Figure 4 and the "damage & disruption" tweet counts and field-based damage assessment data in Figure 5, but there are missing explanations of how these data indicate socioeconomic pressure.
Some questions for you to consider:
Lines 332-333: This point--"there is no guarantee those individuals most affected, for example losing property or livelihoods, contributed to the data collection"--is worth exploring further. At a minimum, you should point to a few previously published studies that explore the impacts of this eruption (or other eruptions) on individuals' sentiments or well-being. Then, you should probably explain how your study can be built upon in order to address the uncertainty arising from this anonymised big data approach and thus provide an even more concrete correlation between social media sentiment and on-the-ground impacts to individuals.
Line 334: I would advise clarifying how these news headlines would have contributed to negative sentiment, as you do in the Results. If it is due to sensationalizing, I would state that again here, and also recommend citing Goldman et al. (2023), Volcanica.
Line 344: "Our analyses lend further weight to this finding"-- How? You should explain which correlations illustrate the positive impact of sharing warnings and mitigation actions on user sentiment, and indicate which figures show these correlations.
I've also noted in your Conclusions section that the manuscript does not currently provide a correlation between the occurrence/timing of warnings and risk reduction communications, on the one hand, and community response actions and any effect on user sentiment, on the other. Put another way, it is not clear to me that a link has been established between warning/risk reduction communications and community response or sentiment.
Lines 350-352: It's not clear to me what point you are aiming to get across in this paragraph. Are you advocating for the incorporation of social sensing in more scientific studies of crowd-sourced observations? How does that improve upon the approach of Wadsworth et al. (2022)? Or, what is/are the main weaknesses of the Wadworth et al. approach that social sensing addresses?
If you address the above questions, and provide a more natural segue into your next paragraph on the broader implications of automated social sensing data collection and analysis, this paragraph will be much stronger.
Line 353: Or else what? Elaborate on the consequence of having insufficient metadata.
Line 356: "in real-time"-- You should cite some studies that have already utilized real-time social sensing.
Lines 358-359: Goldman et al. (2024), Frontiers in Communication, would be another relevant source to cite for tracking the spread of misinformation on social media during an eruption event.
Lines 359-360: I'm not sure what you mean by "irrelevant" data, and also what "this approach" refers to. Please clarify. (See also my technical correction for this sentence).
Lines 361-362: I recommend you cite studies that have studied posts in different languages and/or across different social media platforms. For the latter, here are two publications:
Hughes, A. L. et al. (2014) Online public communications by police and fire services during the 2012 Hurricane Sandy. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1505–1514. https://doi.org/10.1145/2556288.2557227
Ruan, T., Kong, Q., McBride, S. K., Sethjiwala, A., and Lv, Q. (2022). Cross-platform analysis of public responses to the 2019 Ridgecrest earthquake sequence on Twitter and Reddit. Sci. Rep. 12:1634. doi: 10.1038/s41598-022-05359-9
Line 363: Provide citations of this phenomenon, particularly if the term "signal" is defined.
Line 364: I would suggest using the term "non-local" instead of "external," since external can mean information external to a particular organization (e.g., an official volcano monitoring agency), regardless of its locality.
Line 366: Regarding the use of traditional structured interviews, I would cite the following publications:
Donovan, A., J. R. Eiser, and R. S. J. Sparks (2014). “Scientists’ views about lay perceptions of volcanic hazard and risk”. Journal of Applied Volcanology 3(1). issn: 2191-5040. doi: 10.1186/s13617-014-0015-5.
Goldman et al. (2023), Volcanica. (Full citation already included in manuscript)
Haynes, K., J. Barclay, and N. Pidgeon (2008). "The issue of trust and its influence on risk communication during a volcanic crisis". Bulletin of Volcanology 70(5), pages 605–621. issn: 1432-0819. doi: 10.1007/s00445-007-0156-z.
Naismith, A., M. T. Armijos, E. A. Barrios Escobar, W. Chigna, and I. M. Watson (2020). "Fireside tales: understanding experiences of previous eruptions among other factors that influence the decision to evacuate from eruptive activity of Volcán de Fuego". Volcanica 3(2), pages 205–226. issn: 2610-3540. doi: 10.30909/vol.03.02.205226.
Lines 367-368: Regarding the benefit of complementing qualitative interviews with quantitative social sensing methods, I would cite the following publications:
Creswell, J. W. (2009). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 3rd Edn. Thousand Oaks, CA: Sage Publications, Inc.
Goldman, R.T., McBride, S.K., Stovall, W.K., & Damby, D.E. (2024). USGS and social media user dialogue and sentiment during the 2018 eruption of Kīlauea Volcano, Hawai’i. Frontiers in Communication, 9:986974. https://doi.org/10.3389/fcomm.2024.986974
Graham, O., Thomas, T., Hicks, A., Edwards, S., Juman, A., Ramroop, A., et al. (2023). Facts, Faith and Facebook: science communication during the 2020–2021 La Soufrière, St. Vincent volcanic eruption. SP 539, SP539-2022–289. doi: 10.1144/SP539-2022-289
Ruan, T., Kong, Q., McBride, S. K., Sethjiwala, A., and Lv, Q. (2022). Cross-platform analysis of public responses to the 2019 Ridgecrest earthquake sequence on Twitter and Reddit. Sci. Rep. 12:1634. doi: 10.1038/s41598-022-05359-9
Line 380: "similar temporal trend"-- This language makes more sense than the stronger claim of "favorable correlation" that I critiqued in your Results section.
Line 382: See overarching Conclusions comment near start of "Specific Comments" section of interactive comments regarding "efficacy of warnings and other official risk reduction communications."
Lines 383-384 (final clause of final sentence of Conclusion): This point needs to be explained more in the Discussion, particularly the transition to real-time data collection and monitoring misinformation.
Line 399: Clarify that readers must have a Zenodo account in order to access.
Technical Corrections (for figures--also included in annotated pdf)
Figure 1(a): The isopachs are hard to see, especially the 50 cm since the color is nearly identical to the lava flows there. Maybe consider making the isopachs dashed black lines and distinguishing them by different dash-lengths, or perhaps just the thickness labels alone? This would allow the isopachs to be legible in grayscale, as well.
Figure 1(b): I really like this figure overall, but can't help noticing how small the event text is in panel (b). May be worth providing a table listing each event and the corresponding aviation color code in the supplement?
Figure 2(a)-2(b): I would suggest assigning a different color than green to the "all" tweets lines in panels (a) and (b). Perhaps black or dark gray, both of which would stand out better to colorblind readers (or users with a grayscale copy of your manuscript).
I would also suggest you make the legend and axis tickmark label fonts slightly larger, at least as large as your axis labels and significant event labels.
Line 215 (Figure 2 caption): I don't see the term "bigram" defined anywhere--I would recommend you do so in the Methods.
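A one-line definition may suffice: a bigram is simply each pair of adjacent tokens in a text, e.g.:

```python
def bigrams(tokens):
    """Adjacent word pairs: each consecutive token pair is one bigram."""
    return list(zip(tokens, tokens[1:]))

# hypothetical tweet text, not from the authors' dataset
print(bigrams("lava flow destroys homes".split()))
# [('lava', 'flow'), ('flow', 'destroys'), ('destroys', 'homes')]
```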
Lines 215-216 (Figure 2 caption): Do you have frequency values for the most common and least common bigrams shown in panels c) and d)? It would give a sense of scale and also be useful to compare with the daily tweet frequencies in panels a) and b).
Figure 3: I would suggest changing the color of the timeseries in panel (a) from green to a different color (e.g., yellow-orange) to provide contrast with the timeseries of panel (b) that is colorblind friendly, while maintaining contrast with your red mean value lines.
(That being said, given that you have two separate panels for each timeseries, this suggestion should take lower precedence than my color adjustment suggestions for your other figures, where the timeseries overlap or three or more timeseries are being compared).
As with Figure 2, I would also suggest increasing the font size of your axis tickmark labels.
Figure 4(a)-4(b): Given the larger overall size of this figure, I think your tickmark labels are a good size! However, I would suggest making the colorbar legend label and tickmark numbers slightly larger to match the grid axis labels, especially since you have the Log-10 subscript.
Line 270 (Figure 4 caption): I would explicitly define the scores for positive and negative sentiment score groups.
Line 271 (Figure 4 caption): As with Figure 2, I would recommend you define the max and min frequency counts for the largest and smallest words, respectively, in each wordcloud.
Figure 5: I would suggest choosing a legend color scheme akin to a sequential gradation, such as the thermal color legend used in Figure 1, the cyan to magenta gradient in Figure 4 (a)-(b), or the red-to-brown/black gradient used in your word clouds in Figure 2 panels (c)-(d).
This would benefit colorblind readers or readers with a grayscale version of your manuscript.
(Link with other examples of sequential color gradients, if helpful): https://matplotlib.org/stable/users/explain/colors/colormaps.html
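In matplotlib, any of the named sequential colormaps on the page linked above (e.g., 'viridis', 'magma', 'YlOrBr') can be supplied directly to plotting calls. As a library-free sketch of what "sequential" means here, one can interpolate a single light-to-dark ramp; the endpoint colours below are arbitrary examples:

```python
def sequential_palette(start, end, n):
    """Linearly interpolate n RGB colours between two endpoints.
    A single light-to-dark ramp stays ordered in grayscale, which is
    what makes sequential schemes legible to colorblind readers and
    in grayscale printouts."""
    return [tuple(round(s + (e - s) * i / (n - 1)) for s, e in zip(start, end))
            for i in range(n)]

# Hypothetical five-step ramp from light yellow to dark red-brown
palette = sequential_palette((255, 237, 160), (128, 0, 38), 5)
print(palette)
```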
Are you able to correlate the earliest syn-eruption peaks in panels (a), (b), (c), and (e) with specific events or types of tweets? If so, I would also label those. If not, are these peaks attributable to the start of the eruption itself? It may be worth reiterating in the figure caption if that is the case.
As with Figures 2-3, I would suggest making the tickmark labels a larger font, as well as the font for each of your five timeseries categories (observation, warning, etc.). The size of your "Daily Tweets" and event labels are good. Same suggestion for panels (f)-(g) as with (a)-(e): larger font for the axis tickmark labels.
I like your usage of dashed lines in panel (g) to distinguish between lines--you might consider assigning different dash marks or other symbols to your timeseries lines as an alternative or complementary strategy to the sequential gradations I've suggested for this and other figures.
Lines 296-297 (Figure 5 caption): Is the normalization time period for panel (g) identical to the gray "watch" period in the preceding panels? Consider clarifying this.
Line 297 ("building damage data," Figure 5 caption): Is this the same as "contact with lava"? I would advise clarifying this point in the text, since in my mind contact with lava can range in severity from minor exterior damage to complete destruction of a building.
Line 306: Consider adding a vertical line in Figure 5(g) indicating June 3, to help illustrate the contrasting rates of tweet accumulation before and after this date.
Table 2: You might consider a light gray shading background for the bold-face rows and columns as an additional way to create contrast between these and the non-bolded table cells.
Figure A1: I would suggest choosing a legend color scheme akin to a sequential gradation, such as the thermal color legend used in Figure 1, the cyan to magenta gradient in Figure 4 (a)-(b), or the red-to-brown/black gradient used in your word clouds in Figure 2 panels (c)-(d). This would benefit colorblind readers or readers with a grayscale version of your manuscript.
(Link with other examples of sequential color gradients, if helpful): https://matplotlib.org/stable/users/explain/colors/colormaps.html
Figure A2: I would also suggest replacing the green "not news" line with a different color, such as blue, to aid red-green colorblind readers.
For both Figures A1 and A2, I would also suggest larger axis tickmark labels.
Technical Corrections (with line numbers)
Line 311: Consider replacing the highlighted text with this grammatical/stylistic edit: "highlighted a high proportion of these were related"
Line 315: capitalize "Volcanoes" in "USGS Volcanoes"
Lines 335-336: I think the clear message gets lost in how this sentence is structured, which currently reads more like a dependent clause. Is the clear message that Hawaiian tweets with a negative sentiment score show a harmful effect on societal mood? Is the message that this harmful effect is the result of localized eruption impacts? I recommend you rephrase to make the meaning clearer.
Lines 359-360: This sentence may read more easily with less qualifying language and without the double negative. (e.g., "This approach may be facilitated through collecting highly relevant data within online volcanic conversation.")
Line 361: You may want to tighten the wording of this sentence. Example: ". . . if improved geolocation information is available, and to compare the insights provided by different languages, social media networks, or messaging applications."
Line 364: I might add: "bias our understanding of events away from their perceptions by local communities"
Line 371: delete "very"
Line 373: add a comma after "eruption"