Articles | Volume 25, issue 7
https://doi.org/10.5194/nhess-25-2421-2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
An automated approach for developing geohazard inventories using news: integrating natural language processing (NLP), machine learning, and mapping
BRGM, 3 avenue Claude Guillemin, Orléans, 45060, France
Eurasia Institute of Earth Sciences, Istanbul Technical University, İstanbul, Türkiye
Invited contribution by Aydogan Avcioglu, recipient of the EGU Soil System Sciences Outstanding Student and PhD candidate Presentation Award 2022.
Ogün Demir
Nezahat Gökyiğit Botanic Garden, Biodiversity Information Department, İstanbul, Türkiye
Tolga Görüm
Eurasia Institute of Earth Sciences, Istanbul Technical University, İstanbul, Türkiye
Short summary
We demonstrate an approach for developing geohazard inventories from internet sources and geolocating the reported incidents. We created a tool that automatically retrieves news articles, processes them using natural language processing, and builds inventories from the results. With it, we present spatiotemporal inventories for geohazards in Türkiye, totalling 13 940 incidents between 1997 and 2023. This alternative, easy-to-implement method for inventory development aids geohazard management and resilience.