NLP-driven crowdsourcing for urban flood monitoring: insights from Mumbai
Authors: Aniket Navalkar, Shrabani S. Tripathy, Mayank Gupta, Sanghita Basu, Puja Tripathy, Archismita Banerjee, Sheeba Sekharan, Raghu Murtugudde, Subimal Ghosh
Journal: Sustainable Cities and Society, Vol. 132, Article 106795 (published 2025-09-05)
DOI: 10.1016/j.scs.2025.106795
URL: https://www.sciencedirect.com/science/article/pii/S2210670725006699
Citations: 0
Abstract
With extreme rainfall events increasing, densely populated megacities face recurrent flooding, which calls for real-time flood monitoring systems for citizens. Social media platforms such as Twitter (now X) can complement conventional sensor-based flood monitoring, but their utility is limited by the low share of geotagged posts and a data stream cluttered with irrelevant content. Taking Mumbai as a case study, we demonstrate how Natural Language Processing (NLP) can be used to identify flood-relevant tweets. Applying NLP to a historical Twitter dataset (2017–2022), we match text patterns in tweets against lexical keyword datasets to geocode the tweets and to classify their sentiment as ‘positive’ or ‘negative’. In every year, the daily positivity ratio of tweets declines as daily average rainfall increases. Negative tweets are concentrated in areas of orographic rainfall, higher population density, and lower elevation. We further analysed the impacts of extreme rainfall on urban transportation using a network theory-based global efficiency loss (GEL) metric. To do so, we configured Q-NEAT, a GIS-based network model, to perform time-optimized routing that integrates geolocated negative tweets alongside the hotspots identified by the municipal corporation. Twitter explained an additional 1.13%, 20.96%, and 67% of the total GEL detected on the extreme rainfall days of 16 July 2021 (199.87 mm), 2 July 2019 (200.12 mm), and 25 June 2018 (153.14 mm), respectively. Such impacts are not fully captured by the municipal corporation’s rule of thumb for identifying waterlogging from precipitation amounts and field surveys. Our findings underscore the advantages of crowdsourced data for enhancing urban flood monitoring and impact assessment.
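The abstract names a network theory-based global efficiency loss (GEL) metric but does not define it. A minimal sketch, assuming the standard network-science definition of global efficiency, E = 1/(n(n−1)) · Σ_{i≠j} 1/d_ij, where d_ij is the fastest travel time between nodes i and j (unreachable pairs contribute 0), and assuming GEL is the fractional drop in E once flooded road segments are closed. Plain Dijkstra stands in for Q-NEAT's time-optimized routing; the paper's actual Q-NEAT configuration may penalize travel times rather than close roads outright. The road network, node names, closures, and the helper names (`undirected`, `close_roads`, `gel`) are hypothetical, and "additional share from tweets" is assumed to mean (GEL_combined − GEL_municipal) / GEL_combined.

```python
import heapq

# Adjacency map: node -> {neighbour: travel time in minutes}. Toy data only.
Graph = dict[str, dict[str, float]]


def undirected(edges: list[tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map; every endpoint becomes a key."""
    g: Graph = {}
    for u, v, t in edges:
        g.setdefault(u, {})[v] = t
        g.setdefault(v, {})[u] = t
    return g


def shortest_times(g: Graph, source: str) -> dict[str, float]:
    """Dijkstra: fastest travel time from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, t in g[u].items():
            nd = d + t
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def global_efficiency(g: Graph) -> float:
    """Mean of 1/d_ij over ordered pairs i != j; unreachable pairs add 0.
    Assumes strictly positive travel times."""
    n = len(g)
    if n < 2:
        return 0.0
    total = 0.0
    for s in g:
        for t, d in shortest_times(g, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))


def close_roads(g: Graph, closed: list[tuple[str, str]]) -> Graph:
    """Copy of `g` with the given undirected segments removed (all nodes kept)."""
    blocked = {frozenset(e) for e in closed}
    return {u: {v: t for v, t in nbrs.items() if frozenset((u, v)) not in blocked}
            for u, nbrs in g.items()}


def gel(g: Graph, closed: list[tuple[str, str]]) -> float:
    """Global efficiency loss: fractional drop in E after the closures."""
    e0 = global_efficiency(g)
    return (e0 - global_efficiency(close_roads(g, closed))) / e0


roads = undirected([("A", "B", 5), ("B", "C", 5), ("C", "D", 5), ("A", "D", 20)])
municipal = [("B", "C")]  # hypothetical municipal hotspot
tweets = [("C", "D")]     # hypothetical segment flagged by negative tweets

gel_muni = gel(roads, municipal)
gel_all = gel(roads, municipal + tweets)
extra_share = (gel_all - gel_muni) / gel_all  # assumed meaning of "additional" GEL
print(f"GEL (municipal only): {gel_muni:.3f}")        # → 0.350
print(f"GEL (municipal + tweets): {gel_all:.3f}")     # → 0.665
print(f"Additional share from tweets: {extra_share:.1%}")  # → 47.4%
```

In this toy network, closing B–C alone leaves a slow detour via A–D, so efficiency drops by 35%. The tweet-reported closure of C–D isolates node C, which accounts for the extra loss. Using reciprocal travel times rather than raw distances means disconnected pairs contribute zero instead of infinity, which is why global efficiency is a common choice for measuring fragmented networks.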
About the journal:
Sustainable Cities and Society (SCS) is an international journal that focuses on fundamental and applied research to promote environmentally sustainable and socially resilient cities. The journal welcomes cross-cutting, multi-disciplinary research in various areas, including:
1. Smart cities and resilient environments;
2. Alternative/clean energy sources, energy distribution, distributed energy generation, and energy demand reduction/management;
3. Monitoring and improving air quality in built environment and cities (e.g., healthy built environment and air quality management);
4. Energy efficient, low/zero carbon, and green buildings/communities;
5. Climate change mitigation and adaptation in urban environments;
6. Green infrastructure and best management practices (BMPs);
7. Environmental footprint accounting and management;
8. Urban agriculture and forestry;
9. ICT, smart grid and intelligent infrastructure;
10. Urban design/planning, regulations, legislation, certification, economics, and policy;
11. Social aspects, impacts and resiliency of cities;
12. Behavior monitoring, analysis and change within urban communities;
13. Health monitoring and improvement;
14. Nexus issues related to sustainable cities and societies;
15. Smart city governance;
16. Decision Support Systems for trade-off and uncertainty analysis for improved management of cities and society;
17. Big data, machine learning, and artificial intelligence applications and case studies;
18. Critical infrastructure protection, including security, privacy, forensics, and reliability issues of cyber-physical systems;
19. Water footprint reduction and urban water distribution, harvesting, treatment, reuse and management;
20. Waste reduction and recycling;
21. Wastewater collection, treatment and recycling;
22. Smart, clean and healthy transportation systems and infrastructure;