Application of machine learning-based post-processing to improve crowd-sourced urban rainfall categorizations

IF 3.2 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Applied Computing and Geosciences, Pub Date: 2025-06-01, DOI: 10.1016/j.acags.2025.100255

Mohammad Ashar Hussain, Venkatesh Budamala, Rajarshi Das Bhowmik

Volume 26, Article 100255. Full text: https://www.sciencedirect.com/science/article/pii/S2590197425000370


Abstract

In recent years, citizen science has gained significant attention in the hydrometeorological sciences as an alternative to traditional monitoring systems while also raising awareness of natural processes. Crowd participation in reporting rainfall, known as crowdsourcing rainfall, has the potential to provide insights into the spatio-temporal variability of urban rainfall. However, crowdsourcing often suffers from inaccuracies in rainfall classification due to inadequately trained participants. This study investigates whether machine learning models can reduce misclassification in crowd-sourced rainfall reports under a synthetic framework. A state-of-the-art stochastic rainfall generator is deployed to simulate high-resolution rainfall over Bangalore, India, traditionally monitored by only two rain gauge stations. The study assumes that the 'synthetic' crowd reports qualitative descriptions of two rainfall characteristics—intensity and duration—based on which a categorization of a rainfall event (normal/moderate/severe) is issued. Ten scenarios are introduced to represent varying degrees of misclassification in the crowd reports. Two machine learning models, random forest and logistic regression, are employed to address these misclassifications and improve the resulting rainfall categorization. The findings indicate that while the random forest model outperforms logistic regression, its performance declines as misclassification rates increase. Moreover, the study highlights that increasing the number of participants significantly enhances the post-processing performance, emphasizing the importance of properly training the crowd for accurate reporting.
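The synthetic setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the three-level qualitative scale, the `categorize` rule, the uniform noise model, and all function names are assumptions made for illustration, and a plain majority vote stands in for the random forest and logistic regression post-processors used in the study. The sketch only shows how a misclassification rate and a crowd size might be varied in such an experiment.

```python
import random
from collections import Counter

# Hypothetical qualitative scale for both intensity and duration.
LEVELS = ("low", "medium", "high")


def categorize(intensity, duration):
    """Assumed rule mapping two qualitative reports to an event category."""
    score = LEVELS.index(intensity) + LEVELS.index(duration)
    if score <= 1:
        return "normal"
    if score == 2:
        return "moderate"
    return "severe"


def noisy(level, p, rng):
    """Return the true level, or a different level with probability p."""
    if rng.random() < p:
        return rng.choice([x for x in LEVELS if x != level])
    return level


def majority(values):
    """Most frequent value among the reports (stand-in post-processor)."""
    return Counter(values).most_common(1)[0][0]


def accuracy(n_participants, p, n_events=2000, seed=0):
    """Fraction of synthetic events whose category is recovered correctly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_events):
        true_i, true_d = rng.choice(LEVELS), rng.choice(LEVELS)
        reports_i = [noisy(true_i, p, rng) for _ in range(n_participants)]
        reports_d = [noisy(true_d, p, rng) for _ in range(n_participants)]
        predicted = categorize(majority(reports_i), majority(reports_d))
        hits += predicted == categorize(true_i, true_d)
    return hits / n_events


if __name__ == "__main__":
    # Sweep crowd size and misclassification rate (values are illustrative).
    for n in (1, 5, 15):
        for p in (0.1, 0.3, 0.5):
            print(f"participants={n:2d} misclassification={p:.1f} "
                  f"accuracy={accuracy(n, p):.3f}")
```

Under these assumptions, accuracy should fall as the misclassification rate `p` rises and improve as `n_participants` grows, which mirrors the qualitative trends reported in the abstract. A learned classifier could replace `majority` without changing the rest of the sketch.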




Source journal