Application of machine learning-based post-processing to improve crowd-sourced urban rainfall categorizations

Impact factor: 2.6 · JCR Q2 (Computer Science, Interdisciplinary Applications)
Mohammad Ashar Hussain, Venkatesh Budamala, Rajarshi Das Bhowmik
Applied Computing and Geosciences, Volume 26, Article 100255. Published 2025-06-01. DOI: 10.1016/j.acags.2025.100255
URL: https://www.sciencedirect.com/science/article/pii/S2590197425000370

Abstract

In recent years, citizen science has gained significant attention in the hydrometeorological sciences as an alternative to traditional monitoring systems while also raising awareness of natural processes. Crowd participation in reporting rainfall, known as crowdsourcing rainfall, has the potential to provide insights into the spatio-temporal variability of urban rainfall. However, crowdsourcing often suffers from inaccuracies in rainfall classification due to inadequately trained participants. This study investigates whether machine learning models can reduce misclassification in crowd-sourced rainfall reports under a synthetic framework. A state-of-the-art stochastic rainfall generator is deployed to simulate high-resolution rainfall over Bangalore, India, traditionally monitored by only two rain gauge stations. The study assumes that the 'synthetic' crowd reports qualitative descriptions of two rainfall characteristics—intensity and duration—based on which a categorization of a rainfall event (normal/moderate/severe) is issued. Ten scenarios are introduced to represent varying degrees of misclassification in the crowd reports. Two machine learning models, random forest and logistic regression, are employed to address these misclassifications and improve the resulting rainfall categorization. The findings indicate that while the random forest model outperforms logistic regression, its performance declines as misclassification rates increase. Moreover, the study highlights that increasing the number of participants significantly enhances the post-processing performance, emphasizing the importance of properly training the crowd for accurate reporting.
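The synthetic setup described in the abstract can be illustrated with a small sketch. It is not the authors' framework: the flip-rate noise model, the function names, and the majority-vote aggregation are all assumptions made here for illustration, and the majority vote is a simple stand-in for the random forest and logistic regression post-processing used in the study. The sketch only shows how recovery of the true category degrades as the misclassification rate rises and improves as more participants report.

```python
import random
from collections import Counter

# Event categories named in the abstract.
CATEGORIES = ["normal", "moderate", "severe"]


def simulate_reports(true_label, n_participants, flip_rate, rng):
    """Hypothetical noise model: each participant reports the true
    category, or a uniformly chosen wrong one with probability flip_rate."""
    reports = []
    for _ in range(n_participants):
        if rng.random() < flip_rate:
            wrong = [c for c in CATEGORIES if c != true_label]
            reports.append(rng.choice(wrong))
        else:
            reports.append(true_label)
    return reports


def majority_vote(reports):
    """Stand-in post-processor (not the paper's models): return the most
    frequently reported category; ties go to the one seen first."""
    return Counter(reports).most_common(1)[0][0]


def accuracy(n_events, n_participants, flip_rate, seed=0):
    """Fraction of synthetic events whose true category is recovered."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_events):
        truth = rng.choice(CATEGORIES)
        reports = simulate_reports(truth, n_participants, flip_rate, rng)
        correct += majority_vote(reports) == truth
    return correct / n_events


if __name__ == "__main__":
    # Sweep a few assumed misclassification rates and crowd sizes.
    for flip_rate in (0.1, 0.3, 0.5):
        for n in (1, 5, 25):
            print(flip_rate, n, round(accuracy(2000, n, flip_rate), 3))
```

In this toy model the errors are independent and unbiased, so aggregation alone recovers most events. Real crowd reports may contain systematic bias, which is one reason a learned post-processor, as in the study, could be preferable to a plain vote.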
Source journal: Applied Computing and Geosciences (Computer Science, General Computer Science). CiteScore: 5.50. Self-citation rate: 0.00%. Articles published: 23. Review time: 5 weeks.