A Bayesian model for quantifying errors in citizen science data: application to rainfall observations from Nepal

Impact factor 5.7 | Zone 1 (Earth Sciences) | JCR Q1, Geosciences, Multidisciplinary
Jessica A. Eisma, Gerrit Schoups, Jeffrey C. Davids, Nick van de Giesen
Journal: Hydrology and Earth System Sciences
Published: 2023-10-09
DOI: https://doi.org/10.5194/hess-27-3565-2023

Abstract

High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that (1) each CS observation is subject to a specific error type, each with its own bias and noise, and (2) an observation's error type depends on the static error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) the number of error types and error communities, (ii) the bias and noise for each error type, (iii) the error distribution of each error community, and (iv) the single error community to which each CS belongs. The model, applied to Nepal CS rainfall observations, identifies five error types and sorts CSs into four static, model-inferred communities. In the case study, 73 % of CSs submitted data with errors in fewer than 5 % of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, coupled with model-inferred error probabilities, can identify observations that require verification and provide an opportunity for targeted re-training of CSs based on mistake tendencies.
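The two model assumptions in the abstract can be illustrated with a small generative sketch. This is not the authors' implementation: the error-type parameters, the linear form obs ~ Normal(slope * truth + offset, sd), the community names, and every number below are hypothetical choices made only to show how a community-specific distribution over error types combines with per-type bias and noise, and how Bayes' rule then gives the probability of each error type for one observation with a known ground-truth value.

```python
import math
import random

# Hypothetical error types. Each maps a true rainfall depth (mm) to an
# observed value: obs ~ Normal(slope * truth + offset, sd).
# All names and numbers are illustrative assumptions, not fitted values.
ERROR_TYPES = {
    "none":     {"slope": 1.0, "offset": 0.0, "sd": 1.0},
    "unit":     {"slope": 0.1, "offset": 0.0, "sd": 1.0},   # assumed: depth read in cm, logged as mm
    "meniscus": {"slope": 1.0, "offset": 2.0, "sd": 1.0},   # assumed: small constant bias
    "outlier":  {"slope": 1.0, "offset": 0.0, "sd": 50.0},  # assumed: very noisy reading
}

# Hypothetical error communities: each is a probability distribution
# over error types, shared by every CS assigned to that community.
COMMUNITIES = {
    "mostly_accurate": {"none": 0.97, "unit": 0.01, "meniscus": 0.01, "outlier": 0.01},
    "unit_prone":      {"none": 0.60, "unit": 0.30, "meniscus": 0.05, "outlier": 0.05},
}


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_observation(truth, community, rng):
    """Draw an error type from the community, then a noisy, biased observation."""
    probs = COMMUNITIES[community]
    error_type = rng.choices(list(probs), weights=list(probs.values()))[0]
    p = ERROR_TYPES[error_type]
    obs = rng.gauss(p["slope"] * truth + p["offset"], p["sd"])
    return obs, error_type


def error_type_posterior(obs, truth, community):
    """Bayes' rule: P(type | obs, truth) is proportional to P(type | community) * P(obs | truth, type)."""
    weights = {}
    for name, prior in COMMUNITIES[community].items():
        p = ERROR_TYPES[name]
        mean = p["slope"] * truth + p["offset"]
        weights[name] = prior * normal_pdf(obs, mean, p["sd"])
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

Under these assumed parameters, calling `error_type_posterior(2.0, 20.0, "unit_prone")` puts nearly all posterior mass on the `unit` type, which is the kind of signal that could flag an observation for verification. The calibrated model described in the abstract additionally infers the number of error types and communities and each CS's community membership, which this sketch takes as given.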
Source journal
Hydrology and Earth System Sciences (Geosciences, Multidisciplinary)
CiteScore: 10.10
Self-citation rate: 7.90%
Articles published: 273
Review time: 15 months
Journal description: Hydrology and Earth System Sciences (HESS) is a not-for-profit international two-stage open-access journal for the publication of original research in hydrology. HESS encourages and supports fundamental and applied research that advances the understanding of hydrological systems, their role in providing water for ecosystems and society, and the role of the water cycle in the functioning of the Earth system. A multi-disciplinary approach is encouraged that broadens the hydrological perspective and the advancement of hydrological science through integration with other cognate sciences and cross-fertilization across disciplinary boundaries.