Automated quality control for mobile data collection

Benjamin E. Birnbaum, B. DeRenzi, A. Flaxman, Neal Lesh
ACM DEV '12, published 2012-03-11. DOI: 10.1145/2160601.2160603 (https://doi.org/10.1145/2160601.2160603)
Citations: 31

Abstract

Systematic interviewer error is a potential issue in any health survey, and it can be especially pernicious in low- and middle-income countries, where survey teams may face problems of limited supervision, chaotic environments, language barriers, and low literacy. Survey teams in such environments could benefit from software that leverages mobile data collection tools to provide solutions for automated data quality control. As a first step in the creation of such software, we investigate and test several algorithms that find anomalous patterns in data. We validate the algorithms using one labeled data set and two unlabeled data sets from two community outreach programs in East Africa. In the labeled set, some of the data is known to be fabricated and some is believed to be relatively accurate. The unlabeled sets are from actual field operations. We demonstrate the feasibility of tools for automated data quality control by showing that the algorithms detect the fake data in the labeled set with a high sensitivity and specificity, and that they detect compelling anomalies in the unlabeled sets.
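
The abstract reports results on the labeled set in terms of sensitivity and specificity. As a minimal sketch of how those two figures are computed from a set of flags, assuming each record carries a known label (fabricated or not) and a detector output (flagged or not): the function name and the toy labels below are hypothetical illustrations, not the authors' algorithms or data.

```python
import math


def sensitivity_specificity(is_fabricated, is_flagged):
    """Return (sensitivity, specificity) for a binary detector.

    is_fabricated: known labels, True if the record is fabricated.
    is_flagged:    detector output, True if the record was flagged.
    Both names are hypothetical; they are not taken from the paper.
    """
    if len(is_fabricated) != len(is_flagged):
        raise ValueError("label and prediction sequences differ in length")

    pairs = list(zip(is_fabricated, is_flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)          # fabricated, flagged
    fn = sum(1 for truth, flag in pairs if truth and not flag)      # fabricated, missed
    tn = sum(1 for truth, flag in pairs if not truth and not flag)  # genuine, not flagged
    fp = sum(1 for truth, flag in pairs if not truth and flag)      # genuine, flagged

    # Sensitivity: share of fabricated records that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: share of genuine records that were left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy labels made up for illustration only (not results from the paper).
labels = [True, True, True, False, False, False, False, False]
flags = [True, True, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(labels, flags)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting the two figures together matters because either one alone can be inflated trivially: flagging every record yields perfect sensitivity, and flagging none yields perfect specificity.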