Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project

2011 IEEE Seventh International Conference on e-Science Workshops. Pub Date: 2011-12-05. DOI: 10.1109/eScienceW.2011.13

S. Kelling, Jun Yu, Jeff Gerbracht, Weng-Keen Wong


Citations: 25

Abstract

Research projects that rely on volunteers (“citizen scientists”) to collect data on organism occurrence must address observer variability and species misidentification. Citizen science projects can engage a very large number of volunteers and collect large volumes of data, but those data are prone to reporting errors. Our experience with eBird, a citizen science project that engages tens of thousands of volunteers to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers, and flag them in the database. The increasing volume of data collected by eBird places a huge burden on these volunteer experts. To minimize this human effort, we explored whether previously collected eBird data can be used to create automated quality filters that emerge from the data. We do this through a two-step process. First, a data-based method detects outliers (i.e., observations that are unusual for a given region and week of the year). Next, a novel machine learning method that estimates observer expertise is used to decide whether the unusual observation should be flagged. Our preliminary findings indicate that this automated process reliably identifies outliers and accurately classifies each one as either an error or a potentially valuable observation.
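The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the outlier statistic or the expertise model, so the sketch assumes a simple reporting-frequency threshold for step one and takes observer expertise as a precomputed score between 0 and 1 for step two. All names (`Observation`, `report_frequencies`, `should_flag`) and both thresholds (`min_freq`, `expert_cutoff`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    observer: str
    species: str
    region: str
    week: int  # week of the year


def report_frequencies(history):
    """Share of past reports in each (region, week) that name each species."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for obs in history:
        totals[(obs.region, obs.week)] += 1
        counts[(obs.species, obs.region, obs.week)] += 1
    return {
        key: n / totals[(key[1], key[2])]
        for key, n in counts.items()
    }


def is_outlier(obs, freqs, min_freq=0.05):
    """Step 1 (assumed rule): unusual if rarely or never reported
    for this region and week in the historical data."""
    freq = freqs.get((obs.species, obs.region, obs.week), 0.0)
    return freq < min_freq


def should_flag(obs, freqs, expertise, min_freq=0.05, expert_cutoff=0.8):
    """Step 2 (assumed rule): flag an outlier for review unless the
    observer's estimated expertise score reaches the cutoff."""
    if not is_outlier(obs, freqs, min_freq):
        return False
    return expertise.get(obs.observer, 0.0) < expert_cutoff


if __name__ == "__main__":
    # Made-up history: 99 common reports and 1 rare report in one region-week.
    history = [Observation("a", "American Robin", "R1", 20)] * 99
    history.append(Observation("b", "Snowy Owl", "R1", 20))
    freqs = report_frequencies(history)
    expertise = {"novice": 0.2, "expert": 0.9}  # hypothetical scores

    for who in ("novice", "expert"):
        obs = Observation(who, "Snowy Owl", "R1", 20)
        print(who, should_flag(obs, freqs, expertise))
```

In this sketch, an unusual report from a low-expertise observer is routed to a reviewer, while the same report from a high-expertise observer is accepted as a potentially valuable record. Observers with no score are treated as novices, which is a conservative design choice rather than something stated in the abstract.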

