Emergent Filters: Automated Data Verification in a Large-Scale Citizen Science Project
S. Kelling, Jun Yu, Jeff Gerbracht, Weng-Keen Wong
2011 IEEE Seventh International Conference on e-Science Workshops, 5 December 2011
DOI: 10.1109/eScienceW.2011.13
Citations: 25
Abstract
Research projects that use the efforts of volunteers (“citizen scientists”) to collect data on organism occurrence must address observer variability and species misidentification. Although citizen science projects can engage very large numbers of volunteers to collect large volumes of data, those data are prone to reporting errors. Our experience with eBird, a citizen science project that engages tens of thousands of volunteers to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers, and flag them in the database. However, the growing volume of data collected by eBird places a heavy burden on these volunteer experts. To minimize this human effort, we explored whether previously collected eBird data can be used to create automated quality filters that emerge from the data. We do this through a two-step process. First, a data-based method detects outliers (i.e., observations that are unusual for a given region and week of the year). Next, a novel machine learning method that estimates observer expertise is used to decide whether each unusual observation should be flagged. Our preliminary findings indicate that this automated process reliably identifies outliers and accurately classifies each one as either an error or a potentially valuable observation.
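The two-step process can be illustrated with a minimal sketch. It does not reproduce the authors' outlier detector or their machine learning model of observer expertise. Instead, it assumes that step one can be approximated by the fraction of historical checklists in the same region and week that report a species, and that step two is represented by a precomputed expertise score per observer. The names `Observation`, `reporting_frequencies`, `is_outlier`, and `should_flag`, the 1% frequency threshold, the 0.5 expertise cutoff, and the example data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    observer: str
    species: str
    region: str
    week: int  # week of the year


def reporting_frequencies(checklists):
    """Count, per (region, week), how many historical checklists exist and
    how many of them report each species. `checklists` yields
    (region, week, set_of_species) tuples (hypothetical input format)."""
    counts = defaultdict(int)  # (region, week, species) -> checklists reporting it
    totals = defaultdict(int)  # (region, week) -> total checklists
    for region, week, species_set in checklists:
        totals[(region, week)] += 1
        for species in species_set:
            counts[(region, week, species)] += 1
    return counts, totals


def is_outlier(obs, counts, totals, threshold=0.01):
    """Step 1 (assumed form): an observation is unusual if the species is
    reported on fewer than `threshold` of past checklists for that region/week."""
    n = totals.get((obs.region, obs.week), 0)
    if n == 0:
        return True  # no history for this region/week: treat as unusual
    return counts.get((obs.region, obs.week, obs.species), 0) / n < threshold


def should_flag(obs, counts, totals, expertise, expert_cutoff=0.5, threshold=0.01):
    """Step 2 (assumed form): flag an unusual observation only when the
    observer's expertise score falls below `expert_cutoff`; unusual records
    from high-expertise observers are kept as potentially valuable."""
    if not is_outlier(obs, counts, totals, threshold):
        return False
    return expertise.get(obs.observer, 0.0) < expert_cutoff


# Hypothetical history: 100 checklists for region "NY", week 20.
history = [("NY", 20, {"American Robin", "Blue Jay"}) for _ in range(100)]
counts, totals = reporting_frequencies(history)

# Hypothetical expertise scores in [0, 1]; the paper estimates these with a model.
expertise = {"alice": 0.9, "bob": 0.2}

rare_by_novice = Observation("bob", "Kirtland's Warbler", "NY", 20)
rare_by_expert = Observation("alice", "Kirtland's Warbler", "NY", 20)
print(should_flag(rare_by_novice, counts, totals, expertise))  # True: flagged for review
print(should_flag(rare_by_expert, counts, totals, expertise))  # False: kept as potentially valuable
```

Keeping the expertise score as an input, rather than computing it inline, separates the two steps the abstract describes: the frequency filter can be rebuilt as new data arrive, while any expertise estimator (including a learned one) can be swapped in without changing the flagging logic.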