Increasing Trust in New Data Sources: Crowdsourcing Image Classification for Ecology

IF 1.8 · Region 3 (Mathematics) · Q1 STATISTICS & PROBABILITY

International Statistical Review · Pub Date: 2023-05-21 · DOI: 10.1111/insr.12542

Edgar Santos-Fernandez, Julie Vercelloni, Aiden Price, Grace Heron, Bryce Christensen, Erin E. Peterson, Kerrie Mengersen

International Statistical Review, volume 92, issue 1, pages 43-61. Journal article. Full text: https://onlinelibrary.wiley.com/doi/10.1111/insr.12542

Citations: 0

Abstract

Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields to inform data-driven decisions and study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addressing complex challenges in environmental conservation. We consider this issue from three perspectives. First, we present a literature scan of papers that have employed Bayesian models with citizen science in ecology. Second, we compare several popular majority vote algorithms and introduce a Bayesian item response model that estimates and accounts for participants' abilities after adjusting for the difficulty of the images they have classified. The model also enables participants to be clustered into groups based on ability. Third, we apply the model in a case study involving the classification of corals from underwater images from the Great Barrier Reef, Australia. We show that the model achieved superior results in general and, for difficult tasks, a weighted consensus method that uses only groups of experts and experienced participants produced better performance measures. Moreover, we found that participants learn as they have more classification opportunities, which substantially increases their abilities over time. Overall, the paper demonstrates the feasibility of CS for answering complex and challenging ecological questions when these data are appropriately analysed. This serves as motivation for future work to increase the efficacy and trustworthiness of this emerging source of data.
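The contrast the abstract draws between plain majority voting and an ability-weighted consensus can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the log-odds weighting, the `min_accuracy` cutoff and the toy votes are all assumptions made for illustration, and `p_correct` uses a generic one-parameter logistic item response form that may differ from the model fitted in the paper.

```python
import math
from collections import defaultdict


def majority_vote(votes):
    """Return the label chosen by the most participants.

    votes: list of (participant_id, label) pairs for a single image.
    """
    counts = defaultdict(float)
    for _, label in votes:
        counts[label] += 1.0
    return max(counts, key=counts.get)


def weighted_vote(votes, accuracy, min_accuracy=0.0):
    """Consensus label where each vote is weighted by the log-odds of the
    participant's estimated accuracy (an illustrative weighting choice).

    accuracy: dict mapping participant_id to an estimated accuracy in (0, 1).
    min_accuracy: participants below this value are ignored, mimicking a
    consensus restricted to more able groups.
    """
    counts = defaultdict(float)
    for participant, label in votes:
        p = accuracy[participant]
        if p < min_accuracy:
            continue
        # Clamp to avoid infinite weights at exactly 0 or 1.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        counts[label] += math.log(p / (1.0 - p))
    if not counts:
        raise ValueError("no votes left after filtering by min_accuracy")
    return max(counts, key=counts.get)


def p_correct(ability, difficulty):
    """Generic one-parameter logistic item response curve (assumed form):
    probability of a correct classification given participant ability
    and image difficulty on the same latent scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical toy data: one strong participant disagrees with two weak ones.
votes = [("a", "coral"), ("b", "sand"), ("c", "sand")]
accuracy = {"a": 0.95, "b": 0.55, "c": 0.55}

print(majority_vote(votes))
print(weighted_vote(votes, accuracy))
print(p_correct(ability=1.0, difficulty=1.0))
```

In this toy case the two weak participants outvote the strong one under majority voting, while the weighted consensus sides with the strong participant. In a full Bayesian treatment the accuracies would not be fixed inputs but would be estimated jointly with image difficulty.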




Source journal

International Statistical Review (Mathematics: Statistics & Probability)

CiteScore: 4.30

Self-citation rate: 5.00%

Articles published: not listed

Review time: >12 weeks

About the journal: International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessment of seminal papers in the field and their impact.