Hugo Resende, Álvaro L. Fazenda, Fábio A.M. Cappabianco, Fabio A. Faria
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 175, Article 108081 (Journal Article). Published 2025-08-13. DOI: 10.1016/j.future.2025.108081. Impact factor: 6.2; JCR Q1 (Computer Science, Theory & Methods). Source: https://www.sciencedirect.com/science/article/pii/S0167739X25003759
Increasing the reliability of citizen science campaign data for deforestation detection in tropical forests
In recent years, citizen science (CS) campaigns that leverage crowdsourcing have proven effective at generating large datasets in fields such as environmental monitoring and astronomy. However, the quality of volunteer-contributed data remains a challenge, as inconsistent responses often arise from inattentiveness and hasty analyses. To increase the reliability of labeled datasets generated in citizen science campaigns, this paper proposes combining outlier detection techniques (Z-Score, Tukey, and Median Absolute Deviation) to remove unreliable volunteer contributions, followed by the exclusion of tasks with high Shannon entropy, that is, tasks on which volunteers reached no consensus. To validate this methodology, a case study was conducted using three CS campaigns from the ForestEyes project, which employs citizen science and machine learning to detect deforested areas. The results showed that filtering contributions by volunteer response time with these statistical techniques, combined with a median entropy filter, increased campaign accuracy by up to 20%, highlighting the importance of integrating statistical techniques and variability measures to improve CS results.
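The two-stage filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the thresholds (3.0 for the Z-Score, 1.5 for Tukey's fences, 3.5 for the modified Z-Score based on the Median Absolute Deviation), the choice to compute outliers over all response times in a campaign at once, and the reading of "median entropy filter" as "drop tasks whose entropy exceeds the median entropy across tasks" are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, median, pstdev, quantiles


def zscore_mask(times, threshold=3.0):
    """True for response times within `threshold` standard deviations of the mean."""
    mu, sigma = mean(times), pstdev(times)
    if sigma == 0:
        return [True] * len(times)
    return [abs(t - mu) / sigma <= threshold for t in times]


def tukey_mask(times, k=1.5):
    """True for response times inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= t <= high for t in times]


def mad_mask(times, threshold=3.5):
    """True for response times whose MAD-based modified Z-Score is within `threshold`."""
    med = median(times)
    mad = median([abs(t - med) for t in times])
    if mad == 0:
        return [True] * len(times)
    return [0.6745 * abs(t - med) / mad <= threshold for t in times]


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution for one task."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def filter_campaign(contributions, mask_fn=tukey_mask):
    """Hypothetical pipeline over (task_id, label, response_time) tuples.

    Stage 1: drop contributions whose response time is flagged as an outlier.
    Stage 2: drop tasks whose label entropy exceeds the median entropy
             (an assumed interpretation of the 'median entropy filter').
    Returns the majority label for each surviving task.
    """
    keep = mask_fn([c[2] for c in contributions])
    by_task = {}
    for (task, label, _), ok in zip(contributions, keep):
        if ok:
            by_task.setdefault(task, []).append(label)
    entropies = {task: shannon_entropy(labels) for task, labels in by_task.items()}
    cutoff = median(entropies.values())
    return {
        task: Counter(labels).most_common(1)[0][0]
        for task, labels in by_task.items()
        if entropies[task] <= cutoff
    }
```

Passing `mask_fn=zscore_mask` or `mask_fn=mad_mask` swaps the outlier detector without changing the rest of the pipeline. On small samples the Z-Score is the least sensitive of the three, because a single extreme value inflates both the mean and the standard deviation it is measured against.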
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.