Conformal taxonomic validation: A semi-automated validation framework for citizen science records

IF 5.8, Zone 2 (Environmental Science and Ecology), JCR Q1 (Ecology)
Matthieu de Castelbajac, Sandra Bringay, Arnaud Sallaberry, Maximilien Servajean, Clémence Epinoux, Juan Carlos Molinero, Delphine Bonnet
Published in Ecological Informatics, volume 90, Article 103290, 2025-07-02. DOI: 10.1016/j.ecoinf.2025.103290
https://www.sciencedirect.com/science/article/pii/S1574954125002997
Citation count: 0

Abstract

Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scientific research. Validating large volumes of citizen science data remains an important challenge. In this paper, we present a semi-automated validation framework that combines a deep learning classifier with conformal prediction to generate sets of plausible taxonomic labels at multiple ranks, while providing rigorous control over prediction confidence. Extensive evaluation was carried out using 25,000 jellyfish records, both with and without prior validation, as well as against 800 expert-validated entries. Our results show that the method frequently produces singleton prediction sets that can be accepted automatically, offering a high-confidence and scalable solution for validating marine citizen science data.
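The abstract describes pairing a classifier with conformal prediction so that each record receives a set of plausible labels, with singleton sets accepted automatically. A minimal sketch of that idea, using standard split conformal prediction, is given below. It is an illustration under stated assumptions, not the authors' implementation: the nonconformity score (one minus the predicted probability of the true label), the function names, the toy probabilities, and the single-rank setup are all hypothetical, and the paper's handling of multiple taxonomic ranks is not reproduced here.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Score threshold estimated from a held-out calibration set."""
    n = len(cal_labels)
    # Assumed nonconformity score: 1 - probability given to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration records: every label is kept
    return np.sort(scores)[k - 1]

def prediction_sets(probs, qhat):
    """Boolean mask (n_records, n_labels): True means the label is in the set."""
    return (1.0 - probs) <= qhat

def triage(sets):
    """Singleton sets are accepted automatically; all others go to an expert."""
    return np.where(sets.sum(axis=1) == 1, "auto-accept", "expert review")

# Toy calibration data (hypothetical): 4 validated records, 3 candidate taxa.
cal_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.45, 0.35, 0.20],
])
cal_labels = np.array([0, 1, 2, 0])

# Classifier outputs for two new, unvalidated records (hypothetical).
new_probs = np.array([
    [0.95, 0.03, 0.02],  # confident prediction
    [0.50, 0.46, 0.04],  # two taxa are hard to tell apart
])

qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.25)
sets = prediction_sets(new_probs, qhat)
decisions = triage(sets)
```

In this toy run the first record yields a singleton set and the second yields a two-label set that would be passed to an expert. Under the usual exchangeability assumption between calibration and new records, sets built this way contain the true label with probability at least 1 - alpha on average, which is the kind of confidence control the abstract refers to.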
Source journal
Ecological Informatics (Environmental Science, Ecology)
CiteScore: 8.30
Self-citation rate: 11.80%
Articles published: 346
Review time: 46 days
About the journal: The journal Ecological Informatics is devoted to the publication of high quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data as well as the critical need for informing sustainable management in view of global environmental and climate change. The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.