C. Chen, S. P. Kyathanahally, M. Reyes, S. Merkli, E. Merz, E. Francazi, M. Hoege, F. Pomati, M. Baity-Jesi
Limnology and Oceanography: Methods, 23(1), pages 39-66. Published 2024-11-27. DOI: 10.1002/lom3.10659
https://onlinelibrary.wiley.com/doi/10.1002/lom3.10659
Impact factor: 2.1 (JCR Q2, Limnology)
Producing plankton classifiers that are robust to dataset shift
Modern high-throughput plankton monitoring relies on deep learning classifiers for species recognition in water ecosystems. Despite satisfactory nominal performance, a significant challenge arises from dataset shift, which causes performance to drop during deployment. In our study, we integrate the ZooLake dataset, which consists of dark-field images of lake plankton (Kyathanahally et al. 2021a), with manually annotated images from 10 independent days of deployment, serving as test cells to benchmark out-of-dataset (OOD) performance. Our analysis reveals instances where classifiers that perform well in in-dataset conditions encounter notable failures in practical scenarios. For example, a MobileNet with a 92% nominal test accuracy shows a 77% OOD accuracy. We systematically investigate the conditions leading to OOD performance drops, propose a preemptive assessment method to identify potential pitfalls when classifying new data, and pinpoint features in OOD images that adversely impact classification. We present a three-step pipeline: (i) identifying OOD degradation compared to nominal test performance, (ii) conducting a diagnostic analysis of degradation causes, and (iii) providing solutions. We find that ensembles of BEiT vision transformers, with targeted augmentations addressing OOD robustness, geometric ensembling, and rotation-based test-time augmentation, constitute the most robust model, which we call BEsT. It achieves an 83% OOD accuracy, with errors concentrated on container classes. Moreover, it exhibits lower sensitivity to dataset shift and reproduces plankton abundances well. Our proposed pipeline is applicable to generic plankton classifiers, contingent on the availability of suitable test cells. By identifying critical shortcomings and offering practical procedures to fortify models against dataset shift, our study contributes to the development of more reliable plankton classification technologies.
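The inference-time ingredients named in the abstract (rotation-based test-time augmentation and geometric ensembling) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "geometric ensembling" means taking a renormalized geometric mean of the class-probability vectors of several models, that the rotations are the four multiples of 90 degrees, and that rotated views are combined by an arithmetic mean. The toy models and all function names (`rotations`, `geometric_mean`, `predict_tta_ensemble`, `make_toy_model`, `accuracy_drop`) are hypothetical stand-ins for real classifiers such as BEiT.

```python
import numpy as np

def rotations(image):
    """Return the four 90-degree rotations of an (H, W, C) image."""
    return [np.rot90(image, k, axes=(0, 1)) for k in range(4)]

def geometric_mean(probs, eps=1e-12):
    """Geometric mean of probability vectors along axis 0, renormalized to sum to 1."""
    probs = np.asarray(probs, dtype=float)
    g = np.exp(np.mean(np.log(probs + eps), axis=0))
    return g / g.sum()

def predict_tta_ensemble(models, image):
    """Average each model over rotated views, then combine models geometrically."""
    per_model = []
    for model in models:
        # Assumption: rotated views are combined by an arithmetic mean.
        view_probs = np.stack([model(view) for view in rotations(image)])
        per_model.append(view_probs.mean(axis=0))
    return geometric_mean(per_model)

def accuracy_drop(nominal_acc, ood_acc):
    """Step (i) of the pipeline: degradation of OOD accuracy relative to nominal."""
    return nominal_acc - ood_acc

def make_toy_model(weights):
    """Hypothetical stand-in classifier: a softmax over two orientation-dependent features."""
    def model(image):
        h, w = image.shape[:2]
        feats = np.array([image[: h // 2].mean(), image[:, : w // 2].mean()])
        logits = weights @ feats
        e = np.exp(logits - logits.max())
        return e / e.sum()
    return model

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                      # fake dark-field image
models = [make_toy_model(rng.normal(size=(3, 2))) for _ in range(3)]
probs = predict_tta_ensemble(models, image)        # class probabilities, shape (3,)
drop = accuracy_drop(0.92, 0.77)                   # MobileNet figures from the abstract
```

Because all four rotations are always evaluated, the combined prediction does not change when the input image is itself rotated by 90 degrees. That is the property that makes this kind of augmentation attractive for plankton images, where an organism's orientation carries no class information.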
About the journal:
Limnology and Oceanography: Methods (ISSN 1541-5856) is a companion to ASLO's top-rated journal Limnology and Oceanography, and articles are held to the same high standards. In order to provide the most rapid publication consistent with high standards, Limnology and Oceanography: Methods appears in electronic format only, and the entire submission and review system is online. Articles are posted as soon as they are accepted and formatted for publication.
Limnology and Oceanography: Methods will consider manuscripts whose primary focus is methodological, and that deal with problems in the aquatic sciences. Manuscripts may present new measurement equipment, techniques for analyzing observations or samples, methods for understanding and interpreting information, analyses of metadata to examine the effectiveness of approaches, invited and contributed reviews and syntheses, and techniques for communicating and teaching in the aquatic sciences.