Tobias Schanz, Klas Ove Möller, S. Rühl, D. S. Greenberg
{"title":"基于无标签图像特征学习和概率校准的海洋生物鲁棒检测","authors":"Tobias Schanz, Klas Ove Möller, S. Rühl, D. S. Greenberg","doi":"10.1088/2632-2153/ace417","DOIUrl":null,"url":null,"abstract":"Advances in in situ marine life imaging have significantly increased the size and quality of available datasets, but automatic image analysis has not kept pace. Machine learning has shown promise for image processing, but its effectiveness is limited by several open challenges: the requirement for large expert-labeled training datasets, disagreement among experts, under-representation of various species and unreliable or overconfident predictions. To overcome these obstacles for automated underwater imaging, we combine and test recent developments in deep classifier networks and self-supervised feature learning. We use unlabeled images for pretraining deep neural networks to extract task-relevant image features, allowing learning algorithms to cope with scarcity in expert labels, and carefully evaluate performance in subsequent label-based tasks. Performance on rare classes is improved by applying data rebalancing together with a Bayesian correction to avoid biasing inferred in situ class frequencies. A divergence-based loss allows training on multiple, conflicting labels for the same image, leading to better estimates of uncertainty which we quantify with a novel accuracy measure. Together, these techniques can reduce the required label counts ∼100-fold while maintaining the accuracy of standard supervised training, shorten training time, cope with expert disagreement and reduce overconfidence.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":" ","pages":""},"PeriodicalIF":6.3000,"publicationDate":"2023-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Robust detection of marine life with label-free image feature learning and probability calibration\",\"authors\":\"Tobias Schanz, Klas Ove Möller, S. Rühl, D. S. Greenberg\",\"doi\":\"10.1088/2632-2153/ace417\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Advances in in situ marine life imaging have significantly increased the size and quality of available datasets, but automatic image analysis has not kept pace. Machine learning has shown promise for image processing, but its effectiveness is limited by several open challenges: the requirement for large expert-labeled training datasets, disagreement among experts, under-representation of various species and unreliable or overconfident predictions. To overcome these obstacles for automated underwater imaging, we combine and test recent developments in deep classifier networks and self-supervised feature learning. We use unlabeled images for pretraining deep neural networks to extract task-relevant image features, allowing learning algorithms to cope with scarcity in expert labels, and carefully evaluate performance in subsequent label-based tasks. Performance on rare classes is improved by applying data rebalancing together with a Bayesian correction to avoid biasing inferred in situ class frequencies. A divergence-based loss allows training on multiple, conflicting labels for the same image, leading to better estimates of uncertainty which we quantify with a novel accuracy measure. Together, these techniques can reduce the required label counts ∼100-fold while maintaining the accuracy of standard supervised training, shorten training time, cope with expert disagreement and reduce overconfidence.\",\"PeriodicalId\":33757,\"journal\":{\"name\":\"Machine Learning Science and Technology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2023-07-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine Learning Science and Technology\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.1088/2632-2153/ace417\",\"RegionNum\":2,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning Science and Technology","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1088/2632-2153/ace417","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Robust detection of marine life with label-free image feature learning and probability calibration
Advances in in situ marine life imaging have significantly increased the size and quality of available datasets, but automatic image analysis has not kept pace. Machine learning has shown promise for image processing, but its effectiveness is limited by several open challenges: the requirement for large expert-labeled training datasets, disagreement among experts, under-representation of various species and unreliable or overconfident predictions. To overcome these obstacles for automated underwater imaging, we combine and test recent developments in deep classifier networks and self-supervised feature learning. We use unlabeled images for pretraining deep neural networks to extract task-relevant image features, allowing learning algorithms to cope with scarcity in expert labels, and carefully evaluate performance in subsequent label-based tasks. Performance on rare classes is improved by applying data rebalancing together with a Bayesian correction to avoid biasing inferred in situ class frequencies. A divergence-based loss allows training on multiple, conflicting labels for the same image, leading to better estimates of uncertainty which we quantify with a novel accuracy measure. Together, these techniques can reduce the required label counts ∼100-fold while maintaining the accuracy of standard supervised training, shorten training time, cope with expert disagreement and reduce overconfidence.
期刊介绍:
Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.