Taming the data deluge: A novel end-to-end deep learning system for classifying marine biological and environmental images
Hongsheng Bi, Yunhao Cheng, Xuemin Cheng, Mark C. Benfield, David G. Kimmel, Haiyong Zheng, Sabrina Groves, Kezhen Ying
Limnology and Oceanography: Methods, vol. 22, no. 1, pp. 47-64. Published 2023-11-10. DOI: 10.1002/lom3.10591
https://onlinelibrary.wiley.com/doi/10.1002/lom3.10591
Citation count: 0
Abstract
Underwater imaging enables nondestructive plankton sampling at frequencies, durations, and resolutions unattainable by traditional methods. These systems necessitate automated processes to identify organisms efficiently. Early underwater image processing used a standard approach: binarizing images to segment targets, then integrating deep learning models for classification. While intuitive, this infrastructure has limitations in handling high concentrations of biotic and abiotic particles, rapid changes in dominant taxa, and highly variable target sizes. To address these challenges, we introduce a new framework that starts with a scene classifier to capture large within-image variation, such as disparities in the layout of particles and dominant taxa. After scene classification, scene-specific Mask regional convolutional neural network (Mask R-CNN) models are trained to separate target objects into different groups. The procedure allows information to be extracted from different image types, while minimizing potential bias for commonly occurring features. Using in situ coastal plankton images, we compared the scene-specific models to the Mask R-CNN model encompassing all scene categories as a single full model. Results showed that the scene-specific approach outperformed the full model by achieving a 20% accuracy improvement in complex noisy images. The full model yielded counts that were up to 78% lower than those enumerated by the scene-specific model for some small-sized plankton groups. We further tested the framework on images from a benthic video camera and an imaging sonar system with good results. The integration of scene classification, which groups similar images together, can improve the accuracy of detection and classification for complex marine biological images.
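The two-stage idea described in the abstract (classify the scene first, then hand the image to a model trained for that scene) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the scene names (`clear`, `noisy`), the taxon labels, the `min_score` threshold, and the toy classifier and detectors are all hypothetical stand-ins, and a real system would plug trained Mask R-CNN models into the `detectors` mapping.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence

# Placeholder image type: a 2-D grid of pixel intensities in [0, 1].
Image = Sequence[Sequence[float]]


@dataclass(frozen=True)
class Detection:
    label: str    # predicted group, e.g. a plankton taxon (hypothetical names)
    score: float  # detector confidence in [0, 1]


SceneClassifier = Callable[[Image], str]
Detector = Callable[[Image], List[Detection]]


def count_by_scene(
    images: Iterable[Image],
    scene_classifier: SceneClassifier,
    detectors: Dict[str, Detector],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> Counter:
    """Route each image to its scene-specific detector and tally detections."""
    counts: Counter = Counter()
    for image in images:
        scene = scene_classifier(image)   # stage 1: scene classification
        detector = detectors[scene]       # pick the scene-specific model
        for det in detector(image):       # stage 2: detection and classification
            if det.score >= min_score:
                counts[det.label] += 1
    return counts


# Toy stand-ins so the sketch runs without any trained model.
def toy_scene_classifier(image: Image) -> str:
    pixels = [p for row in image for p in row]
    return "noisy" if sum(pixels) / len(pixels) > 0.5 else "clear"


def toy_clear_detector(image: Image) -> List[Detection]:
    return [Detection("copepod", 0.9)]


def toy_noisy_detector(image: Image) -> List[Detection]:
    return [Detection("copepod", 0.8), Detection("marine_snow", 0.3)]


if __name__ == "__main__":
    images = [[[0.1, 0.2]], [[0.9, 0.8]]]
    detectors = {"clear": toy_clear_detector, "noisy": toy_noisy_detector}
    print(count_by_scene(images, toy_scene_classifier, detectors))
```

Keeping the detectors in a mapping keyed by scene label means each scene-specific model can be trained and swapped independently, which is the property the abstract credits for reducing bias toward commonly occurring features.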
About the journal:
Limnology and Oceanography: Methods (ISSN 1541-5856) is a companion to ASLO's top-rated journal Limnology and Oceanography, and articles are held to the same high standards. In order to provide the most rapid publication consistent with high standards, Limnology and Oceanography: Methods appears in electronic format only, and the entire submission and review system is online. Articles are posted as soon as they are accepted and formatted for publication.
Limnology and Oceanography: Methods will consider manuscripts whose primary focus is methodological, and that deal with problems in the aquatic sciences. Manuscripts may present new measurement equipment, techniques for analyzing observations or samples, methods for understanding and interpreting information, analyses of metadata to examine the effectiveness of approaches, invited and contributed reviews and syntheses, and techniques for communicating and teaching in the aquatic sciences.