Enhancing Fish Counting in Sonar Images With Multitask Learning and Local–Global Feature Interaction

IF 4.3 | CAS Zone 2 (Multidisciplinary) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Yuhang Wang;Qunyong Wu;Shiyu Yang;Mengmeng Li;Keyue Wang;Xuanyu Chen
{"title":"Enhancing Fish Counting in Sonar Images With Multitask Learning and Local–Global Feature Interaction","authors":"Yuhang Wang;Qunyong Wu;Shiyu Yang;Mengmeng Li;Keyue Wang;Xuanyu Chen","doi":"10.1109/JSEN.2025.3562927","DOIUrl":null,"url":null,"abstract":"Accurate fish counting is crucial for environmental monitoring and management, with sonar imaging providing a nonintrusive way to gather data in previously inaccessible underwater environments. However, the strong visual similarity between fish and noise in sonar images presents significant challenges in achieving high counting accuracy. Existing methods rely on attention maps to emphasize fish regions but do not fully capture the discriminative features between fish and noise, limiting counting accuracy. To address this, we propose the local-global multitask transformer (LGMFormer) to enhance fish counting in sonar images. The model employs an encoder-decoder architecture, with density map regression as the primary task and multiclass semantic segmentation as an auxiliary task. By predicting multiclass segmentation maps, the shared network layers fully learn discriminative features between fish and noise. We also develop a segmentation-enhanced density head (SEDH) to further strengthen the connection between tasks. Within LGMFormer, the local-global feature interaction (LGFI) module is designed to fuse local spatial detail features with global correlation features for more precise fish counting. Additionally, a high-level feature guidance (HLFG) module is developed to retain more detail during the feature fusion process between the encoder and decoder. We also develop an automatic image segmentation method based on Otsu’s thresholding to create multiclass segmentation labels for sonar images. Extensive experiments on a public sonar fish-counting dataset demonstrate that LGMFormer outperforms state-of-the-art methods in counting accuracy, reducing the mean absolute error (MAE) of count prediction by 17.6% and the false recognition rate by 42.9%. The source code will be available at <uri>https://github.com/camerayuhang/LGMFormer</uri>","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 12","pages":"21775-21791"},"PeriodicalIF":4.3000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Sensors Journal","FirstCategoryId":"103","ListUrlMain":"https://ieeexplore.ieee.org/document/10979195/","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Accurate fish counting is crucial for environmental monitoring and management, with sonar imaging providing a nonintrusive way to gather data in previously inaccessible underwater environments. However, the strong visual similarity between fish and noise in sonar images presents significant challenges in achieving high counting accuracy. Existing methods rely on attention maps to emphasize fish regions but do not fully capture the discriminative features between fish and noise, limiting counting accuracy. To address this, we propose the local-global multitask transformer (LGMFormer) to enhance fish counting in sonar images. The model employs an encoder-decoder architecture, with density map regression as the primary task and multiclass semantic segmentation as an auxiliary task. By predicting multiclass segmentation maps, the shared network layers fully learn discriminative features between fish and noise. We also develop a segmentation-enhanced density head (SEDH) to further strengthen the connection between tasks. Within LGMFormer, the local-global feature interaction (LGFI) module is designed to fuse local spatial detail features with global correlation features for more precise fish counting. Additionally, a high-level feature guidance (HLFG) module is developed to retain more detail during the feature fusion process between the encoder and decoder. We also develop an automatic image segmentation method based on Otsu’s thresholding to create multiclass segmentation labels for sonar images. Extensive experiments on a public sonar fish-counting dataset demonstrate that LGMFormer outperforms state-of-the-art methods in counting accuracy, reducing the mean absolute error (MAE) of count prediction by 17.6% and the false recognition rate by 42.9%. The source code will be available at https://github.com/camerayuhang/LGMFormer
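To make the multitask design concrete, the following is a minimal PyTorch-style sketch, not the authors' released code, of how a primary density-map regression loss is typically combined with an auxiliary multiclass segmentation loss; the loss functions, the weight `lambda_seg`, the class semantics, and the tensor shapes are illustrative assumptions.

```python
# Sketch only: multitask objective pairing density-map regression (primary task)
# with multiclass semantic segmentation (auxiliary task). The loss choices,
# shapes, and the weighting factor lambda_seg are assumptions, not the paper's values.
import torch
import torch.nn as nn

class MultitaskCountingLoss(nn.Module):
    def __init__(self, lambda_seg: float = 0.1):
        super().__init__()
        self.density_loss = nn.MSELoss()       # primary: pixel-wise density regression
        self.seg_loss = nn.CrossEntropyLoss()  # auxiliary: e.g., background / noise / fish
        self.lambda_seg = lambda_seg

    def forward(self, pred_density, gt_density, pred_seg_logits, gt_seg):
        # pred_density, gt_density: (B, 1, H, W)
        # pred_seg_logits: (B, num_classes, H, W); gt_seg: (B, H, W) integer labels
        loss_den = self.density_loss(pred_density, gt_density)
        loss_seg = self.seg_loss(pred_seg_logits, gt_seg)
        return loss_den + self.lambda_seg * loss_seg
```

The predicted count is then obtained by summing the regressed density map, e.g. `pred_density.sum(dim=(1, 2, 3))`.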
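The abstract also mentions an automatic Otsu-based method for generating multiclass segmentation labels; the exact procedure is not described here, but one plausible baseline is multi-level Otsu thresholding of the grayscale sonar frame, as sketched below (the function name and class count are hypothetical).

```python
# Hypothetical sketch: derive a multiclass label map from a grayscale sonar image
# with multi-level Otsu thresholding. This is one plausible reading of the
# abstract, not the authors' exact pipeline.
import numpy as np
from skimage.filters import threshold_multiotsu

def otsu_multiclass_labels(sonar_gray: np.ndarray, classes: int = 3) -> np.ndarray:
    """Split the intensity histogram into `classes` bins (e.g., background,
    noise, fish echoes) and return an integer label map in [0, classes - 1]."""
    thresholds = threshold_multiotsu(sonar_gray, classes=classes)
    return np.digitize(sonar_gray, bins=thresholds)
```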
Source Journal
IEEE Sensors Journal (Engineering Technology - Engineering: Electrical & Electronic)
CiteScore: 7.70
Self-citation rate: 14.00%
Articles per year: 2058
Review time: 5.2 months
About the journal: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal covers the following topics:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation, and classification based on sensor data)
- Sensors in Industrial Practice