{"title":"Enhancing Fish Counting in Sonar Images With Multitask Learning and Local–Global Feature Interaction","authors":"Yuhang Wang;Qunyong Wu;Shiyu Yang;Mengmeng Li;Keyue Wang;Xuanyu Chen","doi":"10.1109/JSEN.2025.3562927","DOIUrl":null,"url":null,"abstract":"Accurate fish counting is crucial for environmental monitoring and management, with sonar imaging providing a nonintrusive way to gather data in previously inaccessible underwater environments. However, the strong visual similarity between fish and noise in sonar images presents significant challenges in achieving high counting accuracy. Existing methods rely on attention maps to emphasize fish regions but do not fully capture the discriminative features between fish and noise, limiting counting accuracy. To address this, we propose the local-global multitask transformer (LGMFormer) to enhance fish counting in sonar images. The model employs an encoder-decoder architecture, with density map regression as the primary task and multiclass semantic segmentation as an auxiliary task. By predicting multiclass segmentation maps, the shared network layers fully learn discriminative features between fish and noise. We also develop a segmentation-enhanced density head (SEDH) to further strengthen the connection between tasks. Within LGMFormer, the local-global feature interaction (LGFI) module is designed to fuse local spatial detail features with global correlation features for more precise fish counting. Additionally, a high-level feature guidance (HLFG) module is developed to retain more detail during the feature fusion process between the encoder and decoder. We also develop an automatic image segmentation method based on Otsu’s thresholding to create multiclass segmentation labels for sonar images. Extensive experiments on a public sonar fish-counting dataset demonstrate that LGMFormer outperforms state-of-the-art methods in counting accuracy, reducing the mean absolute error (MAE) of count prediction by 17.6% and the false recognition rate by 42.9%. The source code will be available at <uri>https://github.com/camerayuhang/LGMFormer</uri>","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 12","pages":"21775-21791"},"PeriodicalIF":4.3000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Sensors Journal","FirstCategoryId":"103","ListUrlMain":"https://ieeexplore.ieee.org/document/10979195/","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Accurate fish counting is crucial for environmental monitoring and management, with sonar imaging providing a nonintrusive way to gather data in previously inaccessible underwater environments. However, the strong visual similarity between fish and noise in sonar images presents significant challenges in achieving high counting accuracy. Existing methods rely on attention maps to emphasize fish regions but do not fully capture the discriminative features between fish and noise, limiting counting accuracy. To address this, we propose the local-global multitask transformer (LGMFormer) to enhance fish counting in sonar images. The model employs an encoder-decoder architecture, with density map regression as the primary task and multiclass semantic segmentation as an auxiliary task. By predicting multiclass segmentation maps, the shared network layers fully learn discriminative features between fish and noise. We also develop a segmentation-enhanced density head (SEDH) to further strengthen the connection between tasks. Within LGMFormer, the local-global feature interaction (LGFI) module is designed to fuse local spatial detail features with global correlation features for more precise fish counting. Additionally, a high-level feature guidance (HLFG) module is developed to retain more detail during the feature fusion process between the encoder and decoder. We also develop an automatic image segmentation method based on Otsu’s thresholding to create multiclass segmentation labels for sonar images. Extensive experiments on a public sonar fish-counting dataset demonstrate that LGMFormer outperforms state-of-the-art methods in counting accuracy, reducing the mean absolute error (MAE) of count prediction by 17.6% and the false recognition rate by 42.9%. The source code will be available at https://github.com/camerayuhang/LGMFormer
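As a rough illustration of the label-generation step described above, the sketch below uses multi-level Otsu thresholding to partition a single-channel sonar intensity image into a fixed number of classes. The choice of scikit-image's threshold_multiotsu, the three-class split, and the class semantics (background, noise-like echoes, fish-like echoes) are assumptions for illustration; the abstract does not specify the authors' exact preprocessing or class definitions.

```python
import numpy as np
from skimage.filters import threshold_multiotsu  # multi-level Otsu thresholding

def otsu_pseudo_labels(sonar_img: np.ndarray, n_classes: int = 3) -> np.ndarray:
    """Partition a sonar image into n_classes intensity bands
    (e.g., background / noise-like echoes / fish-like echoes).

    Returns an integer label map in {0, ..., n_classes - 1} with the same
    shape as the input, usable as a multiclass segmentation target.
    """
    # threshold_multiotsu returns n_classes - 1 thresholds that maximize
    # the between-class variance of the intensity histogram.
    thresholds = threshold_multiotsu(sonar_img, classes=n_classes)
    # np.digitize assigns each pixel the index of the band it falls into.
    return np.digitize(sonar_img, bins=thresholds).astype(np.int64)
```

In a multitask setup like the one described, such label maps would supervise the auxiliary segmentation head, with training driven by a combination of the density-regression loss and a segmentation loss; the exact loss formulation and weighting used by LGMFormer are not given in the abstract.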
Journal overview:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic and acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation, and classification based on sensor data)
-Sensors in Industrial Practice