{"title":"Enhancing Fish Counting in Sonar Images With Multitask Learning and Local–Global Feature Interaction","authors":"Yuhang Wang;Qunyong Wu;Shiyu Yang;Mengmeng Li;Keyue Wang;Xuanyu Chen","doi":"10.1109/JSEN.2025.3562927","DOIUrl":null,"url":null,"abstract":"Accurate fish counting is crucial for environmental monitoring and management, with sonar imaging providing a nonintrusive way to gather data in previously inaccessible underwater environments. However, the strong visual similarity between fish and noise in sonar images presents significant challenges in achieving high counting accuracy. Existing methods rely on attention maps to emphasize fish regions but do not fully capture the discriminative features between fish and noise, limiting counting accuracy. To address this, we propose the local-global multitask transformer (LGMFormer) to enhance fish counting in sonar images. The model employs an encoder-decoder architecture, with density map regression as the primary task and multiclass semantic segmentation as an auxiliary task. By predicting multiclass segmentation maps, the shared network layers fully learn discriminative features between fish and noise. We also develop a segmentation-enhanced density head (SEDH) to further strengthen the connection between tasks. Within LGMFormer, the local-global feature interaction (LGFI) module is designed to fuse local spatial detail features with global correlation features for more precise fish counting. Additionally, a high-level feature guidance (HLFG) module is developed to retain more detail during the feature fusion process between the encoder and decoder. We also develop an automatic image segmentation method based on Otsu’s thresholding to create multiclass segmentation labels for sonar images. Extensive experiments on a public sonar fish-counting dataset demonstrate that LGMFormer outperforms state-of-the-art methods in counting accuracy, reducing the mean absolute error (MAE) of count prediction by 17.6% and the false recognition rate by 42.9%. The source code will be available at <uri>https://github.com/camerayuhang/LGMFormer</uri>","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 12","pages":"21775-21791"},"PeriodicalIF":4.3000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Sensors Journal","FirstCategoryId":"103","ListUrlMain":"https://ieeexplore.ieee.org/document/10979195/","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Accurate fish counting is crucial for environmental monitoring and management, with sonar imaging providing a nonintrusive way to gather data in previously inaccessible underwater environments. However, the strong visual similarity between fish and noise in sonar images presents significant challenges in achieving high counting accuracy. Existing methods rely on attention maps to emphasize fish regions but do not fully capture the discriminative features between fish and noise, limiting counting accuracy. To address this, we propose the local-global multitask transformer (LGMFormer) to enhance fish counting in sonar images. The model employs an encoder-decoder architecture, with density map regression as the primary task and multiclass semantic segmentation as an auxiliary task. By predicting multiclass segmentation maps, the shared network layers fully learn discriminative features between fish and noise. We also develop a segmentation-enhanced density head (SEDH) to further strengthen the connection between tasks. Within LGMFormer, the local-global feature interaction (LGFI) module is designed to fuse local spatial detail features with global correlation features for more precise fish counting. Additionally, a high-level feature guidance (HLFG) module is developed to retain more detail during the feature fusion process between the encoder and decoder. We also develop an automatic image segmentation method based on Otsu’s thresholding to create multiclass segmentation labels for sonar images. Extensive experiments on a public sonar fish-counting dataset demonstrate that LGMFormer outperforms state-of-the-art methods in counting accuracy, reducing the mean absolute error (MAE) of count prediction by 17.6% and the false recognition rate by 42.9%. The source code will be available at https://github.com/camerayuhang/LGMFormer
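As a rough illustration of the label-generation step described above, the sketch below uses multi-level Otsu thresholding to partition a single-channel sonar intensity image into a fixed number of classes. The choice of scikit-image's threshold_multiotsu, the three-class split, and the class semantics (background, noise-like echoes, fish-like echoes) are assumptions for illustration; the abstract does not specify the authors' exact preprocessing or class definitions.

```python
import numpy as np
from skimage.filters import threshold_multiotsu  # multi-level Otsu thresholding

def otsu_pseudo_labels(sonar_img: np.ndarray, n_classes: int = 3) -> np.ndarray:
    """Partition a sonar image into n_classes intensity bands
    (e.g., background / noise-like echoes / fish-like echoes).

    Returns an integer label map in {0, ..., n_classes - 1} with the same
    shape as the input, usable as a multiclass segmentation target.
    """
    # threshold_multiotsu returns n_classes - 1 thresholds that maximize
    # the between-class variance of the intensity histogram.
    thresholds = threshold_multiotsu(sonar_img, classes=n_classes)
    # np.digitize assigns each pixel the index of the band it falls into.
    return np.digitize(sonar_img, bins=thresholds).astype(np.int64)
```

In a multitask setup like the one described, such label maps would supervise the auxiliary segmentation head, with training driven by a combination of the density-regression loss and a segmentation loss; the exact loss formulation and weighting used by LGMFormer are not given in the abstract.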
Journal overview:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic and acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation, and classification based on sensor data)
-Sensors in Industrial Practice