Signal Processing-Image Communication — Latest Publications

Sparse modeling for image inpainting: A multi-scale morphological patch-based k-SVD and group-based PCA
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-05-15, DOI: 10.1016/j.image.2025.117341
Amit Soni Arya, Susanta Mukhopadhyay
{"title":"Sparse modeling for image inpainting: A multi-scale morphological patch-based k-SVD and group-based PCA","authors":"Amit Soni Arya,&nbsp;Susanta Mukhopadhyay","doi":"10.1016/j.image.2025.117341","DOIUrl":"10.1016/j.image.2025.117341","url":null,"abstract":"<div><div>Image inpainting, a crucial task in image restoration, aims to reconstruct highly degraded images with missing pixels while preserving structural and textural integrity. Traditional patch-based and group-based sparse representation methods often struggle with visual artifacts and over-smoothing, limiting their effectiveness. To address these challenges, we propose a novel multi-scale morphological patch-based and group-based sparse representation learning approach for image inpainting. Our method enhances image inpainting by integrating morphological patch-based sparse representation (M-PSR) learning using k-singular value decomposition (k-SVD) and group-based sparse representation using principal component analysis (PCA) to construct adaptive dictionaries for improved reconstruction accuracy. Additionally, we employ the alternating direction method of multipliers (ADMM) to optimize the integration of morphological patch and group based sparse representations, enhancing restoration quality. Extensive experiments on various degraded images demonstrate that our approach outperforms state-of-the-art methods in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). The proposed method effectively reconstructs images corrupted by missing pixels, scratches, and text inlays, achieving superior structural coherence and perceptual quality. This work contributes a robust and efficient solution for image inpainting, offering significant advances in sparse modeling and morphological image processing.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117341"},"PeriodicalIF":3.4,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Facial expression transformation for anime-style image based on decoder control and attention mask
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-05-06, DOI: 10.1016/j.image.2025.117343
Xinhao Rao, Weidong Min, Ziyang Deng, Mengxue Liu
{"title":"Facial expression transformation for anime-style image based on decoder control and attention mask","authors":"Xinhao Rao ,&nbsp;Weidong Min ,&nbsp;Ziyang Deng ,&nbsp;Mengxue Liu","doi":"10.1016/j.image.2025.117343","DOIUrl":"10.1016/j.image.2025.117343","url":null,"abstract":"<div><div>Human facial expression transformation has been extensively studied using Generative Adversarial Networks (GANs) recently. GANs have also shown successful attempts in transforming anime-style images. However, current methods for anime pictures fail to refine the expression control efficiently, leading to control effects weaker than expected. Moreover, it remains challenging to maintain the original anime face identity information while transforming. To address these issues, we propose an expression transformation method for anime-style images. In order to enhance the control effect of discrete emoticon tags, a mapping network is proposed to map them to high-dimensional control information, which is then injected into the network multiple times during transformation. Additionally, for better maintaining the anime face identity information while transforming, an integrated attention mask mechanism is introduced to enable the network's expression control to focus on the expression-related features, while avoiding affecting the unrelated features. Finally, we conduct a large number of experiments to verify the validity of the proposed method, and both quantitative and qualitative evaluations are carried out. The results demonstrate the superiority of our proposed method compared to existing methods based on multi-domain image-to-image translation.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117343"},"PeriodicalIF":3.4,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-view contrastive learning for unsupervised 3D model retrieval and classification
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-05-06, DOI: 10.1016/j.image.2025.117333
Wenhui Li, Zhenghao Fang, Dan Song, Weizhi Nie, Xuanya Li, An-An Liu
{"title":"Multi-view contrastive learning for unsupervised 3D model retrieval and classification","authors":"Wenhui Li ,&nbsp;Zhenghao Fang ,&nbsp;Dan Song ,&nbsp;Weizhi Nie ,&nbsp;Xuanya Li ,&nbsp;An-An Liu","doi":"10.1016/j.image.2025.117333","DOIUrl":"10.1016/j.image.2025.117333","url":null,"abstract":"<div><div>Unsupervised 3D model retrieval and classification have attracted a lot of attention due to wide applications. Although much progress has been achieved, they remain challenging due to the lack of supervised information to optimize neural network learning. Existing unsupervised methods usually utilized clustering algorithms to generate pseudo labels for 3D models. However, the clustering algorithms cannot fully mine the multi-view structure information and misguide the unsupervised learning process due to the noise information. To cope with the above limitation, this paper proposes a Multi-View Contrastive Learning (MVCL) method, which fully takes advantage of multi-view structure information to optimize the neural network. Specifically, we propose a multi-view grouping scheme and multi-view contrastive learning scheme to mine the self-supervised information and learn discriminative feature representation. The multi-view grouping scheme divides the multiple views of each 3D model into two groups and minimizes the group-level difference, which facilitates exploring the internal characteristics of 3D structural information. To learn the relationships among multiple views in an unsupervised manner, we propose a two-stream asymmetrical framework including the main network and the subsidiary network to guarantee the discrimination of the learned feature. Extensive 3D model retrieval and classification experiments are conducted on two challenging datasets, demonstrating the superiority of this method.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117333"},"PeriodicalIF":3.4,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143917401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive structural compensation enhancement based on multi-scale illumination estimation
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-05-05, DOI: 10.1016/j.image.2025.117332
Yong Luo, Qiuming Liu, Xuejing Jiang, Le Qin, Zhenzhen Luo
{"title":"Adaptive structural compensation enhancement based on multi-scale illumination estimation","authors":"Yong Luo ,&nbsp;Qiuming Liu ,&nbsp;Xuejing Jiang ,&nbsp;Le Qin ,&nbsp;Zhenzhen Luo","doi":"10.1016/j.image.2025.117332","DOIUrl":"10.1016/j.image.2025.117332","url":null,"abstract":"<div><div>In real-world scenes, lighting conditions are often variable and uncontrollable, such as non-uniform lighting, low lighting, and overexposure. These uncontrolled lighting conditions can degrade image quality and visibility. However, the majority of existing image enhancement techniques are typically designed for specific lighting conditions. Consequently, when applied to images in uncontrolled lighting, these techniques are prone to result in insufficient visibility, distortion, overexposure, and even information loss. In this paper, to address the limitations of existing methods, we introduce an innovative and effective method for enhancing uncontrolled lighting images through adaptive structural compensation. Firstly, a joint filtering algorithm for illumination estimation is developed to effectively mitigate texture, edge and noise interference during illumination estimation. Secondly, we developed a multi-scale illumination estimation algorithm for the purpose of constructing a structural compensation map. This map is then used to control brightness compensation for different areas of the image. Finally, a two-stage compensation fusion strategy is designed to adaptively reconstruct the brightness distribution and effectively improve image visibility. Extensive experimental results indicate that our method outperforms some state-of-the-art approaches in improving the quality and visibility of images under uncontrolled lighting conditions.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117332"},"PeriodicalIF":3.4,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143917400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Information disentanglement for unsupervised domain adaptive Oracle Bone Inscriptions detection
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-05-01, DOI: 10.1016/j.image.2025.117334
Feng Gao, Yongge Liu, Deng Li, Xu Chen, Runhua Jiang, Yahong Han
{"title":"Information disentanglement for unsupervised domain adaptive Oracle Bone Inscriptions detection","authors":"Feng Gao ,&nbsp;Yongge Liu ,&nbsp;Deng Li ,&nbsp;Xu Chen ,&nbsp;Runhua Jiang ,&nbsp;Yahong Han","doi":"10.1016/j.image.2025.117334","DOIUrl":"10.1016/j.image.2025.117334","url":null,"abstract":"<div><div>The detection of Oracle Bone Inscriptions (OBIs) is the foundation of studying the OBIs via computer technology. Oracle bone inscription data includes rubbings, handwriting, and photos. Currently, most detection methods primarily focus on rubbings and rely on large-scale annotated datasets. However, it is necessary to detect oracle bone inscriptions on both handwriting and photo domains in practical applications. Additionally, annotating handwriting and photos is time-consuming and requires expert knowledge. An effective solution is to directly transfer the knowledge learned from the existing public dataset to the unlabeled target domain. However, the domain shift between domains heavily degrades the performance of this solution. To alleviate this problem and based on the characteristics of different domains of oracle bone, in this paper, we propose an information disentanglement method for the Unsupervised Domain Adaptive (UDA) OBIs detection to improve the detection performance of OBIs in both handwriting and photos. Specifically, we construct an image content encoder and a style encoder module to decouple the oracle bone image information. Then, a reconstruction decoder is constructed to reconstruct the source domain image guided by the target domain image information to reduce the shift between domains. To demonstrate the effectiveness of our method, we constructed an OBI detection benchmark that contains three domains: rubbing, handwriting, and photo. Extensive experiments verified the effectiveness and generality of our method on domain adaptive OBIs detection. Compared to other state-of-the-art UDAOD methods, our approach achieves an improvement of 0.5% and 0.6% in mAP for handwriting and photos, respectively.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117334"},"PeriodicalIF":3.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143902465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Auxiliary captioning: Bridging image–text matching and image captioning
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-04-30, DOI: 10.1016/j.image.2025.117337
Hui Li, Jimin Xiao, Mingjie Sun, Eng Gee Lim, Yao Zhao
{"title":"Auxiliary captioning: Bridging image–text matching and image captioning","authors":"Hui Li ,&nbsp;Jimin Xiao ,&nbsp;Mingjie Sun ,&nbsp;Eng Gee Lim ,&nbsp;Yao Zhao","doi":"10.1016/j.image.2025.117337","DOIUrl":"10.1016/j.image.2025.117337","url":null,"abstract":"<div><div>The image–text matching task, where one query image (text) is provided to seek its corresponding text (image) in a gallery, has drawn increasing attention recently. Conventional methods try to directly map the image and text to one latent-aligned feature space for matching. Achieving an ideal feature alignment is arduous due to the fact that the significant content of the image is not highlighted. To overcome this limitation, we propose to use an auxiliary captioning step to enhance the image feature, where the image feature is fused with the text feature of the captioning output. In this way, the captioning output feature, sharing similar space distribution with candidate texts, can provide high-level semantic information to facilitate locating the significant content in an image. To optimize the auxiliary captioning output, we introduce a new metric, Caption-to-Text (C2T), representing the retrieval performance between the auxiliary captioning output and the ground-truth matching texts. By integrating our C2T score as a reward in our image captioning reinforcement learning framework, our image captioning model can generate more suitable sentences for the auxiliary image–text matching. Extensive experiments on MSCOCO and Flickr30k demonstrate our method’s superiority, which achieves absolute improvements of 5.7% (R@1) on Flickr30k and 3.2% (R@1) on MSCOCO over baseline approaches, outperforming state-of-the-art models without complex architectural modifications.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117337"},"PeriodicalIF":3.4,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143917402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reducing the complexity of distributed video coding by improving the image enhancement post processing
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-04-29, DOI: 10.1016/j.image.2025.117339
Djamel Eddine Boudechiche, Said Benierbah
{"title":"Reducing the complexity of distributed video coding by improving the image enhancement post processing","authors":"Djamel Eddine Boudechiche ,&nbsp;Said Benierbah","doi":"10.1016/j.image.2025.117339","DOIUrl":"10.1016/j.image.2025.117339","url":null,"abstract":"<div><div>The main attractive feature of distributed video coding (DVC) is its use of low-complexity encoders, which are required by low-resource networked applications. Unfortunately, the performance of the currently proposed DVC systems is not yet convincing, and further improvements in the rate, distortion, and complexity tradeoff of DVC are necessary to make it more attractive for use in practical applications. This requires finding new ways to exploit side information in reducing the transmitted rate and improving the quality of the decoded frames. This paper proposes improving DVC by exploiting image enhancement post-processing at the decoder. In this way, we can either improve the quality of the decoded frames for a given rate or reduce the number of transmitted bits for the same quality and hence reduce the complexity of the encoder. To do this, we used a conditional generative adversarial network (cGAN) to restore more of the details discarded by quantization, with the help of side information. We also evaluated numerous existing deep learning-based enhancement methods for DVC and compared them to our proposed model. The results show a reduction in the number of DVC coding operations by 46 % and an improvement in rate-distortion performance and subjective visual quality. Furthermore, despite reducing its complexity, our DVC codec outperformed the DISCOVER codec with an average Bjøntegaard PSNR of 0.925 dB.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117339"},"PeriodicalIF":3.4,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143899898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-04-28, DOI: 10.1016/j.image.2025.117336
Peng He, Lin Zhang, Yu Yang, Yue Zhou, Shukai Duan, Xiaofang Hu
{"title":"MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration","authors":"Peng He ,&nbsp;Lin Zhang ,&nbsp;Yu Yang ,&nbsp;Yue Zhou ,&nbsp;Shukai Duan ,&nbsp;Xiaofang Hu","doi":"10.1016/j.image.2025.117336","DOIUrl":"10.1016/j.image.2025.117336","url":null,"abstract":"<div><div>Images taken in rainy, hazy, and low-light environments severely hinder the performance of outdoor computer vision systems. Most data-driven image restoration methods are task-specific and computationally intensive, whereas the capture and processing of degraded images occur largely in end-side devices with limited computing resources. Motivated by addressing the above issues, a novel software and hardware co-designed image restoration method named multi-flow attentive memristive neural network (MA-MNN) is proposed in this paper, which combines a deep learning algorithm and the nanoscale device memristor. The multi-level complementary spatial contextual information is exploited by the multi-flow aggregation block. The dense connection design is adopted to provide smooth transportation across units and alleviate the vanishing-gradient. The supervised calibration block is designed to facilitate achieving the dual-attention mechanism that helps the model identify and re-calibrate the transformed features. Besides, a hardware implementation scheme based on memristors is designed to provide low energy consumption solutions for embedded applications. Extensive experiments in image deraining, image dehazing and low-light image enhancement have shown that the proposed method is highly competitive with over 20 state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117336"},"PeriodicalIF":3.4,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143899899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An unsupervised fusion method for infrared and visible image under low-light condition based on Generative Adversarial Networks
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-04-24, DOI: 10.1016/j.image.2025.117324
Shuai Yang, Yuan Gao, Shiwei Ma
{"title":"An unsupervised fusion method for infrared and visible image under low-light condition based on Generative Adversarial Networks","authors":"Shuai Yang,&nbsp;Yuan Gao,&nbsp;Shiwei Ma","doi":"10.1016/j.image.2025.117324","DOIUrl":"10.1016/j.image.2025.117324","url":null,"abstract":"<div><div>The aim of fusing infrared and visible images is to achieve high-quality images by enhancing textural details and obtaining complementary benefits. However, the existing methods for fusing infrared and visible images are suitable only normal lighting scenes. The details of the visible image under low-light conditions are not discernible. Achieving complementarity between the image contours and textural details is challenging between the infrared image and the visible image. With the intention of addressing the challenge of poor quality of infrared and visible light fusion images under low light conditions, a novel unsupervised fusion method for infrared and visible image under low_light condition (referred to as UFIVL) is presented in this paper. Specifically, the proposed method effectively enhances the low-light regions of visible light images while reducing noise. To incorporate style features of the image into the reconstruction of content features, a sparse-connection dense structure is designed. An adaptive contrast-limited histogram equalization loss function is introduced to improve contrast and brightness in the fused image. The joint gradient loss is proposed to extract clearer texture features under low-light conditions. This end-to-end method generates fused images with enhanced contrast and rich details. Furthermore, considering the issues in existing public datasets, a dataset for individuals and objects in low-light conditions (LLHO <span><span>https://github.com/alex551781/LLHO</span><svg><path></path></svg></span>) is proposed. On the ground of the experimental results, we can conclude that the proposed method generates fusion images with higher subjective and objective quantification scores on both the LLVIP public dataset and the LLHO self-built dataset. Additionally, we apply the fusion images generated by UFIVL method to the advanced computer vision task of target detection, resulting in a significant improvement in detection performance.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117324"},"PeriodicalIF":3.4,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143886165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems
IF 3.4 · CAS Zone 3 · Engineering & Technology
Signal Processing-Image Communication, Pub Date: 2025-04-23, DOI: 10.1016/j.image.2025.117321
Prerana Mukherjee, Srimanta Mandal, Koteswar Rao Jerripothula, Vrishabhdhwaj Maharshi, Kashish Katara
{"title":"Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems","authors":"Prerana Mukherjee ,&nbsp;Srimanta Mandal ,&nbsp;Koteswar Rao Jerripothula ,&nbsp;Vrishabhdhwaj Maharshi ,&nbsp;Kashish Katara","doi":"10.1016/j.image.2025.117321","DOIUrl":"10.1016/j.image.2025.117321","url":null,"abstract":"<div><div>Tracking marine life plays a crucial role in understanding migration patterns, movements, and population growth of underwater species. Deep learning-based fish-tracking networks have been actively researched and developed, yielding promising results. In this work, we propose an end-to-end deep learning framework for tracking fish in unconstrained marine environments. The core innovation of our approach is a Siamese-based architecture integrated with an image enhancement module, designed to measure appearance similarity effectively. The enhancement module consists of convolutional layers and a squeeze-and-excitation block, pre-trained on degraded and clean image pairs to address underwater distortions. This enhanced feature representation is leveraged within the Siamese framework to compute an appearance similarity score, which is further refined using prediction scores based on fish movement patterns. To ensure robust tracking, we combine the appearance similarity score, prediction score, and IoU-based similarity score to generate fish trajectories using the Hungarian algorithm. Our framework significantly reduces ID switches by 35.6% on the Fish4Knowledge dataset and 3.8% on the GMOT-40 fish category, all while maintaining high tracking accuracy. The source code of this work is available here: <span><span>https://github.com/srimanta-mandal/Multi-Fish-Tracking-with-Underwater-Image-Enhancement</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117321"},"PeriodicalIF":3.4,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0