Cross-modal information interaction of binocular predictive networks for RGBT tracking

IF 2.9 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Digital Signal Processing | Pub Date: 2025-07-14 | DOI: 10.1016/j.dsp.2025.105473

Jianming Chen, Dingjian Li, Xiangjin Zeng, Yaman Jing, Zhenbo Ren, Jianglei Di, Yuwen Qin

Volume 168, Article 105473. Full text: https://www.sciencedirect.com/science/article/pii/S1051200425004956

Citations: 0

Abstract

RGBT tracking aggregates information from the visible and thermal infrared modalities to perform visual object tracking. Although many RGBT tracking methods have been proposed, they often suffer from target loss or tracking drift because they cannot effectively extract the useful features contained in the multimodal input. To address this problem, we propose a binocular prediction network with cross-modal information interaction. First, a deep, multi-branch feature extraction network is built on Siamese networks to fully exploit the semantic features of images from different optical modalities; purpose-designed image feature enhancement modules capture and enhance object features, thereby improving tracking performance. Second, a fusion scheme performs bidirectional fusion of multimodal features, leveraging complementary cross-modal information to retain distinguishable object characteristics across modalities. Finally, the anchor-free concept is introduced into RGBT object tracking and combined with a Peak Adaptive Selection (PAS) module to design a binocular prediction network, making the tracker more flexible and versatile. Experiments on three standard RGBT tracking datasets, GTOT, RGBT234, and LasHeR, demonstrate that the modifications to the baseline Siamese architecture are effective. The proposed tracker is competitive with existing state-of-the-art (SOTA) methods, achieving comparable precision and success rates. Its key advantages are the robust fusion of multimodal features and the flexibility of the anchor-free prediction design, which contribute to the tracker's stability across various scenarios. Code is released at https://github.com/JMChenl/RGBT-tracking.git.
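The abstract reports results in terms of precision and success rate. As a rough illustration of how these two tracking metrics are commonly computed from per-frame bounding boxes, here is a minimal Python sketch. The function names, the (x, y, width, height) box layout, the 20-pixel default threshold, and the 21 evenly spaced overlap thresholds are assumptions made for illustration; they are not taken from the paper or its released code, and each benchmark's official toolkit defines its own settings.

```python
from math import hypot

# Assumed box layout: (x, y, width, height) with a top-left origin.
Box = tuple[float, float, float, float]


def center_error(pred: Box, gt: Box) -> float:
    """Euclidean distance between the two box centers, in pixels."""
    px, py = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    return hypot(px - gx, py - gy)


def iou(pred: Box, gt: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    x1 = max(pred[0], gt[0])
    y1 = max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0


def precision_rate(preds: list[Box], gts: list[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose center error is within `threshold` pixels.

    The 20-pixel default is an assumption; benchmarks may use other values.
    """
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= threshold for e in errors) / len(errors)


def success_rate(preds: list[Box], gts: list[Box], steps: int = 21) -> float:
    """Approximate area under the success plot.

    For each overlap threshold in [0, 1], compute the fraction of frames
    whose IoU exceeds it, then average those fractions over all thresholds.
    """
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Both functions take per-frame predicted and ground-truth boxes for a single sequence; averaging across sequences of a dataset is left out of this sketch.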




Source journal

Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

5.30

Self-citation rate

17.20%

Articles published

435

Review time

66 days

About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology,
• financial time series analysis,
• autonomous vehicles,
• quantum computing,
• neuromorphic engineering,
• human-computer interaction and intelligent user interfaces,
• environmental signal processing,
• geophysical signal processing including seismic signal processing,
• chemioinformatics and bioinformatics,
• audio, visual and performance arts,
• disaster management and prevention,
• renewable energy,