Cross-modal information interaction of binocular predictive networks for RGBT tracking

Jianming Chen, Dingjian Li, Xiangjin Zeng, Yaman Jing, Zhenbo Ren, Jianglei Di, Yuwen Qin

Digital Signal Processing, Volume 168, Article 105473. DOI: 10.1016/j.dsp.2025.105473
Published: 2025-07-14 · Impact Factor: 2.9 · JCR Q2 (Engineering, Electrical & Electronic)
URL: https://www.sciencedirect.com/science/article/pii/S1051200425004956
Citations: 0
Abstract
RGBT tracking aims to aggregate information from the visible and thermal infrared modalities to achieve visual object tracking. Although many RGBT tracking methods have been proposed, they often suffer from target loss or tracking drift because they fail to effectively extract the useful features contained in the multimodal data. To address this problem, we propose a cross-modal information interaction binocular prediction network. First, a deep, multi-branch feature extraction network is constructed based on Siamese networks to fully exploit the semantic features of images from the different optical modalities. The designed image feature enhancement modules capture and enhance object features, thereby improving tracking performance. Second, a fusion scheme is developed to achieve bidirectional fusion of multimodal features, leveraging complementary cross-modal information to retain distinguishable object characteristics across modalities. Finally, the anchor-free concept is introduced into the RGBT object tracking domain and combined with a Peak Adaptive Selection (PAS) module to design a binocular prediction network, making the tracker more flexible and versatile. Evaluation experiments on three standard RGBT tracking datasets, namely GTOT, RGBT234, and LasHeR, demonstrate that the modifications made to the baseline Siamese network architecture are effective. The proposed tracker is competitive with existing state-of-the-art (SOTA) methods, achieving comparable results in terms of precision and success rate. The key advantage of the proposed method lies in the robust fusion of multimodal features and the flexibility introduced by the anchor-free prediction design, which contribute to the stability of the proposed tracker across various scenarios. Code is released at https://github.com/JMChenl/RGBT-tracking.git.
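The bidirectional fusion described above can be illustrated with a minimal sketch. This is not the authors' released implementation (see the linked repository for that); it assumes a common cross-attention formulation in which each modality's features query the other modality, so that RGB features are enhanced by thermal information and vice versa:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_fusion(f_rgb, f_tir):
    """Fuse RGB and thermal (TIR) features in both directions via
    cross-attention. f_rgb, f_tir: (N, C) arrays of N flattened
    spatial positions with C channels each. Hypothetical sketch,
    not the paper's exact fusion module."""
    scale = np.sqrt(f_rgb.shape[1])
    # Each modality's positions attend over the other modality.
    attn_rgb2tir = softmax(f_rgb @ f_tir.T / scale, axis=-1)
    attn_tir2rgb = softmax(f_tir @ f_rgb.T / scale, axis=-1)
    # Residual connections keep each modality's own characteristics.
    f_rgb_enh = f_rgb + attn_rgb2tir @ f_tir
    f_tir_enh = f_tir + attn_tir2rgb @ f_rgb
    return f_rgb_enh, f_tir_enh

rgb = np.random.rand(64, 32)   # e.g. an 8x8 feature map, 32 channels
tir = np.random.rand(64, 32)
fr, ft = bidirectional_fusion(rgb, tir)
print(fr.shape, ft.shape)  # (64, 32) (64, 32)
```

The residual form means that when one modality is uninformative (e.g. thermal crossover, low illumination), the other modality's original features are still preserved, which is the complementarity the abstract refers to.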
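The anchor-free prediction with Peak Adaptive Selection can likewise be sketched. The exact PAS module is defined in the paper; the toy version below assumes a common design in which the peak of the classification score map is selected after blending in a Hanning window centred on the previous target position, which damps distant distractor peaks:

```python
import numpy as np

def peak_adaptive_select(score_map, prev_pos, window_weight=0.3):
    """Pick the target position from an anchor-free classification
    score map. A Hanning window centred on the previous position
    down-weights far-away peaks, trading responsiveness for
    stability. Hypothetical sketch of a PAS-style rule."""
    h, w = score_map.shape
    hann = np.outer(np.hanning(h), np.hanning(w))
    # Shift the window centre from the map centre to prev_pos.
    py, px = prev_pos
    window = np.roll(np.roll(hann, py - h // 2, axis=0),
                     px - w // 2, axis=1)
    blended = (1 - window_weight) * score_map + window_weight * window
    return np.unravel_index(np.argmax(blended), blended.shape)

score = np.zeros((17, 17))
score[8, 8] = 1.0    # true target near previous position
score[2, 14] = 0.9   # distractor peak far away
print(peak_adaptive_select(score, prev_pos=(8, 8)))  # (8, 8)
```

Without the window term, the distractor at (2, 14) could win on a noisy frame; the adaptive blend keeps the tracker anchored to plausible motion while still allowing the raw score to dominate.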
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing, including seismic signal processing
• cheminformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy