Transformer-Based RGBT Tracking With Spatio-Temporal Information Fusion

IF 4.3 | Zone 2 (Comprehensive Journals) | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Di Yuan;Haiping Zhang;Qiao Liu;Xiaojun Chang;Zhenyu He
DOI: 10.1109/JSEN.2025.3575188
Journal: IEEE Sensors Journal, vol. 25, no. 13, pp. 25386-25396
Publication date: 2025-06-05
Publication type: Journal Article
Open access: No
Source: https://ieeexplore.ieee.org/document/11026807/
Citations: 0

Abstract

RGB-thermal (RGBT) trackers typically take an RGB tracker as the base model and fully fine-tune it on an RGBT dataset. This approach ignores the differences between target features in the RGB domain and in the thermal infrared (TIR) domain. In addition, existing RGBT trackers match the spatial features of the initial template against the search image and ignore the role of temporal information, so they fail in complex scenarios such as target appearance change and occlusion. To address these problems, we propose a simple and efficient tracker called STTrack. The tracker adopts a symmetric dual-stream structure consisting of several fine-tuning transformer (FT Transformer) encoders, a prediction head, and an online update module. First, the FT Transformer encoder adds a set of trainable parameters to the frozen pretrained RGB-based tracker, which transfers its feature extraction capability from the RGB domain to the TIR domain and enhances the model's perception of cross-modal data. Second, the output features of the RGB and TIR modalities are fused and fed into the prediction head to obtain the target's position. Finally, the online update module obtains an online template carrying temporal information, which complements the spatial information provided by the initial template. The spatio-temporal information provided by the dual templates improves the RGBT tracker's ability to locate targets in complex environments. Extensive quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art performance on four popular RGBT benchmarks and runs in real time at 32 FPS.
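The dual-template idea in the abstract can be illustrated with a small sketch. The abstract does not state how the online update module decides when to refresh the online template, so the rule below (a confidence threshold checked at a fixed frame interval) is an assumption borrowed from common practice, not the STTrack method. The class name `DualTemplateMemory`, its fields, and the default values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class DualTemplateMemory:
    """Keeps a fixed initial template and a replaceable online template.

    Hypothetical helper. The update rule (confidence threshold checked at a
    fixed frame interval) is an assumption, not the rule used by STTrack.
    """

    initial_template: Any          # spatial reference, never changed
    online_template: Any = None    # temporal reference, refreshed over time
    conf_threshold: float = 0.5    # assumed value
    update_interval: int = 25      # assumed value, in frames

    def __post_init__(self) -> None:
        # Before any update, the online template is the initial one.
        if self.online_template is None:
            self.online_template = self.initial_template

    def maybe_update(self, frame_idx: int, candidate: Any, confidence: float) -> bool:
        """Replace the online template when an update is due and reliable."""
        due = frame_idx > 0 and frame_idx % self.update_interval == 0
        if due and confidence >= self.conf_threshold:
            self.online_template = candidate
            return True
        return False

    def templates(self) -> Tuple[Any, Any]:
        """Return (initial, online), the pair matched against the search image."""
        return self.initial_template, self.online_template


# Toy run: strings stand in for image crops, scores for head confidences.
memory = DualTemplateMemory(initial_template="crop@0")
scores = {25: 0.9, 50: 0.2, 75: 0.8}
updated_at = [
    i for i in range(1, 100)
    if memory.maybe_update(i, f"crop@{i}", scores.get(i, 0.6))
]
print(updated_at, memory.templates())
```

In this toy run the low-confidence frame 50 (for example, during occlusion) is skipped, so the online template is never overwritten with an unreliable crop, while the initial template stays untouched throughout.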
Source journal
IEEE Sensors Journal (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 7.70
Self-citation rate: 14.00%
Articles published: 2058
Review time: 5.2 months
Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspect of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice