Transformer-Based RGBT Tracking With Spatio-Temporal Information Fusion
Di Yuan; Haiping Zhang; Qiao Liu; Xiaojun Chang; Zhenyu He
IEEE Sensors Journal, vol. 25, no. 13, pp. 25386-25396. Published 2025-06-05. DOI: 10.1109/JSEN.2025.3575188
https://ieeexplore.ieee.org/document/11026807/
Citations: 0
Abstract
RGB-thermal (RGBT) tracking usually takes an RGB tracker as the base model and then fully fine-tunes it on an RGBT dataset. Such methods ignore the differences between target features in the RGB domain and the thermal infrared (TIR) domain. In addition, existing RGBT trackers match the spatial features of the initial template and the search image but ignore the role of temporal information, so they fail in complex scenarios such as target appearance change and occlusion. To address these problems, we propose a simple and efficient tracker called STTrack. The tracker adopts a symmetric dual-stream structure consisting of several fine-tuning transformer (FT Transformer) encoders, a prediction head, and an online update module. Specifically, the FT Transformer encoder first adds some trainable parameters to the frozen pretrained RGB-based tracker, which transfers the feature extraction capability from the RGB domain to the TIR domain and enhances the model's perception of cross-modal data. Second, the output features of the RGB and TIR modalities are fused and fed into the prediction head to obtain the target's position. Finally, the online update module obtains an online template carrying temporal information, which complements the spatial information provided by the initial template. The spatio-temporal information provided by the dual templates improves the RGBT tracker's ability to locate targets in complex environments. Extensive quantitative and qualitative experiments demonstrate that our approach achieves state-of-the-art performance on four of the most popular RGBT benchmarks and runs in real time at 32 FPS.
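The abstract does not say how the online update module decides when to refresh its template. One pattern used by template-based trackers is to replace the online template at a fixed frame interval, and only when the prediction confidence is high enough. The sketch below illustrates that pattern in plain Python. The class name `DualTemplateMemory`, the interval, the confidence threshold, and the use of float lists in place of real feature maps are all assumptions made for illustration; this is not STTrack's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Feature = List[float]  # stand-in for a template crop or feature map


@dataclass
class DualTemplateMemory:
    """Hypothetical holder for a fixed initial template and a replaceable online one."""

    initial: Feature                  # first-frame template, never modified
    update_interval: int = 10         # assumed: consider an update every N frames
    min_confidence: float = 0.7       # assumed: skip updates on unreliable predictions
    online: Optional[Feature] = None  # most recent reliable appearance
    frame_index: int = 0

    def __post_init__(self) -> None:
        # Until the first update, the online template mirrors the initial one.
        if self.online is None:
            self.online = list(self.initial)

    def step(self, candidate: Feature, confidence: float) -> bool:
        """Advance one frame; return True if the online template was replaced."""
        self.frame_index += 1
        due = self.frame_index % self.update_interval == 0
        if due and confidence >= self.min_confidence:
            self.online = list(candidate)
            return True
        return False

    def templates(self) -> Tuple[Feature, Feature]:
        """Both templates, as they would be handed to the matching stage."""
        return self.initial, self.online


memory = DualTemplateMemory(initial=[1.0, 0.0], update_interval=2, min_confidence=0.5)
updates = [
    memory.step([0.9, 0.1], confidence=0.9),  # frame 1: not due
    memory.step([0.2, 0.8], confidence=0.3),  # frame 2: due, but low confidence
    memory.step([0.8, 0.2], confidence=0.9),  # frame 3: not due
    memory.step([0.7, 0.3], confidence=0.9),  # frame 4: due and confident
]
print(updates, memory.templates())
```

Gating on confidence keeps a low-confidence prediction, for example during occlusion, from overwriting the online template, while the untouched initial template remains available as a fixed reference.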
About the journal:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic and acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice