Dual-Level Modality De-Biasing for RGB-T Tracking
Authors: Yufan Hu, Zekai Shao, Bin Fan, Hongmin Liu
DOI: 10.1109/TIP.2025.3562077
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 2667-2679
Published: 2025-04-23 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10975100/
Citations: 0
Abstract
RGB-T tracking aims to effectively leverage the complementary abilities of the visible (RGB) and thermal infrared (TIR) modalities to achieve robust tracking performance in various scenarios. Existing RGB-T tracking methods typically adopt backbone networks pre-trained on large-scale RGB datasets, which can lead to a predisposition toward RGB image patterns. The RGB and TIR modalities also exhibit inconsistent responses to regions with diverse properties, resulting in imbalances in tracking decisions. We refer to these issues as feature-level and decision-level biases in the TIR modality. In this paper, we propose a novel dual-level modality de-biasing framework for RGB-T tracking to eliminate these inherent feature-level and decision-level biases. Specifically, we propose a joint infrared-fusion adapter, comprising an infrared-aware adapter and a cross-fusion adapter, designed to adaptively mitigate feature-level biases and exploit complementary information between the two modalities. In addition to this implicit feature-level adjustment, we propose a response-decoupled distillation strategy to explicitly alleviate decision-level biases, aiming to achieve consistently accurate decision-making between the RGB and TIR modalities. Extensive experiments on several popular RGB-T tracking benchmarks validate the effectiveness of our proposed method.
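To make the two ideas in the abstract concrete, the following is a minimal, hypothetical sketch: (1) a lightweight residual adapter of the kind commonly inserted into a frozen RGB-pre-trained backbone to absorb TIR-specific feature shifts, and (2) a temperature-softened KL consistency loss between the RGB and TIR response maps, in the spirit of decision-level distillation. All module names, dimensions, and loss choices here are illustrative assumptions; the paper's actual joint infrared-fusion adapter and response-decoupled distillation are not specified in the abstract and may differ substantially.

```python
# Hypothetical sketch (not the paper's implementation):
# (1) a bottleneck adapter added residually to frozen backbone features,
# (2) a KL loss encouraging consistent RGB/TIR response maps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckAdapter(nn.Module):
    """Down-project -> GELU -> up-project, added as a residual branch.

    Zero-initializing the up-projection makes the adapter an identity
    mapping at the start of training, so the frozen RGB-pre-trained
    features are preserved until the adapter learns a TIR adjustment.
    """

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.gelu(self.down(x)))


def response_consistency_loss(rgb_resp: torch.Tensor,
                              tir_resp: torch.Tensor,
                              tau: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened response maps."""
    p = F.log_softmax(rgb_resp.flatten(1) / tau, dim=1)
    q = F.softmax(tir_resp.flatten(1) / tau, dim=1)
    return F.kl_div(p, q, reduction="batchmean") * tau * tau


# Usage: tokens from one frozen ViT block, plus dummy response maps.
feat = torch.randn(2, 196, 768)
adapted = BottleneckAdapter(768)(feat)   # identity at initialization
loss = response_consistency_loss(torch.randn(2, 1, 16, 16),
                                 torch.randn(2, 1, 16, 16))
```

Because only the small adapter parameters are trained while the backbone stays frozen, this style of adaptation keeps most of the RGB pre-training intact while correcting modality-specific bias; the consistency loss then acts on the outputs rather than the features.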