Yufei Zha, Fan Li, Huanyu Li, Peng Zhang, Wei Huang
Title: Reversible Modal Conversion Model for Thermal Infrared Tracking
DOI: 10.1109/MMUL.2023.3239136 (https://doi.org/10.1109/MMUL.2023.3239136)
Journal: IEEE MultiMedia, vol. 30, no. 1, pp. 8-24
Publication date: 2023-07-01
Publication type: Journal Article
JCR: Q2 (Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Learning a powerful CNN representation of the target is a key issue in thermal infrared (TIR) tracking. The lack of massive TIR training data is one obstacle to training the network end to end from scratch. Instead of the time-consuming and labor-intensive approach of heavily relabeling data, in this article we obtain trainable TIR images by leveraging massive annotated RGB images. Unlike traditional image generation models, a modal reversible module is designed to maximize information propagation between the RGB and TIR modalities. The advantage is that this module preserves as much modal information as possible when the network is trained on a large number of aligned RGBT image pairs. Additionally, the fake-TIR features generated by the proposed module are integrated to enhance the target representation during online TIR tracking. To verify the proposed method, we conduct extensive experiments on both single-modal TIR and multimodal RGBT tracking datasets. In single-modal TIR tracking, our method improves the success rate over the state of the art (SOTA) by 2.8% on LSOTB-TIR and 0.94% on PTB-TIR. In multimodal RGBT fusion tracking, the proposed method is tested on the RGBT234 and VOT-RGBT2020 datasets, where the results also reach SOTA performance.
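The reversibility property the abstract describes can be illustrated with an additive coupling layer, the standard building block of invertible networks (NICE/RealNVP style): the forward pass maps one modality's features to another, and the inverse recovers the input exactly, so no modal information is lost. This is a hypothetical numpy sketch under that assumption, not the paper's actual module; `shift` here stands in for a learned subnetwork.

```python
import numpy as np

def coupling_forward(x, shift_fn):
    # Additive coupling: split channels, transform one half
    # conditioned on the other: y2 = x2 + f(x1), y1 = x1.
    x1, x2 = np.split(x, 2, axis=-1)
    y2 = x2 + shift_fn(x1)
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y, shift_fn):
    # Exact inverse: x2 = y2 - f(y1) recovers the input,
    # regardless of how complex shift_fn is.
    y1, y2 = np.split(y, 2, axis=-1)
    x2 = y2 - shift_fn(y1)
    return np.concatenate([y1, x2], axis=-1)

rng = np.random.default_rng(0)
shift = lambda h: np.tanh(h)       # stand-in for a learned subnetwork
x = rng.standard_normal((4, 8))    # e.g. a batch of RGB feature vectors
y = coupling_forward(x, shift)     # "converted" (fake-TIR-like) features
x_rec = coupling_inverse(y, shift) # inversion recovers x to float precision
```

Because the inverse is exact by construction, a module built from such layers can, in principle, propagate information between modalities without loss, which is the motivation the abstract gives for preferring a reversible design over an ordinary generator.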
Journal description:
The magazine contains technical information covering a broad range of issues in multimedia systems and applications. Articles discuss research as well as advanced practice in hardware/software and are expected to span the range from theory to working systems. Especially encouraged are papers discussing experiences with new or advanced systems and subsystems. To avoid unnecessary overlap with existing publications, acceptable papers must have a significant focus on aspects unique to multimedia systems and applications. These aspects are likely to be related to the special needs of multimedia information compared to other electronic data, for example, the size requirements of digital media and the importance of time in the representation of such media. The following list is not exhaustive, but is representative of the topics that are covered: Hardware and software for media compression, coding & processing; Media representations & standards for storage, editing, interchange, transmission & presentation; Hardware platforms supporting multimedia applications; Operating systems suitable for multimedia applications; Storage devices & technologies for multimedia information; Network technologies, protocols, architectures & delivery techniques intended for multimedia; Synchronization issues; Multimedia databases; Formalisms for multimedia information systems & applications; Programming paradigms & languages for multimedia; Multimedia user interfaces; Media creation, integration, editing & management; Creation & modification of multimedia applications.