SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning

Lizhi Geng, Dongming Zhou, Kerui Wang, Yisong Liu, Kaixiang Yan

The Journal of Supercomputing, published 2024-08-16. DOI: https://doi.org/10.1007/s11227-024-06443-9
Abstract
In recent years, RGBT trackers based on the Siamese network have gained significant attention due to their balance of accuracy and efficiency. However, these trackers often rely on similarity matching of features between a fixed-size target template and the search region, which can result in unsatisfactory tracking performance when the target undergoes dramatic changes in scale or shape, or when occlusion occurs. Additionally, while these trackers typically employ feature-level fusion of the different modalities, they frequently overlook the benefits of decision-level fusion, which diminishes their flexibility and independence. In this paper, a novel Siamese tracker based on graph attention and reliable modality weighting is proposed for robust RGBT tracking. Specifically, a modality feature interaction learning network is constructed to perform bidirectional learning of the local features of each modality while extracting their respective characteristics. Subsequently, a multimodality graph attention network matches the local features of the template and search region, generating more accurate and robust similarity responses. Finally, a modality fusion prediction network performs decision-level adaptive fusion of the two modality responses, leveraging their complementary nature for prediction. Extensive experiments on three large-scale RGBT benchmarks demonstrate that the proposed tracker outperforms other state-of-the-art trackers. Code will be shared at https://github.com/genglizhi/SiamMGT.
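The abstract names two mechanisms: attention-based matching of template and search-region features treated as graph nodes, and decision-level fusion that weights each modality's response map by its estimated reliability. The following is a minimal PyTorch sketch of those two ideas only; the class names (`GraphAttentionMatch`, `DecisionLevelFusion`), shapes, and design details are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch, NOT the SiamMGT code: graph-attention matching of
# template/search features plus decision-level fusion of two modality
# response maps. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionMatch(nn.Module):
    """Treat each spatial location as a graph node; search nodes aggregate
    template nodes weighted by learned affinity (scaled dot-product)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
        b, c, hs, ws = search.shape
        q = self.query(search).flatten(2).transpose(1, 2)    # (B, Ns, C)
        k = self.key(template).flatten(2)                    # (B, C, Nt)
        v = self.value(template).flatten(2).transpose(1, 2)  # (B, Nt, C)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)           # (B, Ns, Nt)
        fused = (attn @ v).transpose(1, 2).reshape(b, c, hs, ws)
        return fused + search                                # residual connection


class DecisionLevelFusion(nn.Module):
    """Adaptively weight per-modality response maps before prediction."""

    def __init__(self):
        super().__init__()
        # A small scoring head maps global response statistics
        # (peak and mean) to a scalar reliability weight per modality.
        self.score = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, resp_rgb: torch.Tensor, resp_t: torch.Tensor) -> torch.Tensor:
        weights = []
        for r in (resp_rgb, resp_t):
            stats = torch.stack(
                [r.amax(dim=(1, 2, 3)), r.mean(dim=(1, 2, 3))], dim=-1
            )                                               # (B, 2)
            weights.append(self.score(stats))               # (B, 1)
        w = F.softmax(torch.cat(weights, dim=-1), dim=-1)   # (B, 2)
        return (w[:, 0, None, None, None] * resp_rgb
                + w[:, 1, None, None, None] * resp_t)


if __name__ == "__main__":
    match = GraphAttentionMatch(channels=256)
    fuse = DecisionLevelFusion()
    z = torch.randn(1, 256, 8, 8)                 # template features (one modality)
    x = torch.randn(1, 256, 16, 16)               # search-region features
    resp_rgb = match(z, x).mean(1, keepdim=True)  # stand-in RGB response map
    resp_t = torch.randn(1, 1, 16, 16)            # stand-in thermal response map
    print(fuse(resp_rgb, resp_t).shape)           # torch.Size([1, 1, 16, 16])
```

Weighting the two response maps at the decision level, rather than fusing feature maps early, keeps each modality's matching branch independent, which is the flexibility the abstract argues feature-level fusion gives up.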