SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning

Lizhi Geng, Dongming Zhou, Kerui Wang, Yisong Liu, Kaixiang Yan

The Journal of Supercomputing, published 2024-08-16. DOI: https://doi.org/10.1007/s11227-024-06443-9
Abstract
In recent years, RGBT trackers based on the Siamese network have gained significant attention due to their balance of accuracy and efficiency. However, these trackers often rely on similarity matching of features between a fixed-size target template and the search region, which can result in unsatisfactory tracking performance when the target undergoes dramatic changes in scale or shape, or when occlusion occurs. Additionally, while these trackers typically employ feature-level fusion of the different modalities, they frequently overlook the benefits of decision-level fusion, which diminishes their flexibility and independence. In this paper, a novel Siamese tracker based on graph attention and reliable modality weighting is proposed for robust RGBT tracking. Specifically, a modality feature interaction learning network is constructed to perform bidirectional learning of the local features of each modality while extracting their respective characteristics. Subsequently, a multimodality graph attention network matches the local features of the template and search region, generating more accurate and robust similarity responses. Finally, a modality fusion prediction network performs decision-level adaptive fusion of the two modality responses, leveraging their complementary nature for prediction. Extensive experiments on three large-scale RGBT benchmarks demonstrate that the proposed tracker outperforms other state-of-the-art trackers. Code will be shared at https://github.com/genglizhi/SiamMGT.
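The abstract names two mechanisms: attention-based matching of template and search-region features treated as graph nodes, and decision-level fusion that weights each modality's response map by its estimated reliability. The following is a minimal PyTorch sketch of those two ideas only; the class names (`GraphAttentionMatch`, `DecisionLevelFusion`), shapes, and design details are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch, NOT the SiamMGT code: graph-attention matching of
# template/search features plus decision-level fusion of two modality
# response maps. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionMatch(nn.Module):
    """Treat each spatial location as a graph node; search nodes aggregate
    template nodes weighted by learned affinity (scaled dot-product)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
        b, c, hs, ws = search.shape
        q = self.query(search).flatten(2).transpose(1, 2)    # (B, Ns, C)
        k = self.key(template).flatten(2)                    # (B, C, Nt)
        v = self.value(template).flatten(2).transpose(1, 2)  # (B, Nt, C)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)           # (B, Ns, Nt)
        fused = (attn @ v).transpose(1, 2).reshape(b, c, hs, ws)
        return fused + search                                # residual connection


class DecisionLevelFusion(nn.Module):
    """Adaptively weight per-modality response maps before prediction."""

    def __init__(self):
        super().__init__()
        # A small scoring head maps global response statistics
        # (peak and mean) to a scalar reliability weight per modality.
        self.score = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, resp_rgb: torch.Tensor, resp_t: torch.Tensor) -> torch.Tensor:
        weights = []
        for r in (resp_rgb, resp_t):
            stats = torch.stack(
                [r.amax(dim=(1, 2, 3)), r.mean(dim=(1, 2, 3))], dim=-1
            )                                               # (B, 2)
            weights.append(self.score(stats))               # (B, 1)
        w = F.softmax(torch.cat(weights, dim=-1), dim=-1)   # (B, 2)
        return (w[:, 0, None, None, None] * resp_rgb
                + w[:, 1, None, None, None] * resp_t)


if __name__ == "__main__":
    match = GraphAttentionMatch(channels=256)
    fuse = DecisionLevelFusion()
    z = torch.randn(1, 256, 8, 8)                 # template features (one modality)
    x = torch.randn(1, 256, 16, 16)               # search-region features
    resp_rgb = match(z, x).mean(1, keepdim=True)  # stand-in RGB response map
    resp_t = torch.randn(1, 1, 16, 16)            # stand-in thermal response map
    print(fuse(resp_rgb, resp_t).shape)           # torch.Size([1, 1, 16, 16])
```

Weighting the two response maps at the decision level, rather than fusing feature maps early, keeps each modality's matching branch independent, which is the flexibility the abstract argues feature-level fusion gives up.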