Ye Wang;Shaohui Mei;Mingyang Ma;Yuheng Liu;Yuru Su
{"title":"HTACPE: A Hybrid Transformer With Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker","authors":"Ye Wang;Shaohui Mei;Mingyang Ma;Yuheng Liu;Yuru Su","doi":"10.1109/TMM.2024.3521819","DOIUrl":null,"url":null,"abstract":"Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data distribution. However, existing hyperspectral object trackers based on transformer models typically rely on costly pre-trained models, making them prone to crashing due to overfitting when tuned on small-scale hyperspectral videos, greatly limiting their performance. To address this challenge, in this paper, a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker is proposed to improve the learning efficiency of the tracking model, and fully explore the spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between focusing on positional and content-based information, which allows the model to effectively handle datasets of various sizes. To enhance the spectral-spatial information, a Spectral Grouping Module (SGM) is designed to learn the high-frequency information in complex scenarios, thereby enhancing diversified features. It operates in parallel with the ACPEM feature learning module. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to address challenges related to accurate object position perception, iteratively refining prediction parameters to enhance the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially with insufficient training data.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2384-2398"},"PeriodicalIF":9.7000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10820018/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data distribution. However, existing hyperspectral object trackers based on transformer models typically rely on costly pre-trained models, making them prone to crashing due to overfitting when tuned on small-scale hyperspectral videos, greatly limiting their performance. To address this challenge, in this paper, a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker is proposed to improve the learning efficiency of the tracking model, and fully explore the spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between focusing on positional and content-based information, which allows the model to effectively handle datasets of various sizes. To enhance the spectral-spatial information, a Spectral Grouping Module (SGM) is designed to learn the high-frequency information in complex scenarios, thereby enhancing diversified features. It operates in parallel with the ACPEM feature learning module. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to address challenges related to accurate object position perception, iteratively refining prediction parameters to enhance the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially with insufficient training data.
期刊介绍:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.