Hierarchical Attention-Enhanced Correlation Refinement for Robust Visual Tracking

IF 8.4 · Zone 1 (Engineering and Technology) · JCR Q1 (ENGINEERING, CIVIL)

IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 7, pp. 9370-9386. Pub Date: 2025-06-03. DOI: 10.1109/TITS.2025.3570076. URL: https://ieeexplore.ieee.org/document/11023144/

Si Chen; Rui Xu; Yan Yan; Yang Hua; Da-Han Wang; Shunzhi Zhu

Citations: 0

Abstract

Hierarchical Attention-Enhanced Correlation Refinement for Robust Visual Tracking

In recent years, visual tracking has witnessed remarkable advancements with the exploration of feature extraction and correlation modeling techniques. However, inadequate robustness of either the backbone network or the correlation operation continues to plague existing trackers, leading to frustrating drift when confronted with similar distractors or cluttered backgrounds. To address this problem, we propose a hierarchical attention-enhanced correlation refinement network (HarNet) for achieving robust visual tracking. Specifically, a gated dual-view attention (GDA) module is first designed to aggregate the intra-layer attention and the inter-layer self-attention based on a fusion gate, so as to enhance hierarchical feature representations of the template. Meanwhile, a target-aware attention (TA) module introduces the template information to the inter-layer self-attention, which can highlight the target information in the search region. Moreover, a graph guided correlation (GGC) module leverages the pixel-to-local and pixel-to-global correlations to fully exploit both local- and global-spatial information between the template and the search region, and then uses the graph convolutional network (GCN) to further learn the node relationships of the correlation map for more fine-grained correlations. Thus, with the above three elaborately designed modules, the HarNet is beneficial for the enhancement of feature representation and the precise localization of the target. Extensive experiments on popular visual tracking datasets (including OTB100, VOT2016, VOT2018, VOT2019, UAV123, UAV20L, GOT-10k, and LaSOT) demonstrate the superiority of our proposed method against several state-of-the-art tracking methods.
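The abstract names three generic building blocks: a gate that fuses two attention outputs, a pixel-wise correlation between template and search features, and a graph convolution over the correlation map. The NumPy sketch below illustrates those ideas only; it is not the authors' HarNet implementation. All function names, shapes, weights, and the way the graph adjacency is built are assumptions made for illustration, the attention outputs are random stand-ins, and the pixel-to-local branch is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(intra, inter, w_gate):
    """Blend two attention outputs with a per-element gate in (0, 1).

    intra, inter: (N, C) feature maps flattened over space.
    w_gate: (2C, C) hypothetical gate weights (learned in a real model).
    """
    gate = sigmoid(np.concatenate([intra, inter], axis=1) @ w_gate)
    return gate * intra + (1.0 - gate) * inter

def cosine_similarity(a, b, eps=1e-8):
    """Pairwise cosine similarity: a is (Na, C), b is (Nb, C) -> (Na, Nb)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def gcn_layer(nodes, adjacency, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ nodes @ weight, 0.0)

# Toy sizes (assumed): C channels, Nt template pixels, Ns search pixels.
rng = np.random.default_rng(0)
C, Nt, Ns = 8, 16, 64

# Random stand-ins for the intra-layer and inter-layer attention outputs.
intra = rng.standard_normal((Nt, C))
inter = rng.standard_normal((Nt, C))
template = gated_fusion(intra, inter, 0.1 * rng.standard_normal((2 * C, C)))
search = rng.standard_normal((Ns, C))

# Pixel-to-global correlation: every search pixel against every template pixel.
corr = cosine_similarity(search, template)           # shape (Ns, Nt)

# Treat each search pixel as a graph node whose feature is its correlation row;
# connect nodes by the non-negative similarity of those rows (an assumption).
adjacency = np.maximum(cosine_similarity(corr, corr), 0.0)  # shape (Ns, Ns)
refined = gcn_layer(corr, adjacency, 0.1 * rng.standard_normal((Nt, Nt)))

print(corr.shape, refined.shape)
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the two inputs, so neither attention view can be discarded entirely. The symmetric normalization in `gcn_layer` keeps the aggregated node features on a comparable scale regardless of how many neighbours a node has.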


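Source journal: IEEE Transactions on Intelligent Transportation Systems (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 14.80

Self-citation rate: 12.90%

Publication count: 1872

Review time: 7.5 months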

Journal description: The journal covers the theoretical, experimental and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.