ATrans: Improving single object tracking based on dual attention

IF 3.1 | Zone 4 (Computer Science) | Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Journal of Visual Communication and Image Representation Pub Date : 2025-07-31 DOI:10.1016/j.jvcir.2025.104553

Haichao Liu, Jiangwei Qin, Haoyu Liang, Miao Yu, Shijia Lou, Yang Luo

Volume 111, Article 104553 | Journal Article | https://www.sciencedirect.com/science/article/pii/S1047320325001671

Citations: 0

ATrans: Improving single object tracking based on dual attention

Current mainstream Siamese-based object tracking methods usually match local regions of two video frames. This regional association ignores global features when modeling the object. To improve the robustness of long-term object tracking and, to a certain extent, its efficiency, we propose a new tracking framework based on a dual attention mechanism, named ATrans. Our core design builds on the flexibility of the attention mechanism. We propose a dual attention module that obtains more precise features and makes feature extraction more robust by attending to contextual information. We construct the ATrans tracking framework by stacking multiple encoders with dual attention modules and a decoder, and placing a localization head on top. In addition, to address drift during long-term tracking, we add an online update mechanism to the encoder structure that dynamically updates the target template. To further improve the model's efficiency, we propose a background removal module that reduces computation by discarding unnecessary background areas during tracking. Experiments show that our tracker performs well on large datasets such as LaSOT, GOT-10k, and TrackingNet.
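The abstract does not say how the background removal module decides which areas to discard. As a rough illustration only, the sketch below assumes one common approach from attention-based trackers: score each search-region token by the attention it receives from the template tokens, then keep only the top-scoring fraction. The function names, the `keep_ratio` parameter, and the toy 2-D tokens are all hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def template_to_search_attention(template_tokens, search_tokens):
    """Average attention each search token receives from the template tokens.

    Assumption: plain scaled dot-product attention, with template tokens
    as queries and search tokens as keys.
    """
    dim = len(search_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    received = [0.0] * len(search_tokens)
    for q in template_tokens:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in search_tokens]
        for i, w in enumerate(softmax(logits)):
            received[i] += w / len(template_tokens)
    return received


def keep_foreground_tokens(search_tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    keep_ratio is a hypothetical hyperparameter: the fraction of search
    tokens passed on to later layers; the rest are treated as background.
    """
    n_keep = max(1, math.ceil(keep_ratio * len(search_tokens)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [search_tokens[i] for i in kept]


# Toy example: two template tokens and four search tokens in 2-D.
template = [[1.0, 0.0], [0.9, 0.1]]
search = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]

scores = template_to_search_attention(template, search)
kept_idx, kept_tokens = keep_foreground_tokens(search, scores, keep_ratio=0.5)
print(kept_idx)  # indices of the search tokens that survive pruning
```

Because attention cost grows with the number of tokens, dropping half of the search tokens in this way shrinks the work done by every later layer, which is the kind of saving the abstract attributes to discarding background areas.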

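Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months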

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.