DPTrack: Dual-prompt guided network for visual object tracking

Impact factor 7.5 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Kang Liu, Long Liu, Jiaqi Wang, Pingyan Hu, Yunhe Wang
Expert Systems with Applications, vol. 296, Article 128974. Journal article, published 2025-07-09.
DOI: 10.1016/j.eswa.2025.128974
URL: https://www.sciencedirect.com/science/article/pii/S0957417425025916
Citations: 0

Abstract

Recently, prompt-based spatio-temporal trackers have advanced considerably. However, these methods tend to incorporate only limited spatio-temporal information through prompts, such as recursive prompts or historical prompts, and so fail to fully exploit the spatio-temporal context in video sequences, i.e., the instantaneous state of the nearby frame and the consistent state of past frames. This oversight limits further performance gains in complex scenes. To tackle this issue, we turn to prompt learning and present a dual-prompt guided visual tracking network (DPTrack), consisting of an instantaneous prompt network (IPN) and a spatio-temporal consistency prompt network (ST-CPN). Specifically, the IPN captures the appearance changes of the nearby frame to generate instantaneous prompts, which take part directly in feature extraction and information interaction. The ST-CPN learns a set of prompts that summarize the spatio-temporal consistency of previous frames and then iteratively guide the search features to emphasize the target embedding. In this way, the dual prompts exploit rich spatio-temporal cues, improving the adaptability and robustness of the tracker. Furthermore, we introduce a spatio-temporal pooling collection mechanism (SPC) to maintain consistency and adapt to appearance changes. Extensive experiments on seven benchmarks show that DPTrack achieves strong tracking performance: 77.6% AO on GOT-10k and 52.8% AUC on LaSOT_ext. Notably, it obtains 73.9% AUC, 81.8% P, and 84.5% P_Norm on LaSOT, and 60.8% EAO, 78.2% A, and 89.3% R on VOT2020, outperforming other strong trackers.
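The abstract describes the ST-CPN only at a high level, so the following is a minimal illustrative sketch of the general idea (a small set of prompt vectors summarizes past-frame tokens, then guides search tokens over a few iterations), not the authors' implementation. Everything in it is an assumption made for illustration: the function names `cross_attend` and `guide_search`, the use of plain single-head dot-product attention without learned projections, the residual updates, the token counts, and the random vectors standing in for learned prompts and extracted features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Single-head dot-product cross-attention with no learned projections.

    Each query is replaced by a softmax-weighted average of the
    key/value tokens (keys and values are the same vectors here).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(dim)])
    return out


def add(xs, ys):
    """Element-wise sum of two equally shaped token lists (residual update)."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def guide_search(prompts, past_tokens, search_tokens, steps=2):
    """Hypothetical guidance loop, assumed for illustration only."""
    for _ in range(steps):
        # 1) Prompts gather a summary of the past-frame tokens.
        prompts = add(prompts, cross_attend(prompts, past_tokens))
        # 2) Search tokens are updated from the summarized prompts.
        search_tokens = add(search_tokens, cross_attend(search_tokens, prompts))
    return prompts, search_tokens


def random_tokens(n, dim, rng):
    """Random stand-ins for learned prompts or extracted features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
DIM = 8
prompts = random_tokens(4, DIM, rng)    # 4 prompt vectors (assumed count)
past = random_tokens(32, DIM, rng)      # tokens from previous frames
search = random_tokens(16, DIM, rng)    # tokens from the search region

new_prompts, guided = guide_search(prompts, past, search)
print(len(new_prompts), len(guided), len(guided[0]))  # 4 16 8
```

In this sketch the number of prompts stays fixed no matter how many past-frame tokens are supplied, which is one plausible reason to summarize history through a small prompt set rather than attending to all past frames directly.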
Source journal
Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Publication volume: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans sectors such as finance, engineering, marketing, law, project management, information management, and medicine. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.