DPTrack: Dual-prompt guided network for visual object tracking

Kang Liu, Long Liu, Jiaqi Wang, Pingyan Hu, Yunhe Wang

Expert Systems with Applications, vol. 296, Article 128974. Published 2025-07-09. DOI: 10.1016/j.eswa.2025.128974
Recently, prompt-based spatio-temporal trackers have achieved impressive advances. However, these methods tend to incorporate only limited spatio-temporal information through prompts, such as recursive or historical prompts, and thus fail to fully exploit the spatio-temporal context in video sequences, i.e., the instantaneous state of the nearby frame and the consistent states of past frames. This oversight inhibits further performance improvement in complex scenes. To tackle this issue, we resort to prompt learning and present a dual-prompt guided visual tracking network (DPTrack), consisting of an instantaneous prompt network (IPN) and a spatio-temporal consistency prompt network (ST-CPN). Specifically, the IPN captures the appearance changes of the nearby frame to generate instantaneous prompts, which are directly involved in feature extraction and information interaction. The ST-CPN learns a set of learnable prompts to summarize the spatio-temporal consistency of previous frames, and then iteratively guides the search features to emphasize the target embedding. In this way, the dual prompts exploit rich spatio-temporal cues, enhancing the adaptability and robustness of the tracker. Furthermore, we introduce a spatio-temporal pooling collection mechanism (SPC) to maintain consistency and adapt to appearance changes. Extensive experiments on seven benchmarks show that the proposed DPTrack achieves very promising tracking performance. DPTrack achieves 77.6% AO on GOT-10k and 52.8% AUC on LaSOT_ext. Notably, it obtains 73.9% AUC, 81.8% P, and 84.5% P_Norm on LaSOT, and 60.8% EAO, 78.2% A, and 89.3% R on VOT2020, outperforming notable recent trackers and demonstrating its superiority.
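The abstract does not give implementation details, but the ST-CPN idea (a set of learnable prompt tokens that summarize target consistency and then re-weight the search features) can be illustrated with a minimal cross-attention sketch. The function name, the single-round structure, and all dimensions below are assumptions for illustration only, not the authors' actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prompt_guided_attention(search_feats, prompts):
    """One illustrative round of prompt guidance.

    Prompt tokens first attend to the search features to build a
    target-consistency summary; that summary is then used to re-weight
    the search features so target-consistent embeddings are emphasized.

    search_feats: (N, C) search-region tokens
    prompts:      (P, C) learnable consistency prompts
    """
    scale = np.sqrt(search_feats.shape[1])
    # Prompts attend over search tokens: (P, N)
    attn = softmax(prompts @ search_feats.T / scale)
    # Aggregated target-consistency context: (P, C)
    context = attn @ search_feats
    # Search tokens are gated toward the prompt context: (N, P)
    gate = softmax(search_feats @ context.T / scale)
    # Residual guidance keeps the original features intact: (N, C)
    return search_feats + gate @ context

rng = np.random.default_rng(0)
search = rng.standard_normal((64, 32))   # 64 search tokens, dim 32
prompts = rng.standard_normal((4, 32))   # 4 learnable prompt tokens
guided = prompt_guided_attention(search, prompts)
print(guided.shape)  # (64, 32)
```

In the paper this guidance is applied iteratively over previous frames; the sketch shows only a single round to make the data flow visible.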
Journal overview:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.