Prior Knowledge-Driven Hybrid Prompter Learning for RGB-Event Tracking

IF 11.1; CAS Region 1 (Engineering & Technology); JCR Q1 (Engineering, Electrical & Electronic)

IEEE Transactions on Circuits and Systems for Video Technology, Vol. 35, No. 9, pp. 8679-8691; Pub Date: 2025-04-10; DOI: 10.1109/TCSVT.2025.3559614

Mianzhao Wang; Fan Shi; Xu Cheng; Shengyong Chen


Citations: 0

Abstract


Prior Knowledge-Driven Hybrid Prompter Learning for RGB-Event Tracking

Event data can asynchronously capture variations in light intensity, thereby implicitly providing valuable complementary cues for RGB-Event tracking. Existing methods typically employ a direct interaction mechanism to fuse RGB and event data. However, due to differences in imaging mechanisms, the representational disparity between these two data types is not fixed, which can lead to tracking failures in certain challenging scenarios. To address this issue, we propose a novel prior knowledge-driven hybrid prompter learning framework for RGB-Event tracking. Specifically, we develop a frame-event hybrid prompter that leverages prior tracking knowledge from the foundation model as intermediate modal support to mitigate the heterogeneity between RGB and event data. By leveraging its rich prior tracking knowledge, the intermediate modal reduces the gap between the dense RGB and sparse event data interactions, effectively guiding complementary learning between modalities. Meanwhile, to mitigate the internal learning disparities between the lightweight hybrid prompter and the deep transformer model, we introduce a pseudo-prompt learning strategy that lies between full fine-tuning and partial fine-tuning. This strategy adopts a divide-and-conquer approach to assign different learning rates to modules with distinct functions, effectively reducing the dominant influence of RGB information in complex scenarios. Extensive experiments conducted on two public RGB-Event tracking datasets show that the proposed HPL outperforms state-of-the-art tracking methods, achieving exceptional performance.
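The abstract describes assigning different learning rates to modules with distinct functions but gives no implementation details. The following is a minimal sketch of that general idea only, not the authors' method: the module prefixes (`prompter`, `backbone`, `head`), the learning-rate values, and the helper `build_param_groups` are all hypothetical. It buckets parameter names by top-level module and attaches a rate to each bucket, in the same shape as the per-parameter option dicts that optimizers such as `torch.optim.AdamW` accept.

```python
from typing import Dict, List

# Hypothetical module prefixes and learning rates; the abstract specifies neither.
GROUP_LRS: Dict[str, float] = {
    "prompter": 1e-4,  # lightweight hybrid prompter (assumed larger step)
    "backbone": 1e-5,  # deep transformer backbone (assumed smaller step)
    "head": 1e-4,      # tracking head (assumed)
}


def build_param_groups(param_names: List[str],
                       group_lrs: Dict[str, float],
                       default_lr: float = 0.0) -> List[dict]:
    """Bucket parameter names by their top-level module prefix.

    Returns one dict per non-empty bucket. With a real optimizer,
    "params" would hold tensors rather than names. Prefixes missing
    from group_lrs fall back to default_lr.
    """
    buckets: Dict[str, List[str]] = {}
    for name in param_names:
        prefix = name.split(".", 1)[0]
        buckets.setdefault(prefix, []).append(name)
    return [
        {"name": prefix, "params": names, "lr": group_lrs.get(prefix, default_lr)}
        for prefix, names in buckets.items()
    ]


if __name__ == "__main__":
    names = [
        "backbone.blocks.0.attn.weight",
        "backbone.blocks.0.mlp.weight",
        "prompter.fuse.weight",
        "head.box.bias",
    ]
    for group in build_param_groups(names, GROUP_LRS):
        print(group["name"], group["lr"], len(group["params"]))
```

Keeping the rates in one table makes the per-module split explicit: every module is still updated, as in full fine-tuning, while the step size differs by function, which is one plausible reading of a strategy that sits between full and partial fine-tuning.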


Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

13.80

Self-citation rate

27.40%

Articles published

660

Review time

5 months

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.