Tracking Any Point with Frame-Event Fusion Network at High Frame Rate

Jiaxiong Liu, Bo Wang, Zhen Tan, Jinpu Zhang, Hui Shen, Dewen Hu

arXiv:2409.11953 · cs.CV (Computer Vision and Pattern Recognition) · Published 2024-09-18
Abstract
Tracking any point based on image frames is constrained by the frame rate, leading to instability in high-speed scenarios and limited generalization in real-world applications. To overcome these limitations, we propose an image-event fusion point tracker, FE-TAP, which combines the contextual information of image frames with the high temporal resolution of events, achieving high-frame-rate and robust point tracking under various challenging conditions. Specifically, we design an Evolution Fusion module (EvoFusion) to model the image generation process guided by events. This module effectively integrates valuable information from both modalities, which operate at different frequencies. To produce smoother point trajectories, we employ a transformer-based refinement strategy that iteratively updates the points' trajectories and features. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches, improving the expected feature age by 24% on the EDS dataset. Finally, we qualitatively validate the robustness of our algorithm in real driving scenarios using our custom-designed high-resolution image-event synchronization device. Our source code will be released at https://github.com/ljx1002/FE-TAP.
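To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) fusing lower-rate frame features with higher-rate event features, here approximated by a simple cross-attention block standing in for EvoFusion, and (b) a small transformer that iteratively refines point trajectories and their features. All module names, tensor shapes, and the residual update rule are illustrative assumptions on our part, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class EvoFusionSketch(nn.Module):
    """Hypothetical fusion block: frame tokens attend to event tokens.

    A stand-in for the paper's EvoFusion module; the real module models
    the image generation process guided by events, which this does not.
    """
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feat: torch.Tensor, event_feat: torch.Tensor) -> torch.Tensor:
        # frame_feat: (B, N, D) tokens from the most recent image frame
        # event_feat: (B, M, D) tokens aggregated from events since that frame
        fused, _ = self.attn(frame_feat, event_feat, event_feat)
        return self.norm(frame_feat + fused)  # residual fusion of both modalities

class TrajectoryRefinerSketch(nn.Module):
    """Hypothetical iterative refiner: a transformer updates point
    features and positions over a fixed number of steps."""
    def __init__(self, dim: int = 128, iters: int = 4):
        super().__init__()
        self.iters = iters
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_delta = nn.Linear(dim, 2)  # predict a (dx, dy) offset per point

    def forward(self, point_feat: torch.Tensor, coords: torch.Tensor):
        # point_feat: (B, P, D) per-tracked-point features
        # coords:     (B, P, 2) current (x, y) estimates for each point
        for _ in range(self.iters):
            point_feat = self.encoder(point_feat)        # update features
            coords = coords + self.to_delta(point_feat)  # nudge trajectories
        return coords, point_feat

if __name__ == "__main__":
    B, N, M, P, D = 2, 256, 512, 16, 128
    fuse = EvoFusionSketch(D)
    refine = TrajectoryRefinerSketch(D)
    frame_tokens = torch.randn(B, N, D)
    event_tokens = torch.randn(B, M, D)
    fused = fuse(frame_tokens, event_tokens)             # (B, N, D)
    coords, feats = refine(torch.randn(B, P, D), torch.randn(B, P, 2))
    print(fused.shape, coords.shape)                      # sanity check
```

The cross-attention query/key split mirrors the asymmetry the abstract describes: the frame provides contextual anchors while events supply fine-grained temporal updates between frames. For the method as actually implemented, see the repository linked above.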