Voxel Pillar Multi-frame Cross Attention Network for sparse point cloud robust single object tracking
Luda Zhao, Yihua Hu, Xing Yang, Yicheng Wang, Zhenglei Dou, Yan Zhang
Pattern Recognition, Volume 167, Article 111771. Published 2025-05-12. DOI: 10.1016/j.patcog.2025.111771
https://www.sciencedirect.com/science/article/pii/S0031320325004315
Abstract
Single object tracking (SOT) in dynamic point cloud sequences is critically important in autonomous driving, remote sensing navigation, and smart industrial applications. Point clouds collected by LiDAR sensors become sparse under sensor-related and environmental disturbances, and the limited robustness of existing SOT algorithms then leads to tracking inaccuracies. To mitigate these challenges, we propose a Voxel Pillar Multi-frame Cross Attention Network (VPMCAN) designed for robust tracking in sparse point clouds. VPMCAN employs a voxel-based encoding of pillar information for feature extraction and uses a dense pyramid network to extract multi-scale features from sparse data. Integrating multi-frame and cross-attention mechanisms during feature fusion balances global and local features effectively, significantly enhancing long-term tracking robustness for the target. Additionally, VPMCAN’s design prioritizes a lightweight architecture to ensure hardware-friendly implementation. To demonstrate its efficacy, we constructed a dataset of maritime point cloud video sequences and conducted extensive experiments on the KITTI, nuScenes, and Waymo datasets. Results show that VPMCAN achieves the best performance in non-sparse scenes and a 32.5% improvement over state-of-the-art algorithms in sparse scenes, with an average performance gain of more than 20%. This highlights the efficacy of the lightweight point cloud SOT algorithm in robustly tracking sparse targets, suggesting promising practical applications.
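As a rough illustration of what encoding a point cloud into pillars can look like, the sketch below groups points into vertical columns on an x-y grid and summarizes each non-empty column. It is not the authors' implementation: the function name `pillarize`, the cell size, the search ranges, and the per-pillar features (mean position plus point count) are all assumptions made for this example, and the abstract does not specify any of them.

```python
import numpy as np


def pillarize(points, cell=0.5, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0)):
    """Group points (N, 3) into vertical pillars on an x-y grid.

    Hypothetical helper for illustration only.
    Returns (coords, feats): coords is (P, 2) integer grid indices of the
    non-empty pillars; feats is (P, 4) holding the mean x, y, z and the
    point count of each pillar.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    # Keep only points inside the (assumed) search region.
    mask = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    pts = points[mask]
    if len(pts) == 0:
        return np.zeros((0, 2), dtype=np.int64), np.zeros((0, 4))
    # Integer pillar index of every remaining point.
    ix = np.floor((pts[:, 0] - x_range[0]) / cell).astype(np.int64)
    iy = np.floor((pts[:, 1] - y_range[0]) / cell).astype(np.int64)
    keys = np.stack([ix, iy], axis=1)
    # Unique pillars and, for each point, the pillar it belongs to.
    coords, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse, minlength=len(coords))
    # Sum the coordinates of the points in each pillar, then average.
    sums = np.zeros((len(coords), 3))
    np.add.at(sums, inverse, pts)
    feats = np.concatenate([sums / counts[:, None], counts[:, None]], axis=1)
    return coords, feats


if __name__ == "__main__":
    cloud = [[0.1, 0.1, 0.0], [0.2, 0.3, 1.0], [3.9, -3.9, 0.5], [10.0, 0.0, 0.0]]
    coords, feats = pillarize(cloud)
    print(coords)
    print(feats)
```

Only non-empty pillars are stored, so the cost scales with the number of occupied cells rather than the full grid, which is one reason pillar-style encodings are commonly paired with sparse LiDAR data.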
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.