Voxel Pillar Multi-frame Cross Attention Network for sparse point cloud robust single object tracking

IF 7.5, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition, Pub Date: 2025-05-12, DOI: 10.1016/j.patcog.2025.111771

Luda Zhao, Yihua Hu, Xing Yang, Yicheng Wang, Zhenglei Dou, Yan Zhang

Pattern Recognition, Volume 167, Article 111771. https://www.sciencedirect.com/science/article/pii/S0031320325004315

Citations: 0

Abstract

Voxel Pillar Multi-frame Cross Attention Network for sparse point cloud robust single object tracking

Single object tracking (SOT) in dynamic point cloud sequences is critically important in autonomous driving, remote sensing navigation, and smart industrial applications. Point clouds collected by LiDAR sensors often become sparse because of sensor-related and environmental disturbances, and the limited robustness of existing SOT algorithms then leads to tracking inaccuracies. To mitigate these challenges, we propose a Voxel Pillar Multi-frame Cross Attention Network (VPMCAN) designed for robust tracking in sparse point clouds. VPMCAN uses a voxel-based encoding of pillar information for feature extraction and a dense pyramid network to extract multi-scale features from sparse data. Integrating multi-frame and cross-attention mechanisms during feature fusion balances global and local features, significantly enhancing long-term tracking robustness for the target. VPMCAN's design also prioritizes a lightweight architecture to ensure hardware-friendly implementation. To demonstrate its efficacy, we constructed a dataset of maritime point cloud video sequences and conducted extensive experiments on the KITTI, nuScenes, and Waymo datasets. Results show that VPMCAN achieves the best performance in non-sparse scenes and a 32.5% improvement over state-of-the-art algorithms in sparse scenes, with an average performance increase of more than 20%. This highlights the efficacy of the lightweight point cloud SOT algorithm in robustly tracking sparse targets and suggests promising practical applications.
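
The abstract names two building blocks, pillar encoding and cross-attention across frames, without giving details. The following is only a minimal sketch of those two generic ideas in plain Python, not the VPMCAN implementation: the function names `pillarize` and `cross_attention`, the 1.0 m cell size, the per-pillar feature (mean position plus point count), and the toy frames are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def pillarize(points, cell=1.0):
    """Group (x, y, z) points into vertical pillars on an x-y grid.

    Returns {(ix, iy): [mean_x, mean_y, mean_z, point_count]}.
    The feature layout is an illustrative assumption.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        # A pillar ignores height: only x and y pick the grid cell.
        key = (math.floor(x / cell), math.floor(y / cell))
        buckets[key].append((x, y, z))
    pillars = {}
    for key, pts in buckets.items():
        n = len(pts)
        mean = [sum(p[i] for p in pts) / n for i in range(3)]
        pillars[key] = mean + [float(n)]
    return pillars


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Two toy frames (hypothetical data): current-frame pillars query the
# pillars of the previous frame.
prev_frame = [(0.2, 0.3, 1.0), (0.8, 0.1, 3.0), (2.5, 0.5, 0.5)]
curr_frame = [(0.4, 0.6, 2.0), (2.1, 0.9, 0.7)]

prev_pillars = pillarize(prev_frame)
curr_pillars = pillarize(curr_frame)
prev_feats = list(prev_pillars.values())
curr_feats = list(curr_pillars.values())
fused = cross_attention(curr_feats, prev_feats, prev_feats)
```

Here `fused` holds one feature vector per current-frame pillar, each a weighted average of previous-frame pillar features. A real tracker would use learned projections for queries, keys, and values and would typically attend over several past frames; this sketch omits both.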

Source journal

Pattern Recognition (Engineering & Technology - Engineering: Electrical and Electronic)

CiteScore

14.40

Self-citation rate

16.20%

Articles published

683

Review time

5.6 months

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.