Scene flow estimation from point cloud based on grouped relative self-attention

IF 4.2 | Region 3, Computer Science | JCR Q2: Computer Science, Artificial Intelligence

Image and Vision Computing | Pub Date: 2025-02-01 | DOI: 10.1016/j.imavis.2024.105368

Xuezhi Xiang, Xiankun Zhou, Yingxin Wei, Xi Wang, Yulong Qiao

Volume 154, Article 105368. Full text: https://www.sciencedirect.com/science/article/pii/S0262885624004736

Citations: 0

Abstract


Scene flow estimation from point cloud based on grouped relative self-attention

3D scene flow estimation is a fundamental task in computer vision that aims to estimate the 3D motion of point clouds. Point clouds are unordered, and the point density within a local region of the same object is non-uniform. The features extracted by previous methods are not discriminative enough to yield accurate scene flow. In addition, scene flow may be misestimated when there is large motion between two adjacent point cloud frames. We observe that the quality of point cloud feature extraction and the correlations between the two frames directly affect the accuracy of scene flow estimation. We therefore propose an improved self-attention structure, Grouped Relative Self-Attention (GRSA), which combines a grouping operation and an offset subtraction operation with normalization refinement to process point clouds. Specifically, we embed GRSA into feature extraction and into each stage of flow refinement to obtain lightweight but efficient self-attention, which extracts discriminative point features and strengthens point correlations so that they adapt better to long-distance motion. Furthermore, we use a comprehensive loss function to avoid outliers and obtain robust results. We evaluate our method on the FlyingThings3D and KITTI datasets and achieve superior performance. In particular, our method outperforms all other methods on the FlyingThings3D dataset, where the Outliers metric improves by 16.9%. On the KITTI dataset, the Outliers metric also improves by 6.7%.
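
The abstract reports its gains on the Outliers metric without defining it. As a rough illustration only, the sketch below computes end-point error and an outlier ratio using thresholds that are common in point cloud scene flow benchmarks (a point counts as an outlier when its error exceeds 0.3 m or 10% of the ground-truth flow magnitude). These thresholds, the function name `scene_flow_metrics`, and the toy data are assumptions and are not taken from the paper.

```python
import math

def scene_flow_metrics(pred, gt, eps=1e-8):
    """Hypothetical helper: pred and gt are sequences of (x, y, z) flow vectors in metres."""
    errors, outliers = [], 0
    for p, g in zip(pred, gt):
        err = math.dist(p, g)               # per-point end-point error
        rel = err / (math.hypot(*g) + eps)  # error relative to ground-truth magnitude
        errors.append(err)
        # Assumed thresholds: absolute error > 0.3 m or relative error > 10%.
        outliers += (err > 0.3) or (rel > 0.1)
    n = len(errors)
    return {"EPE3D": sum(errors) / n, "Outliers": outliers / n}

# Toy data: four points, one of which has a large flow error.
gt = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0), (1.0, 1.0, 1.0)]
pred = [(1.05, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 4.2), (1.0, 1.0, 1.0)]
metrics = scene_flow_metrics(pred, gt)
```

Under this definition a lower Outliers value is better, so an "improvement" on the metric corresponds to a reduction in the outlier ratio.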


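Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Publication volume: 143

Review time: 7.8 months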

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.