End-to-end point cloud registration with transformer

IF 13.9; Zone 2, Computer Science; Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review Pub Date : 2024-11-26 DOI:10.1007/s10462-024-10985-y

Yong Wang, Pengbo Zhou, Guohua Geng, Li An, Qi Zhang

Citations: 0

Abstract

As large-scale 3D point cloud data becomes widespread in real-world applications, efficient and accurate point cloud registration has become a crucial challenge. We propose an end-to-end point cloud registration method based on the Transformer architecture. The method targets two difficult settings, registration under low overlap and registration in large scenes, while remaining general and efficient. Within the Transformer, we combine dynamic position encoding with ternary angular position encoding, which strengthens both the representation of point cloud data and the generality of the algorithm, and so better handles registration in large scenes. To improve the learning capacity of the attention mechanism, we employ an improved cross-attention mechanism that multiplies the softmax output by adaptive weights, enabling the model to capture key information in the point cloud more accurately. In the decoding stage, we introduce a multi-scale feature fusion approach that exploits the multi-layer information in point cloud data, further improving registration accuracy and robustness; fusing features across scales mitigates information loss and handles matching between point clouds of different sizes. Experiments on multiple datasets, including 3DMatch, ModelNet, KITTI, and MVP-RG, show that the method performs well on low-overlap registration and on registration in large scenes.
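
The abstract mentions a cross-attention mechanism that multiplies the softmax by adaptive weights, without giving the formula. The following is only a minimal sketch of one way such a step could look, not the authors' implementation. The function name `weighted_cross_attention`, the `gate` argument (standing in for the adaptive weights, which a trained model would presumably predict), and the final renormalisation are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_attention(queries, keys, values, gate):
    """Cross-attention whose softmax scores are scaled by per-pair weights.

    queries: n x d features of the source point cloud
    keys:    m x d features of the target point cloud
    values:  m x c features of the target point cloud
    gate:    n x m non-negative weights (hypothetical stand-in for the
             paper's adaptive weights; here they are simply passed in)
    """
    d = len(queries[0])
    channels = len(values[0])
    output = []
    for q, gate_row in zip(queries, gate):
        # Scaled dot-product scores between one source point and all targets.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        # Multiply the softmax by the adaptive weights.
        scaled = [a * g for a, g in zip(attn, gate_row)]
        # Assumption: renormalise so the weights sum to 1 again.
        total = sum(scaled) or 1.0
        scaled = [s / total for s in scaled]
        # Weighted sum of target features.
        output.append(
            [sum(w * v[c] for w, v in zip(scaled, values)) for c in range(channels)]
        )
    return output
```

With `gate` set to all ones this reduces to ordinary scaled dot-product cross-attention; a `gate` row that is zero everywhere except one entry forces the output to copy that single target feature.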

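Source journal

Artificial Intelligence Review (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 22.00

Self-citation rate: 3.30%

Articles published: 194

Review time: 5.3 months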

About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.