DPCN++: Differentiable Phase Correlation Network for Versatile Pose Registration

IF 20.8 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence · Publication date: 2022-06-12 · DOI: 10.48550/arXiv.2206.05707

Zexi Chen, Yiyi Liao, Haozhe Du, Haodong Zhang, Xuecheng Xu, Haojian Lu, R. Xiong, Yue Wang


Citations: 0

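Abstract

Pose registration is critical in vision and robotics. This paper focuses on the challenging task of initialization-free pose registration of up to 7 DoF for homogeneous and heterogeneous measurements. While recent learning-based methods show promise using differentiable solvers, they either rely on heuristically defined correspondences or require initialization. Phase correlation seeks solutions in the spectral domain and is both correspondence-free and initialization-free. Building on this, we propose DPCN++, which combines a differentiable phase correlation solver with simple feature extraction networks. It can register homogeneous and heterogeneous inputs and generalizes well to unseen objects. Specifically, the feature extraction networks first learn dense feature grids from a pair of homogeneous or heterogeneous measurements. These feature grids are then transformed, via the Fourier transform and spherical radial aggregation, into a translation- and scale-invariant spectrum representation that decouples translation and scale from rotation. Next, rotation, scale, and translation are estimated independently and efficiently in the spectrum, step by step. The entire pipeline is differentiable and trained end-to-end. We evaluate DPCN++ on a wide range of tasks with different input modalities, including 2D bird's-eye view images, 3D object and scene measurements, and medical images. Experimental results demonstrate that DPCN++ outperforms both classical and learning-based baselines, especially on partially observed and heterogeneous measurements.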
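The core ingredient named in the abstract, phase correlation, can be illustrated with a minimal NumPy sketch. It assumes the simplest possible case: two single-channel 2D grids related by a pure circular translation, far simpler than DPCN++'s learned feature grids and 7-DoF setting. The function names and the `temperature` parameter are hypothetical, and the softmax-weighted "soft-argmax" is shown only as one common way to make peak picking differentiable; it is not necessarily the paper's exact solver.

```python
import numpy as np


def phase_correlation(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Phase correlation surface of two equally sized 2D grids.

    Assumes b is (approximately) a circular shift of a; the surface then
    peaks at that shift, with no correspondences and no initial guess.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fb * np.conj(fa)               # cross-power spectrum
    cross = cross / (np.abs(cross) + eps)  # keep phase only (whitening)
    return np.real(np.fft.ifft2(cross))    # ideally a delta at the shift


def estimate_shift_hard(a: np.ndarray, b: np.ndarray) -> tuple:
    """Classical, non-differentiable estimate: argmax of the surface."""
    surface = phase_correlation(a, b)
    idx = np.unravel_index(np.argmax(surface), surface.shape)
    # Map FFT indices to signed shifts in [-n/2, n/2).
    return tuple(int(i) if i < n // 2 else int(i) - n for i, n in zip(idx, surface.shape))


def soft_argmax_shift(surface: np.ndarray, temperature: float = 0.01) -> tuple:
    """Differentiable surrogate: expected shift under a softmax over the surface.

    Hypothetical illustration; `temperature` is an assumed hyperparameter.
    Shifts close to +/- n/2 wrap around and would bias this expectation.
    """
    h, w = surface.shape
    ys = np.fft.fftfreq(h) * h  # signed row offsets in FFT index order
    xs = np.fft.fftfreq(w) * w  # signed column offsets in FFT index order
    logits = surface / temperature
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p = p / p.sum()
    dy = float(np.sum(p.sum(axis=1) * ys))  # marginal over rows
    dx = float(np.sum(p.sum(axis=0) * xs))  # marginal over columns
    return dy, dx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((64, 64))
    b = np.roll(a, shift=(5, -3), axis=(0, 1))  # circular shift by (5, -3)
    print(estimate_shift_hard(a, b))  # (5, -3)
    dy, dx = soft_argmax_shift(phase_correlation(a, b))
    print(round(dy, 3), round(dx, 3))  # 5.0 -3.0
    # The Fourier magnitude is unchanged by the shift: translation lives only in the phase.
    print(np.allclose(np.abs(np.fft.fft2(a)), np.abs(np.fft.fft2(b))))  # True
```

The last check shows why a Fourier magnitude spectrum is translation invariant, the property the abstract relies on to separate translation from rotation and scale. With real, non-periodic data the equality holds only approximately because of border effects, so practical pipelines usually window or pad inputs before the FFT.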





Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 28.40
Self-citation rate: 3.00%
Articles published: 885
Review time: 8.5 months

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.