{"title":"基于a5组神经元的关节姿态估计和形状重构等变扩散模型","authors":"Boyan Wan;Yifei Shi;Xiaohong Chen;Kai Xu","doi":"10.1109/TPAMI.2025.3540593","DOIUrl":null,"url":null,"abstract":"Object pose estimation and shape reconstruction are inherently coupled tasks although they have so far been studied separately in most existing approaches. A few recent works addressed the problem of joint pose estimation and shape reconstruction, but they found difficulties in handling partial observations and shape ambiguities. An open challenge in this area is to design a mechanism that has the two tasks benefit each other and boost the performance and robustness of both. In this work, we advocate the use of diffusion models for joint estimation of category-level object poses and reconstruction of object geometry. Diffusion models formulate shape reconstruction as a generation process conditioned on input observations. It has two main advantages. First, the iterative inference of diffusion models provides a mechanism for iterative optimization for both pose estimation and shape reconstruction. Second, diffusion models allow multiple outputs starting from different input noises, which would address the problem of ambiguity caused by partial observations. To achieve this, we propose equivariant diffusion model for joint pose estimation and shape reconstruction. The approach consists of an equivariant feature extractor to aggregate features of the input point cloud and a ShapePose diffusion model to generate object pose and shape simultaneously. To avoid training the model on all possible shape poses in the SO(3) space, we propose to augment the diffusion model with A5-group neurons where the neurons are converted into 5D vectors and can be rotated with the alternating group A5. Based on the A5-group neurons, we implement SO(3)-equivariant 3D point convolution and SO(3)-equivariant concatenation, making the entire network SO(3)-equivariant. Moreover, to select the most plausible combination of pose and shape from the generated ones, we propose a geometry-based measure of plausibility for an estimated pose along with a reconstructed shape. Extensive experiments demonstrate the effectiveness of the proposed method. Specifically, our method achieves the state-of-the-art on two public datasets and a new dataset with stacked objects, in terms of shape reconstruction and pose estimation. In particular, we show the proposed method could provide multiple plausible outputs under partial observations and shape ambiguities.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"4343-4357"},"PeriodicalIF":18.6000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Equivariant Diffusion Model With A5-Group Neurons for Joint Pose Estimation and Shape Reconstruction\",\"authors\":\"Boyan Wan;Yifei Shi;Xiaohong Chen;Kai Xu\",\"doi\":\"10.1109/TPAMI.2025.3540593\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Object pose estimation and shape reconstruction are inherently coupled tasks although they have so far been studied separately in most existing approaches. A few recent works addressed the problem of joint pose estimation and shape reconstruction, but they found difficulties in handling partial observations and shape ambiguities. 
An open challenge in this area is to design a mechanism that has the two tasks benefit each other and boost the performance and robustness of both. In this work, we advocate the use of diffusion models for joint estimation of category-level object poses and reconstruction of object geometry. Diffusion models formulate shape reconstruction as a generation process conditioned on input observations. It has two main advantages. First, the iterative inference of diffusion models provides a mechanism for iterative optimization for both pose estimation and shape reconstruction. Second, diffusion models allow multiple outputs starting from different input noises, which would address the problem of ambiguity caused by partial observations. To achieve this, we propose equivariant diffusion model for joint pose estimation and shape reconstruction. The approach consists of an equivariant feature extractor to aggregate features of the input point cloud and a ShapePose diffusion model to generate object pose and shape simultaneously. To avoid training the model on all possible shape poses in the SO(3) space, we propose to augment the diffusion model with A5-group neurons where the neurons are converted into 5D vectors and can be rotated with the alternating group A5. Based on the A5-group neurons, we implement SO(3)-equivariant 3D point convolution and SO(3)-equivariant concatenation, making the entire network SO(3)-equivariant. Moreover, to select the most plausible combination of pose and shape from the generated ones, we propose a geometry-based measure of plausibility for an estimated pose along with a reconstructed shape. Extensive experiments demonstrate the effectiveness of the proposed method. Specifically, our method achieves the state-of-the-art on two public datasets and a new dataset with stacked objects, in terms of shape reconstruction and pose estimation. In particular, we show the proposed method could provide multiple plausible outputs under partial observations and shape ambiguities.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 6\",\"pages\":\"4343-4357\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10879592/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10879592/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Equivariant Diffusion Model With A5-Group Neurons for Joint Pose Estimation and Shape Reconstruction
Object pose estimation and shape reconstruction are inherently coupled tasks, yet most existing approaches have studied them separately. A few recent works have addressed joint pose estimation and shape reconstruction, but they struggle with partial observations and shape ambiguities. An open challenge in this area is to design a mechanism that lets the two tasks benefit each other and boosts the performance and robustness of both. In this work, we advocate the use of diffusion models for joint estimation of category-level object poses and reconstruction of object geometry. Diffusion models formulate shape reconstruction as a generation process conditioned on input observations, which has two main advantages. First, the iterative inference of diffusion models provides a mechanism for iteratively optimizing both pose estimation and shape reconstruction. Second, diffusion models can produce multiple outputs starting from different input noises, which addresses the ambiguity caused by partial observations. To this end, we propose an equivariant diffusion model for joint pose estimation and shape reconstruction. The approach consists of an equivariant feature extractor that aggregates features of the input point cloud and a ShapePose diffusion model that generates object pose and shape simultaneously. To avoid training the model on all possible shape poses in SO(3), we augment the diffusion model with A5-group neurons, in which neurons are converted into 5D vectors and can be rotated with the alternating group A5. Based on the A5-group neurons, we implement SO(3)-equivariant 3D point convolution and SO(3)-equivariant concatenation, making the entire network SO(3)-equivariant. Moreover, to select the most plausible combination of pose and shape from the generated candidates, we propose a geometry-based measure of plausibility for an estimated pose together with its reconstructed shape. Extensive experiments demonstrate the effectiveness of the proposed method: it achieves state-of-the-art shape reconstruction and pose estimation on two public datasets and a new dataset with stacked objects. In particular, we show that the proposed method can provide multiple plausible outputs under partial observations and shape ambiguities.
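The abstract mentions selecting the most plausible (pose, shape) candidate among the multiple hypotheses sampled from the diffusion model using a geometry-based plausibility measure, but does not spell the measure out. Below is a minimal sketch of how such a selection step could look, assuming a one-sided Chamfer-style distance from the observed partial point cloud to the reconstructed shape transformed by the estimated similarity pose; the function names, the (R, t, scale) parameterization, and the distance itself are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: rank multiple (shape, pose) samples drawn from a
# conditional diffusion model by their geometric consistency with the
# observed partial point cloud. Lower score = more plausible.
import numpy as np
from scipy.spatial import cKDTree


def plausibility(partial_obs, shape_pts, R, t, scale=1.0):
    """Mean distance from each observed point to the reconstructed shape
    after transforming the shape into the observation frame."""
    posed = scale * shape_pts @ R.T + t        # (M, 3) shape in camera frame
    dists, _ = cKDTree(posed).query(partial_obs)  # nearest-neighbor distances
    return dists.mean()


def select_best(partial_obs, samples):
    """samples: list of (shape_pts, R, t, scale) tuples, e.g. generated from
    different input noises; returns the most plausible candidate."""
    return min(samples, key=lambda s: plausibility(partial_obs, *s))


# Usage with random stand-ins for real diffusion outputs:
obs = np.random.rand(500, 3)
candidates = [(np.random.rand(1024, 3), np.eye(3), np.zeros(3), 1.0)
              for _ in range(5)]
best_shape, best_R, best_t, best_scale = select_best(obs, candidates)
```

A one-sided distance (observation to shape) is used here because the observation is partial: penalizing shape points that have no observed counterpart would unfairly disadvantage correct completions of unseen regions.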