Recurrent Diffusion for 3D Point Cloud Generation From a Single Image

Impact factor (IF): 13.7

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society, vol. 34, pp. 1753-1765. Pub Date: 2025-02-27. DOI: 10.1109/TIP.2025.3539935

Yan Zhou; Dewang Ye; Huaidong Zhang; Xuemiao Xu; Huajie Sun; Yewen Xu; Xiangyu Liu; Yuexia Zhou

IEEE Xplore: https://ieeexplore.ieee.org/document/10907786/

Citations: 0


Recurrent Diffusion for 3D Point Cloud Generation From a Single Image

Single-image 3D shape reconstruction has attracted significant attention with the advance of generative models. Recent studies have used diffusion models to achieve unprecedented shape reconstruction quality. However, these methods perform denoising in a single forward pass at each sampling step, which leads to cumulative errors that severely degrade the geometric consistency between the generated shapes and the input targets, and they struggle to reconstruct the rich details of complex 3D shapes. Moreover, the performance of existing methods degrades significantly when only a single image is available as input at test time, because of the limited information it provides, which further affects the quality of 3D shape generation. In this paper, we present a recurrent diffusion framework that aims to improve generation performance during single image-to-shape inference. Instead of denoising in a single forward pass, we recursively refine the noise prediction in a self-rectified manner under the explicit guidance of the input target, thereby markedly suppressing cumulative errors and improving detail modeling. To enhance the geometric perception ability of the network during single-image inference, we further introduce a multi-view training scheme equipped with a view-robust conditional generation mechanism, which effectively improves generation quality even when only a single image is available during inference. The effectiveness of our method is demonstrated through extensive evaluations on two public 3D shape datasets, where it surpasses state-of-the-art methods both qualitatively and quantitatively.
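The abstract gives no implementation details, so the following is only a toy sketch of the general idea of refining a noise estimate several times inside one sampling step, rather than using a single forward pass. It is not the authors' method. Everything here is an assumption made for illustration: the function names (`make_schedule`, `toy_noise_predictor`, `refine_noise`, `recurrent_reverse_step`, `sample`), the standard DDPM-style linear schedule, and in particular the stand-in predictor, which uses a guidance point cloud `cond` in place of a trained, image-conditioned network so that the example runs on its own.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM-style schedule: betas, alphas, cumulative alphas."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x_t, t, cond, prev_eps, alpha_bars):
    """Hypothetical stand-in for a learned network (assumption).

    Blends the previous noise estimate with the noise implied by treating
    the guidance point cloud `cond` as the clean shape.
    """
    a_bar = alpha_bars[t]
    implied_eps = (x_t - np.sqrt(a_bar) * cond) / np.sqrt(1.0 - a_bar)
    return 0.5 * prev_eps + 0.5 * implied_eps


def refine_noise(x_t, t, cond, alpha_bars, num_refine):
    """Refine the noise estimate num_refine times within one step."""
    eps = np.zeros_like(x_t)  # initial estimate before any refinement
    for _ in range(num_refine):
        # Each pass receives the previous estimate and corrects it.
        eps = toy_noise_predictor(x_t, t, cond, eps, alpha_bars)
    return eps


def recurrent_reverse_step(x_t, t, cond, schedule, num_refine, rng):
    """One reverse diffusion step that uses the refined noise estimate."""
    betas, alphas, alpha_bars = schedule
    eps = refine_noise(x_t, t, cond, alpha_bars, num_refine)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def sample(cond, num_steps=50, num_refine=3, seed=0):
    """Run the full reverse process starting from Gaussian noise."""
    rng = np.random.default_rng(seed)
    schedule = make_schedule(num_steps)
    x = rng.standard_normal(cond.shape)
    for t in reversed(range(num_steps)):
        x = recurrent_reverse_step(x, t, cond, schedule, num_refine, rng)
    return x


if __name__ == "__main__":
    guidance = np.random.default_rng(42).standard_normal((128, 3))
    points = sample(guidance, num_steps=50, num_refine=3)
    print(points.shape)
```

In this toy setting, setting `num_refine=1` recovers an ordinary single-pass step, and larger values move the estimate closer to the noise implied by the guidance. How the actual framework rectifies its prediction, and how the image conditions the network, is described only in the full paper.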


Source journal

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Articles published