{"title":"DreamReward-X:促进高质量的3D生成与人类偏好对齐。","authors":"Fangfu Liu,Junliang Ye,Yikai Wang,Hanyang Wang,Zhengyi Wang,Jun Zhu,Yueqi Duan","doi":"10.1109/tpami.2025.3609680","DOIUrl":null,"url":null,"abstract":"Recent advancements in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect as one primary challenge lies in aligning 3D content with human preference, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons based on a systematic annotation pipeline including filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model to encode human preferences effectively. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm to guide the noisy pretrained distribution toward the actual user-prompt distributions in optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend our DreamReward into 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results created by DreamReward, the diversity in text-driven 3D generation is limited due to inherent maximum likelihood-seeking issues. To address this, we explore the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, where we introduce a reward-aware noise sampling strategy to unleash text-driven diversity during the generation process while ensuring human preference alignment. Grounded by theoretical proof and extensive experiment comparisons, our method successfully generates high-fidelity and diverse 3D results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve 3D generation.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"73 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DreamReward-X: Boosting High-Quality 3D Generation with Human Preference Alignment.\",\"authors\":\"Fangfu Liu,Junliang Ye,Yikai Wang,Hanyang Wang,Zhengyi Wang,Jun Zhu,Yueqi Duan\",\"doi\":\"10.1109/tpami.2025.3609680\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advancements in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect as one primary challenge lies in aligning 3D content with human preference, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons based on a systematic annotation pipeline including filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model to encode human preferences effectively. 
Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm to guide the noisy pretrained distribution toward the actual user-prompt distributions in optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend our DreamReward into 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results created by DreamReward, the diversity in text-driven 3D generation is limited due to inherent maximum likelihood-seeking issues. To address this, we explore the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, where we introduce a reward-aware noise sampling strategy to unleash text-driven diversity during the generation process while ensuring human preference alignment. Grounded by theoretical proof and extensive experiment comparisons, our method successfully generates high-fidelity and diverse 3D results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve 3D generation.\",\"PeriodicalId\":13426,\"journal\":{\"name\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"volume\":\"73 1\",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tpami.2025.3609680\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3609680","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
DreamReward-X: Boosting High-Quality 3D Generation with Human Preference Alignment.
Recent advances in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect, and one primary challenge lies in aligning 3D content with human preferences, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons through a systematic annotation pipeline that includes filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model, to encode human preferences effectively. Building upon the 3D reward model, we perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm, which guides the noisy pretrained distribution toward the actual user-prompt distribution during optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend DreamReward to 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results produced by DreamReward, the diversity of text-driven 3D generation remains limited due to inherent maximum-likelihood-seeking issues. To address this, we examine the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, which introduces a reward-aware noise sampling strategy to unleash text-driven diversity during generation while ensuring human preference alignment. Grounded in theoretical proof and extensive experimental comparisons, our method generates high-fidelity and diverse 3D results with significant improvements in alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve 3D generation.
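The abstract states that Reward3D is trained on 25K+ expert comparisons but does not describe the training objective. A common way to fit a preference reward model to pairwise comparisons is a Bradley-Terry-style ranking loss; the sketch below illustrates only that general technique. The model architecture, the use of precomputed text and rendered-view embeddings, and the names (PreferenceRewardModel, ranking_loss) are illustrative assumptions, not the authors' Reward3D implementation.

```python
# Minimal sketch of a Bradley-Terry pairwise ranking loss, a standard way to
# train a preference reward model from expert comparisons. This is NOT the
# authors' Reward3D code; the architecture and inputs are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreferenceRewardModel(nn.Module):
    """Scores a (prompt embedding, rendered-view embedding) pair; higher = more preferred."""
    def __init__(self, text_dim: int = 512, image_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([text_emb, image_emb], dim=-1)).squeeze(-1)

def ranking_loss(model: PreferenceRewardModel,
                 text_emb: torch.Tensor,
                 emb_preferred: torch.Tensor,
                 emb_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the preferred sample's score above the rejected one."""
    r_win = model(text_emb, emb_preferred)
    r_lose = model(text_emb, emb_rejected)
    return -F.logsigmoid(r_win - r_lose).mean()
```

In practice such a model would score several rendered views of each 3D asset and aggregate the scores per comparison, but the aggregation and encoders used by Reward3D are not specified in the abstract.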
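The abstract also refers to SDS-based optimization (DreamFL) and to DDIM sampling (DreamReward++) without stating the underlying formulas. As background only, and not the authors' derivation, the standard Score Distillation Sampling (SDS) gradient from DreamFusion and the deterministic (eta = 0) DDIM update are:

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\right], \qquad x = g(\theta),\quad x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\]
\[
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\;\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\phi(x_t;\, y,\, t)}{\sqrt{\bar{\alpha}_t}} \;+\; \sqrt{1-\bar{\alpha}_{t-1}}\;\epsilon_\phi(x_t;\, y,\, t),
\]

where g(theta) renders the 3D representation theta, y is the text prompt, epsilon_phi is the pretrained diffusion model's noise prediction, and \bar{\alpha}_t is the cumulative noise schedule. The mode-seeking behavior of the SDS expectation over randomly drawn epsilon is one commonly cited explanation for the limited diversity the abstract attributes to maximum-likelihood seeking. A reward-aware noise sampling strategy, as named for DreamReward++, would presumably bias the choice of injected noise or timestep using the Reward3D score rather than sampling it uniformly; the concrete form is defined in the paper, not here.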
Journal Introduction:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.