{"title":"DreamReward-X:促进高质量的3D生成与人类偏好对齐。","authors":"Fangfu Liu,Junliang Ye,Yikai Wang,Hanyang Wang,Zhengyi Wang,Jun Zhu,Yueqi Duan","doi":"10.1109/tpami.2025.3609680","DOIUrl":null,"url":null,"abstract":"Recent advancements in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect as one primary challenge lies in aligning 3D content with human preference, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons based on a systematic annotation pipeline including filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model to encode human preferences effectively. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm to guide the noisy pretrained distribution toward the actual user-prompt distributions in optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend our DreamReward into 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results created by DreamReward, the diversity in text-driven 3D generation is limited due to inherent maximum likelihood-seeking issues. To address this, we explore the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, where we introduce a reward-aware noise sampling strategy to unleash text-driven diversity during the generation process while ensuring human preference alignment. Grounded by theoretical proof and extensive experiment comparisons, our method successfully generates high-fidelity and diverse 3D results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve 3D generation.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"73 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DreamReward-X: Boosting High-Quality 3D Generation with Human Preference Alignment.\",\"authors\":\"Fangfu Liu,Junliang Ye,Yikai Wang,Hanyang Wang,Zhengyi Wang,Jun Zhu,Yueqi Duan\",\"doi\":\"10.1109/tpami.2025.3609680\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advancements in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect as one primary challenge lies in aligning 3D content with human preference, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons based on a systematic annotation pipeline including filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model to encode human preferences effectively. 
Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm to guide the noisy pretrained distribution toward the actual user-prompt distributions in optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend our DreamReward into 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results created by DreamReward, the diversity in text-driven 3D generation is limited due to inherent maximum likelihood-seeking issues. To address this, we explore the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, where we introduce a reward-aware noise sampling strategy to unleash text-driven diversity during the generation process while ensuring human preference alignment. Grounded by theoretical proof and extensive experiment comparisons, our method successfully generates high-fidelity and diverse 3D results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve 3D generation.\",\"PeriodicalId\":13426,\"journal\":{\"name\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"volume\":\"73 1\",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tpami.2025.3609680\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3609680","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
DreamReward-X: Boosting High-Quality 3D Generation with Human Preference Alignment.
Recent advances in 3D content generation have shown remarkable success by leveraging pretrained large-scale diffusion models. However, existing 3D generation results are far from perfect, and one primary challenge lies in aligning 3D content with human preferences, especially in text-driven 3D generation. In this paper, we propose a novel 3D generation framework, coined DreamReward, to learn and improve text-driven 3D generation models from human preference feedback. First, we collect 25K+ expert comparisons through a systematic annotation pipeline that includes filtering, rating, and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model, to encode human preferences effectively. Building upon the 3D reward model, we perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL) algorithm, which guides the noisy pretrained distribution toward the actual user-prompt distribution during optimization. With the rapid development and growing popularity of 4D and image-driven 3D generation, we further extend DreamReward to 4D generation (DreamReward-4D) and image-to-3D generation (DreamReward-img) in a low-cost but effective manner. Despite the impressive results produced by DreamReward, the diversity of text-driven 3D generation remains limited due to inherent maximum-likelihood-seeking issues. To address this, we examine the gap between Denoising Diffusion Implicit Models (DDIM) and SDS-based DreamFL in the generation process and propose DreamReward++, which introduces a reward-aware noise sampling strategy to unleash text-driven diversity during generation while ensuring human preference alignment. Grounded in theoretical proof and extensive experimental comparisons, our method generates high-fidelity and diverse 3D results with significant improvements in alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve 3D generation.
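The abstract states that Reward3D is trained on 25K+ expert comparisons but does not describe the training objective. A common way to fit a preference reward model to pairwise comparisons is a Bradley-Terry-style ranking loss; the sketch below illustrates only that general technique. The model architecture, the use of precomputed text and rendered-view embeddings, and the names (PreferenceRewardModel, ranking_loss) are illustrative assumptions, not the authors' Reward3D implementation.

```python
# Minimal sketch of a Bradley-Terry pairwise ranking loss, a standard way to
# train a preference reward model from expert comparisons. This is NOT the
# authors' Reward3D code; the architecture and inputs are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreferenceRewardModel(nn.Module):
    """Scores a (prompt embedding, rendered-view embedding) pair; higher = more preferred."""
    def __init__(self, text_dim: int = 512, image_dim: int = 512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([text_emb, image_emb], dim=-1)).squeeze(-1)

def ranking_loss(model: PreferenceRewardModel,
                 text_emb: torch.Tensor,
                 emb_preferred: torch.Tensor,
                 emb_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the preferred sample's score above the rejected one."""
    r_win = model(text_emb, emb_preferred)
    r_lose = model(text_emb, emb_rejected)
    return -F.logsigmoid(r_win - r_lose).mean()
```

In practice such a model would score several rendered views of each 3D asset and aggregate the scores per comparison, but the aggregation and encoders used by Reward3D are not specified in the abstract.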
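The abstract also refers to SDS-based optimization (DreamFL) and to DDIM sampling (DreamReward++) without stating the underlying formulas. As background only, and not the authors' derivation, the standard Score Distillation Sampling (SDS) gradient from DreamFusion and the deterministic (eta = 0) DDIM update are:

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\right], \qquad x = g(\theta),\quad x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\]
\[
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\;\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\phi(x_t;\, y,\, t)}{\sqrt{\bar{\alpha}_t}} \;+\; \sqrt{1-\bar{\alpha}_{t-1}}\;\epsilon_\phi(x_t;\, y,\, t),
\]

where g(theta) renders the 3D representation theta, y is the text prompt, epsilon_phi is the pretrained diffusion model's noise prediction, and \bar{\alpha}_t is the cumulative noise schedule. The mode-seeking behavior of the SDS expectation over randomly drawn epsilon is one commonly cited explanation for the limited diversity the abstract attributes to maximum-likelihood seeking. A reward-aware noise sampling strategy, as named for DreamReward++, would presumably bias the choice of injected noise or timestep using the Reward3D score rather than sampling it uniformly; the concrete form is defined in the paper, not here.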
Journal Introduction:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.