DreamArrangement: Learning Language-Conditioned Robotic Rearrangement of Objects via Denoising Diffusion and VLM Planner

IF 8.7 · Zone 1 (Computer Science) · JCR Q1 (AUTOMATION & CONTROL SYSTEMS)
Wenkai Chen, Changming Xiao, Ge Gao, Fuchun Sun, Changshui Zhang, Jianwei Zhang
DOI: 10.1109/TSMC.2025.3611698
Published in: IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 11, pp. 8675-8688
Publication date: 2025-09-24
Article type: Journal Article
IEEE Xplore: https://ieeexplore.ieee.org/document/11176993/
Citations: 0

Abstract

The capability of robotic systems to rearrange objects based on human instructions represents a critical step toward realizing embodied intelligence. Recently, diffusion-based learning has made significant advances in data generation, while prompt-based learning has proven effective in formulating robot manipulation strategies. However, prior solutions for robotic rearrangement have overlooked the importance of integrating human preferences and optimizing rearrangement efficiency. Additionally, traditional prompt-based approaches struggle with complex, semantically meaningful rearrangement tasks in which the target states of objects are not predefined. To address these challenges, our work first introduces a comprehensive two-dimensional (2-D) tabletop rearrangement dataset, using a physics simulator to capture interobject relationships and semantic configurations. We then present DreamArrangement, a novel language-conditioned object rearrangement scheme consisting of two primary processes: a transformer-based multimodal denoising diffusion model that envisages the desired arrangement of objects, and a vision–language foundation model that derives actionable policies from the text instruction together with the initial and target visual information. In particular, we introduce an efficiency-oriented learning strategy that minimizes the average motion distance of objects. Given few-shot instruction examples, the policy learned from our synthetic dataset can be transferred to the real world without extra human intervention. Extensive simulations validate DreamArrangement's superior rearrangement quality and efficiency. Moreover, real-world robotic experiments confirm that our method can execute a range of challenging, language-conditioned, long-horizon tasks with a single model. The demonstration video can be found at https://youtu.be/fq25-DjrbQE
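The efficiency objective above is stated only as minimizing the "average motion distance of objects." As a hedged illustration, and assuming this means the mean Euclidean displacement between each object's initial and target 2-D tabletop position (the abstract does not give the exact definition), the quantity could be computed as follows. The function name `average_motion_distance` and the one-to-one ordering of objects are hypothetical choices for this sketch, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

# A planar (x, y) object position on the tabletop (assumed representation).
Point = Tuple[float, float]


def average_motion_distance(initial: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance objects travel from initial to target 2-D positions.

    Assumption (not from the paper): object i in `initial` corresponds to
    object i in `target`, and both use the same planar coordinate units.
    """
    if len(initial) != len(target):
        raise ValueError("initial and target must list the same objects")
    if not initial:
        return 0.0
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    total = sum(math.dist(p, q) for p, q in zip(initial, target))
    return total / len(initial)


if __name__ == "__main__":
    # Two objects: one moves 5 units (a 3-4-5 triangle), the other stays put.
    start = [(0.0, 0.0), (1.0, 1.0)]
    goal = [(3.0, 4.0), (1.0, 1.0)]
    print(average_motion_distance(start, goal))  # 2.5
```

Under this assumption, one way to use such a metric (again a sketch, not the paper's training procedure) is to compare several candidate target arrangements that satisfy the same instruction and prefer the one with the lower value.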
Source journal
IEEE Transactions on Systems, Man, and Cybernetics: Systems (AUTOMATION & CONTROL SYSTEMS; COMPUTER SCIENCE, CYBERNETICS)
CiteScore: 18.50
Self-citation rate: 11.50%
Articles published: 812
Review time: 6 months
Journal description: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.