StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

Robotics: Science and Systems XIX. Pub Date: 2022-11-08. DOI: 10.15607/RSS.2023.XIX.031

Weiyu Liu, Yilun Du, Tucker Hermans, S. Chernova, Chris Paxton


Citations: 13

Abstract

Robots operating in human environments must be able to rearrange objects into semantically meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion improves the success rate of assembling physically valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects both in simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.
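The abstract pairs a generative sampler with a collision discriminator. One common way to combine two such components is to sample several candidate arrangements and keep the one the discriminator scores best. The toy sketch below illustrates only that sample-and-rank pattern and is not the authors' implementation: `sample_candidate` is a hypothetical stand-in for a learned diffusion sampler (it denoises 2D positions toward a hard-coded line layout), and `collision_free_score` is a hypothetical geometric stand-in for a learned discriminator.

```python
import math
import random


def sample_candidate(num_objects, rng, steps=20):
    """Toy stand-in for a diffusion sampler (assumption, not the paper's model).

    Starts from Gaussian noise and moves each 2D position a fraction of the
    way toward a hard-coded line layout, adding noise that shrinks to zero.
    """
    targets = [(0.3 * i, 0.0) for i in range(num_objects)]
    poses = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_objects)]
    for t in range(steps):
        noise_scale = 0.1 * (1 - (t + 1) / steps)
        poses = [
            (x + 0.3 * (tx - x) + rng.gauss(0, noise_scale),
             y + 0.3 * (ty - y) + rng.gauss(0, noise_scale))
            for (x, y), (tx, ty) in zip(poses, targets)
        ]
    return poses


def collision_free_score(poses, radius=0.1):
    """Toy stand-in for a collision discriminator (assumption).

    Treats every object as a disc of the given radius and returns the
    fraction of object pairs whose discs do not overlap.
    """
    pairs = 0
    clear = 0
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            pairs += 1
            if math.dist(poses[i], poses[j]) >= 2 * radius:
                clear += 1
    return clear / pairs if pairs else 1.0


def sample_and_rank(num_objects, num_candidates=16, seed=0):
    """Draw several candidate arrangements and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [sample_candidate(num_objects, rng) for _ in range(num_candidates)]
    return max(candidates, key=collision_free_score)


if __name__ == "__main__":
    best = sample_and_rank(num_objects=4)
    print(best, collision_free_score(best))
```

In this pattern the sampler proposes arrangements and the scorer rejects candidates with overlapping objects, so each component can be swapped out independently of the other.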
