DayDreamer: World Models for Physical Robot Learning

Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
{"title":"DayDreamer: World Models for Physical Robot Learning","authors":"Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, P. Abbeel","doi":"10.48550/arXiv.2206.14176","DOIUrl":null,"url":null,"abstract":"To solve tasks in complex environments, robots need to learn from experience. Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn, limiting its deployment in the physical world. As a consequence, many advances in robot learning rely on simulators. On the other hand, learning inside of simulators fails to capture the complexity of the real world, is prone to simulator inaccuracies, and the resulting behaviors do not adapt to changes in the world. The Dreamer algorithm has recently shown great promise for learning from small amounts of interaction by planning within a learned world model, outperforming pure reinforcement learning in video games. Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment. However, it is unknown whether Dreamer can facilitate faster learning on physical robots. In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators. Dreamer trains a quadruped robot to roll off its back, stand up, and walk from scratch and without resets in only 1 hour. We then push the robot and find that Dreamer adapts within 10 minutes to withstand perturbations or quickly roll over and stand back up. On two different robotic arms, Dreamer learns to pick and place multiple objects directly from camera images and sparse rewards, approaching human performance. On a wheeled robot, Dreamer learns to navigate to a goal position purely from camera images, automatically resolving ambiguity about the robot orientation. Using the same hyperparameters across all experiments, we find that Dreamer is capable of online learning in the real world, establishing a strong baseline. We release our infrastructure for future applications of world models to robot learning.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"6 7","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"105","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2206.14176","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 105

Abstract

To solve tasks in complex environments, robots need to learn from experience. Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn, limiting its deployment in the physical world. As a consequence, many advances in robot learning rely on simulators. On the other hand, learning inside of simulators fails to capture the complexity of the real world, is prone to simulator inaccuracies, and the resulting behaviors do not adapt to changes in the world. The Dreamer algorithm has recently shown great promise for learning from small amounts of interaction by planning within a learned world model, outperforming pure reinforcement learning in video games. Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment. However, it is unknown whether Dreamer can facilitate faster learning on physical robots. In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators. Dreamer trains a quadruped robot to roll off its back, stand up, and walk from scratch and without resets in only 1 hour. We then push the robot and find that Dreamer adapts within 10 minutes to withstand perturbations or quickly roll over and stand back up. On two different robotic arms, Dreamer learns to pick and place multiple objects directly from camera images and sparse rewards, approaching human performance. On a wheeled robot, Dreamer learns to navigate to a goal position purely from camera images, automatically resolving ambiguity about the robot orientation. Using the same hyperparameters across all experiments, we find that Dreamer is capable of online learning in the real world, establishing a strong baseline. We release our infrastructure for future applications of world models to robot learning.
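The core mechanism the abstract describes, predicting the outcomes of candidate actions inside a learned model rather than on the robot, can be illustrated with a short sketch. Everything below is a hypothetical stand-in chosen for brevity: the toy linear latent dynamics, the `WorldModel` class, and the random-shooting action selection are not the paper's implementation, which learns a recurrent state-space model from replayed robot experience and trains an actor-critic on imagined latent rollouts.

```python
# Minimal sketch of planning in imagination, under the assumptions above.
# A learned model predicts next latent state and reward, so candidate
# actions can be scored without any real-world trial and error.
import numpy as np

rng = np.random.default_rng(0)

class WorldModel:
    """Toy latent model: z' = tanh(W [z; a]), r = w . z' (hypothetical)."""

    def __init__(self, latent_dim=8, action_dim=2):
        self.W = rng.normal(scale=0.1, size=(latent_dim, latent_dim + action_dim))
        self.w_reward = rng.normal(scale=0.1, size=latent_dim)

    def encode(self, obs):
        # Stand-in for the image encoder; observations here are plain vectors.
        return np.tanh(obs)

    def step(self, z, action):
        # Predict the next latent state and its reward, never touching the robot.
        z_next = np.tanh(self.W @ np.concatenate([z, action]))
        return z_next, float(self.w_reward @ z_next)

def imagine_return(model, policy, z0, horizon=15, discount=0.99):
    """Roll the policy forward inside the model and sum discounted rewards."""
    z, ret = z0, 0.0
    for t in range(horizon):
        z, r = model.step(z, policy(z))
        ret += discount ** t * r
    return ret

# Usage: screen candidate first actions purely in imagination.
model = WorldModel()
z0 = model.encode(rng.normal(size=8))
candidates = [rng.normal(size=2) for _ in range(10)]
best = max(candidates,
           key=lambda a: imagine_return(model, lambda z, a=a: a, z0))
print("best first action:", best)
```

In DayDreamer itself, data collection on the robot and training of the world model and actor-critic run concurrently; the sketch substitutes random shooting for the learned actor only to show, in a few lines, why a model that predicts rewards lets actions be evaluated in imagination before any of them reach the real world.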