扩展数据驱动机器人与奖励草图和批强化学习

Robotics: Science and Systems XVI Pub Date : 2019-09-26 DOI:10.15607/rss.2020.xvi.076

Serkan Cabi, Sergio Gomez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott E. Reed, Rae Jeong, Konrad Zolna, Y. Aytar, D. Budden, Mel Vecerík, Oleg O. Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyun Wang

{"title":"扩展数据驱动机器人与奖励草图和批强化学习","authors":"Serkan Cabi, Sergio Gomez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott E. Reed, Rae Jeong, Konrad Zolna, Y. Aytar, D. Budden, Mel Vecerík, Oleg O. Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyun Wang","doi":"10.15607/rss.2020.xvi.076","DOIUrl":null,"url":null,"abstract":"We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.","PeriodicalId":231005,"journal":{"name":"Robotics: Science and Systems XVI","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"109","resultStr":"{\"title\":\"Scaling data-driven robotics with reward sketching and batch reinforcement learning\",\"authors\":\"Serkan Cabi, Sergio Gomez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott E. Reed, Rae Jeong, Konrad Zolna, Y. Aytar, D. Budden, Mel Vecerík, Oleg O. Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyun Wang\",\"doi\":\"10.15607/rss.2020.xvi.076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.\",\"PeriodicalId\":231005,\"journal\":{\"name\":\"Robotics: Science and Systems XVI\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"109\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics: Science and Systems XVI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15607/rss.2020.xvi.076\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics: Science and Systems XVI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15607/rss.2020.xvi.076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 109

摘要

我们提出了一个数据驱动机器人的框架，该框架利用记录机器人经验的大型数据集，并使用学习奖励函数扩展到多个任务。我们展示了如何应用该框架在真实机器人平台上完成三种不同的对象操作任务。给定一个任务的演示以及与任务无关的记录经验，我们使用一种特殊形式的人类注释作为监督来学习奖励函数，这使我们能够处理无法直接获得奖励信号的现实世界任务。学习奖励与来自不同任务的大量经验数据集结合使用，通过批处理强化学习离线学习机器人策略。我们表明，使用我们的方法可以训练代理执行各种具有挑战性的操作任务，包括堆叠刚性物体和处理布。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Scaling data-driven robotics with reward sketching and batch reinforcement learning

We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Robotics: Science and Systems XVI

自引率

0.00%

发文量