i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

Saminda Abeyruwan, L. Graesser, David B. D'Ambrosio, Avi Singh, A. Shankar, A. Bewley, Deepali Jain, K. Choromanski, P. Sanketi
Conference on Robot Learning. Published 2022-07-14. DOI: 10.48550/arXiv.2207.06572 (https://doi.org/10.48550/arXiv.2207.06572)
Citations: 23

Abstract

Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. Training policies in simulation enables safe exploration and fast, low-cost, large-scale data collection. However, prior work on sim-to-real transfer of robotic policies typically does not involve human-robot interaction, because accurately simulating human behavior is an open problem. In this work, our goal is to leverage simulation to train robotic policies that are proficient at interacting with humans upon deployment. This poses a chicken-and-egg problem: how can we gather examples of a human interacting with a physical robot, so as to model human behavior in simulation, without already having a robot that is able to interact with a human? Our proposed method, Iterative-Sim-to-Real (i-S2R), attempts to address this. i-S2R bootstraps from a simple model of human behavior and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are refined. For all training we apply a new evolutionary search algorithm called Blackbox Gradient Sensing (BGS). We evaluate our method in a real-world robotic table tennis setting, where the objective for the robot is to play cooperatively with a human player for as long as possible. Table tennis is a high-speed, dynamic task that requires the two players to react quickly to each other's moves, making it a challenging test bed for research on human-robot interaction. We present results on an industrial robotic arm that is able to play table tennis cooperatively with human players, achieving rallies of 22 successive hits on average and 150 at best. Further, for 80% of players, rally lengths are 70% to 175% longer than with the sim-to-real plus fine-tuning (S2R+FT) baseline. For videos of our system in action, please see https://sites.google.com/view/is2r.
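
The alternation described in the abstract (train in simulation, deploy in the real world, refine the human behavior model, repeat) can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration and is not taken from the paper: the "human" is a Gaussian over ball positions, the "policy" is a single number `theta`, and the training step is a plain antithetic random-search update used as a stand-in, not the authors' BGS algorithm. All function names are hypothetical.

```python
import random
import statistics

# Hypothetical "real" human behavior, unknown to the learner (illustrative values).
TRUE_MEAN, TRUE_STD = 2.0, 0.5


def train_in_sim(theta, model, rng, steps=200, sigma=0.1, lr=0.05):
    """Toy policy training against the current human model.

    Uses an antithetic random-search update as a stand-in for an
    evolutionary search method; this is NOT the paper's BGS algorithm.
    """
    mean, std = model
    for _ in range(steps):
        # Sample simulated ball positions from the current human model.
        balls = [rng.gauss(mean, std) for _ in range(16)]

        def reward(t):
            # Higher reward when the policy parameter is close to the balls.
            return -sum((t - b) ** 2 for b in balls) / len(balls)

        eps = rng.gauss(0.0, 1.0)
        # Finite-difference estimate of the reward gradient along eps.
        grad = (reward(theta + sigma * eps) - reward(theta - sigma * eps)) / (2 * sigma) * eps
        theta += lr * grad
    return theta


def deploy_in_real(rng, n=50):
    """Toy deployment: observe n ball positions from the real human."""
    return [rng.gauss(TRUE_MEAN, TRUE_STD) for _ in range(n)]


def i_s2r(iterations=3, seed=0):
    """Alternate between simulated training and real-world data collection."""
    rng = random.Random(seed)
    model = (0.0, 1.0)  # simple initial human model (assumed)
    theta = 0.0         # initial policy parameter
    data = []
    for _ in range(iterations):
        theta = train_in_sim(theta, model, rng)  # refine the policy in simulation
        data.extend(deploy_in_real(rng))         # gather real interaction data
        # Refine the human model from all data collected so far.
        model = (statistics.mean(data), statistics.stdev(data))
    return theta, model


theta, model = i_s2r()
print(f"policy parameter: {theta:.2f}, human model mean: {model[0]:.2f}")
```

In this toy setting the first iteration trains against the crude initial model, so the policy only moves toward the real human's behavior once the model has been refit on deployment data. Unlike the real task, the simulated human here does not react to the robot, so the sketch captures only the alternation, not the interaction loop.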