TeachMe: Three-phase learning framework for robotic motion imitation based on interactive teaching and reinforcement learning

Taewoo Kim, Joo-Haeng Lee
{"title":"TeachMe: Three-phase learning framework for robotic motion imitation based on interactive teaching and reinforcement learning","authors":"Taewoo Kim, Joo-Haeng Lee","doi":"10.1109/RO-MAN46459.2019.8956326","DOIUrl":null,"url":null,"abstract":"Motion imitation is a fundamental communication skill for a robot; especially, as a nonverbal interaction with a human. Owing to kinematic configuration differences between the human and the robot, it is challenging to determine the appropriate mapping between the two pose domains. Moreover, technical limitations while extracting 3D motion details, such as wrist joint movements from human motion videos, results in significant challenges in motion retargeting. Explicit mapping over different motion domains indicates a considerably inefficient solution. To solve these problems, we propose a three-phase reinforcement learning scheme to enable a NAO robot to learn motions from human pose skeletons extracted from video inputs. Our learning scheme consists of three phases: (i) phase one for learning preparation, (ii) phase two for a simulation-based reinforcement learning, and (iii) phase three for a human-in-the-loop-based reinforcement learning. In phase one, embeddings of the motions of a human skeleton and robot are learned by an autoencoder. In phase two, the NAO robot learns a rough imitation skill using reinforcement learning that translates the learned embeddings. In the last phase, the robot learns motion details that were not considered in the previous phases by interactively setting rewards based on direct teaching instead of the method used in the previous phase. Especially, it is to be noted that a relatively smaller number of interactive inputs are required for motion details in phase three when compared to the large volume of training sets required for overall imitation in phase two. 
The experimental results demonstrate that the proposed method improves the imitation skills efficiently for hand waving and saluting motions obtained from NTU-DB.","PeriodicalId":286478,"journal":{"name":"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RO-MAN46459.2019.8956326","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Motion imitation is a fundamental communication skill for a robot, especially as a form of nonverbal interaction with humans. Owing to kinematic configuration differences between humans and robots, determining an appropriate mapping between the two pose domains is challenging. Moreover, technical limitations in extracting 3D motion details, such as wrist joint movements, from human motion videos pose significant challenges for motion retargeting, and explicit mapping between the different motion domains would be considerably inefficient. To address these problems, we propose a three-phase reinforcement learning scheme that enables a NAO robot to learn motions from human pose skeletons extracted from video input. Our learning scheme consists of three phases: (i) phase one for learning preparation, (ii) phase two for simulation-based reinforcement learning, and (iii) phase three for human-in-the-loop reinforcement learning. In phase one, an autoencoder learns embeddings of human skeleton and robot motions. In phase two, the NAO robot acquires a rough imitation skill through reinforcement learning that translates between the learned embeddings. In the final phase, the robot learns motion details overlooked in the earlier phases through rewards set interactively via direct teaching, rather than the simulation-based rewards of phase two. Notably, phase three requires a relatively small number of interactive inputs for motion details, compared with the large volume of training data required for overall imitation in phase two. The experimental results demonstrate that the proposed method efficiently improves imitation of hand-waving and saluting motions obtained from NTU-DB.
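The three-phase pipeline described above can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical scaffolding rather than the authors' implementation: the phase-one autoencoder is replaced by a fixed linear "embedding", phase-two reinforcement learning by a toy accept-if-better random search over joint angles, and phase-three interactive teaching by a feedback callback that returns per-joint corrections. All function names and parameters are invented for illustration.

```python
import random


def embed(pose, scale=0.1):
    """Phase 1 stand-in: a fixed 'autoencoder' embedding (toy linear scaling).
    In the paper this mapping is learned from human and robot motion data."""
    return [scale * x for x in pose]


def embedding_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def phase2_rough_imitation(human_pose, n_joints, steps=5000, seed=0):
    """Phase 2 stand-in: an accept-if-better random search that adjusts the
    robot's joint angles until its embedding approaches the human-pose
    embedding. A reward of -distance drives the same behavior an RL agent
    would be trained toward in simulation."""
    rng = random.Random(seed)
    target = embed(human_pose)
    robot = [0.0] * n_joints
    best = embedding_distance(embed(robot), target)
    for _ in range(steps):
        j = rng.randrange(n_joints)
        delta = rng.uniform(-0.5, 0.5)
        robot[j] += delta
        d = embedding_distance(embed(robot), target)
        if d < best:
            best = d          # keep the improving move
        else:
            robot[j] -= delta  # reject the worsening move
    return robot


def phase3_refine(robot_pose, human_feedback, lr=0.5, rounds=20):
    """Phase 3 stand-in: a human-in-the-loop signal (here, a callback that
    returns per-joint corrections, e.g. for wrist detail) refines what the
    coarse embedding match of phase 2 missed."""
    pose = list(robot_pose)
    for _ in range(rounds):
        corrections = human_feedback(pose)
        pose = [p + lr * c for p, c in zip(pose, corrections)]
    return pose
```

A usage sketch: a 3-joint "human pose" `[30.0, -15.0, 60.0]` is roughly imitated in phase two, then phase three nudges the last joint toward a detail target of 45.0 via the feedback callback. The point of the toy is the division of labor: phase two needs thousands of search steps, while phase three converges in a handful of feedback rounds, mirroring the abstract's claim that interactive teaching needs far fewer inputs than overall imitation training.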