Humanoid action imitation learning via boosting sample DQN in virtual demonstrator environment

2016 23rd International Conference on Mechatronics and Machine Vision in Practice (M2VIP). Pub Date: 2016-11-01. DOI: 10.1109/M2VIP.2016.7827324

Rong Zhou, Zhisheng Zhang, Kunyyu Peng, Yang Mi, Xiangsheng Huang


Citations: 1

Abstract

With the growth of modern industrial automation, autonomous learning for robots has attracted considerable attention from researchers. However, existing learning methods typically require a large amount of training data, which makes sample collection difficult and time-consuming; moreover, the validity of the samples can vary greatly, which limits training efficiency. In addition, the reinforcement learning used in such systems assumes that each action in a sequence contributes equally to the outcome, an assumption that does not hold in general. In this paper, we propose a method, boosting sample DQN, to improve the validity of the training sample set. Inspired by boosting, the method extracts samples from the replay memory hierarchically, based on statistical results, which improves the efficiency of network training. Our algorithm, which has a small number of parameters, has been successfully ported to a dual-arm robot system. The approach learns a set of trajectories for reaching and grabbing target objects, using real-time models obtained from interactive wearable sensing equipment. We also propose a solution for assigning different weights to different actions. As presented in the paper, our method has proved adaptable to learning complicated tasks, including grabbing a bottle within its reach.
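
The abstract does not say which statistic or how many tiers the hierarchical replay sampling uses, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes, purely for illustration, that transitions are ranked by reward, cut into equal-sized tiers, and that each tier contributes a fixed share of the minibatch. The names `Transition`, `stratified_sample`, and `tier_fractions` are hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical transition record; the paper's actual replay format is not given.
Transition = namedtuple("Transition", "state action reward next_state done")


def stratified_sample(memory, batch_size, tier_fractions=(0.5, 0.3, 0.2), rng=random):
    """Draw a minibatch from replay memory tier by tier.

    Assumption: transitions are ranked by reward (highest first) and cut into
    len(tier_fractions) equal-sized tiers. Tier i contributes roughly
    tier_fractions[i] * batch_size samples, drawn with replacement.
    """
    if not memory:
        raise ValueError("replay memory is empty")
    ranked = sorted(memory, key=lambda t: t.reward, reverse=True)
    n_tiers = len(tier_fractions)
    tier_size = -(-len(ranked) // n_tiers)  # ceiling division
    tiers = [ranked[i * tier_size:(i + 1) * tier_size] for i in range(n_tiers)]

    batch = []
    for tier, fraction in zip(tiers, tier_fractions):
        if not tier:
            continue  # memory smaller than the number of tiers
        k = round(fraction * batch_size)
        batch.extend(rng.choice(tier) for _ in range(k))
    # Rounding may leave the batch short or long: top up uniformly, then trim.
    while len(batch) < batch_size:
        batch.append(rng.choice(ranked))
    return batch[:batch_size]
```

With the default fractions, half of each minibatch comes from the highest-reward third of the memory. Whether reward is the right ranking statistic, and how the shares should be set, are exactly the details the full paper would need to supply.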


