Rong Zhou, Zhisheng Zhang, Kunyyu Peng, Yang Mi, Xiangsheng Huang
Title: Humanoid action imitation learning via boosting sample DQN in virtual demonstrator environment
Published in: 2016 23rd International Conference on Mechatronics and Machine Vision in Practice (M2VIP)
Publication date: 2016-11-01
DOI: 10.1109/M2VIP.2016.7827324 (https://doi.org/10.1109/M2VIP.2016.7827324)
Citation count: 1
Abstract
With the growth of modern industrial automation, autonomous learning for robots has attracted considerable attention from researchers. However, existing learning methods typically require a large training set, and collecting the samples is difficult and time-consuming. The validity of the samples can also vary greatly, which limits training efficiency. In addition, the reinforcement learning used in such systems rests on the hypothesis that each action in a sequence contributes equally to the outcome, which does not hold in general. In this paper, we propose a method, boosting sample DQN, to improve the validity of the training sample set. Inspired by boosting, the method extracts samples from replay memory hierarchically, based on statistical results, which improves the efficiency of network training. Our algorithm, which has a small number of parameters, has been successfully ported to a dual-arm robot system. The approach learns a set of trajectories for reaching and grabbing target objects, using real-time models obtained from interactive wearable sensing equipment. We also propose a solution for assigning different weights to different actions. Our method proves adaptive in learning complicated tasks, including grabbing a bottle within the robot's reach, as presented in the paper.
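The abstract does not specify how samples are extracted from replay memory, so the following is only a minimal sketch of one way hierarchical, statistics-based sampling could look. It assumes that transitions are ranked by reward, split into equal-sized tiers, and drawn with a fixed share per tier. The class name `TieredReplayMemory`, the `tier_weights` values, and the use of reward as the ranking statistic are all hypothetical and not taken from the paper.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class TieredReplayMemory:
    """Replay memory sampled by reward tier (illustrative scheme, not the paper's)."""

    def __init__(self, capacity, tier_weights=(0.2, 0.3, 0.5), seed=None):
        # Assumed share of each batch per tier, ordered from lowest to highest reward.
        self.tier_weights = tier_weights
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def _tiers(self):
        # Rank stored transitions by reward and cut them into contiguous tiers.
        ranked = sorted(self.buffer, key=lambda t: t.reward)
        n, size = len(self.tier_weights), len(ranked)
        bounds = [round(i * size / n) for i in range(n + 1)]
        return [ranked[bounds[i]:bounds[i + 1]] for i in range(n)]

    def sample(self, batch_size):
        batch = []
        for tier, weight in zip(self._tiers(), self.tier_weights):
            if tier:
                # Draw this tier's share of the batch, with replacement.
                batch.extend(self.rng.choices(tier, k=round(batch_size * weight)))
        # Top up from the whole buffer if rounding or empty tiers left a shortfall.
        while len(batch) < batch_size:
            batch.append(self.rng.choice(self.buffer))
        self.rng.shuffle(batch)
        return batch[:batch_size]


memory = TieredReplayMemory(capacity=100, seed=0)
for i in range(90):
    memory.push(i, 0, float(i), i + 1, False)
batch = memory.sample(10)
```

Compared with uniform sampling, a scheme like this biases each minibatch toward transitions that a chosen statistic marks as more useful, which is the general idea the abstract attributes to boosting sample DQN. The actual tiers and statistics used by the authors may differ.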