An Initialization Method of Deep Q-network for Learning Acceleration of Robotic Grasp

Published in: 2020 IEEE International Conference on Networking, Sensing and Control (ICNSC)
Publication date: 2020-10-30
DOI: 10.1109/ICNSC48988.2020.9238061

Yanxu Hou, Jun Li, Zihan Fang, Xuechao Zhang


Citations: 0

Abstract

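Self-supervised learning of robotic grasping generally relies on a model-free reinforcement learning method such as a Deep Q-network (DQN). A DQN uses a high-dimensional Q-network to infer dense, pixel-wise probability maps of grasp affordances, but training such a network is usually time-consuming. Inspired by the role of initialization in optimization algorithms, we propose an initialization method that accelerates self-supervised learning of robotic grasping. Before grasp training begins, the Q-network is pre-trained by supervised learning of affordance maps on a small dataset with coarse-grained labels. Starting from this pre-trained Q-network, the robot's self-supervised trial and error becomes purposeful, avoiding meaningless grasp attempts in empty regions. We evaluate the method using Mean Squared Error (MSE), Smooth L1, and Kullback-Leibler Divergence (KLD) as loss functions in the pre-training phase. The results indicate that the KLD loss predicts affordances accurately, with less noise in empty regions. In addition, the method significantly accelerates self-supervised learning in the early stage of training, and its effect shows little dependence on the sparsity of objects in the workspace.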
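The abstract compares three pre-training losses on dense, pixel-wise affordance maps. The sketch below illustrates, in plain Python, what each of those losses measures and how a greedy grasp pixel can be read from a Q-map. It is not the authors' implementation: the flat row-major map layout, the spatial-softmax normalization used for the KLD term, the `beta` parameter of Smooth L1, and all function names (`mse_loss`, `smooth_l1_loss`, `spatial_softmax`, `kld_loss`, `best_grasp_pixel`) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumption for this sketch: an affordance map / Q-map is a flat, row-major
# list of H*W floats. A real system would use tensors; the paper's exact
# loss formulations are not given in the abstract.


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, averaged over all pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def smooth_l1_loss(pred: Sequence[float], target: Sequence[float],
                   beta: float = 1.0) -> float:
    """Smooth L1 loss: quadratic for |d| < beta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def spatial_softmax(logits: Sequence[float]) -> List[float]:
    """Normalize raw per-pixel scores into a distribution over pixels."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kld_loss(pred_logits: Sequence[float], target: Sequence[float]) -> float:
    """KL(target || softmax(pred_logits)), treating both maps as pixel
    distributions (assumed normalization, hypothetical formulation)."""
    total_mass = sum(target)
    if total_mass <= 0.0:
        raise ValueError("target map must contain at least one positive label")
    q = spatial_softmax(pred_logits)
    loss = 0.0
    for t, qi in zip(target, q):
        p = t / total_mass
        if p > 0.0:  # terms with p == 0 contribute 0 by convention
            loss += p * math.log(p / qi)
    return loss


def best_grasp_pixel(q_map: Sequence[float], width: int) -> Tuple[int, int]:
    """Greedy action selection: (row, col) of the highest-valued pixel."""
    idx = max(range(len(q_map)), key=lambda i: q_map[i])
    return divmod(idx, width)
```

For example, `kld_loss([0.0, 5.0], [1.0, 0.0])` is larger than `kld_loss([5.0, 0.0], [1.0, 0.0])`, because the first prediction concentrates its mass on the unlabeled pixel. Unlike MSE and Smooth L1, which compare per-pixel values, the KLD in this sketch compares normalized maps, so it only rewards placing mass on labeled regions; one possible intuition (not stated in the abstract) is that this discourages spurious activations in empty regions.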



