Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control

2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP). Pub Date: 2018-07-01. DOI: 10.1109/ASAP.2018.8445099

Shengjia Shao, Jason Tsai, Michal Mysior, W. Luk, T. Chau, Alexander Warren, B. Jeppesen


Citations: 19


Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control

Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives a reward from the environment based on how good its decisions are, and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach which has shown promise in applying accelerated simulation of RL policy training to automating the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO). This results in a 43% speed improvement over a previous FPGA implementation, a 4.65 times speedup over deep learning libraries running on GPU, and a 19.29 times speedup over CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of our approach.
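
As a minimal illustration of the "long-term cumulative reward" mentioned in the abstract, the sketch below computes a discounted return from a list of per-step rewards. The discount factor `gamma`, the function names, and the sample rewards are assumptions made for illustration only. The abstract does not say how future rewards are weighted, and this is not the paper's TRPO implementation.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum over t of gamma**t * rewards[t].

    gamma is a hypothetical discount factor; gamma = 1.0 gives the plain sum.
    """
    total = 0.0
    # Accumulate from the final step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def returns_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return measured from every time step of one episode."""
    out: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
        out.append(total)
    out.reverse()  # restore chronological order
    return out


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]  # made-up rewards for a 3-step episode
    print(discounted_return(episode_rewards, gamma=0.5))
    print(returns_to_go(episode_rewards, gamma=0.5))
```

A policy-training method such as TRPO would typically use quantities like these to judge whether a policy update improved the agent's behaviour. How the paper's accelerator computes them is not described in the abstract.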
