Automated Reinforcement Learning Based on Parameter Sharing Network Architecture Search

Zhaolei Wang, Jun Zhang, Yue Li, Qinghai Gong, Wuyi Luo, Jikang Zhao
Published in: 2021 6th International Conference on Robotics and Automation Engineering (ICRAE)
Publication date: 2021-11-19
DOI: https://doi.org/10.1109/ICRAE53653.2021.9657793
Citations: 1

Abstract

The performance of a machine learning method depends to a great extent on the choice of hyperparameters; only with appropriate hyperparameters can the desired results be obtained. End-to-end learning algorithms currently attract wide attention in academia. At the design-task level they enable agile design from the requirements end to the execution end, which can dramatically reduce design complexity. However, a large number of hyperparameters still have to be tuned manually, which makes machine learning harder to apply. With the continuing development of high-performance parallel computing, automated machine learning methods have emerged in response. This paper addresses the automatic design of one such hyperparameter, the neural network architecture used in deep reinforcement learning for motion control. It combines an LSTM recurrent neural network algorithm for topology generation, a parameter-sharing mechanism for fast reinforcement learning and evaluation, and a policy-gradient algorithm for learning the parameters of the graph generator. The resulting framework automatically searches and optimizes neural network architectures for deep reinforcement learning, so that network architectures are generated automatically. Finally, the effectiveness of the proposed approach is verified on the lunar lander landing control problem.
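The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of training an architecture generator with a policy gradient. It makes several assumptions that are not from the paper: the LSTM generator is replaced by independent softmax distributions (one per layer), the search space is two hidden layers with hypothetical widths `CHOICES`, and `toy_reward` is a made-up stand-in for the return a child policy would earn on the control task when evaluated with shared weights.

```python
import math
import random

CHOICES = [16, 32, 64]  # hypothetical candidate widths for each hidden layer
NUM_LAYERS = 2          # hypothetical number of architecture decisions


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(logits, rng):
    """Sample one choice index per layer from the generator's distributions."""
    arch = []
    for layer_logits in logits:
        probs = softmax(layer_logits)
        arch.append(rng.choices(range(len(CHOICES)), weights=probs)[0])
    return arch


def toy_reward(arch):
    """Made-up reward (assumption): one point per layer that matches the
    arbitrary target widths (64, 32). A real system would instead return the
    episode return of a policy network built from shared weights."""
    target = [2, 1]
    return sum(1.0 for a, t in zip(arch, target) if a == t)


def train_generator(steps=3000, lr=0.05, seed=0):
    """REINFORCE with a moving-average baseline on the generator's logits."""
    rng = random.Random(seed)
    logits = [[0.0] * len(CHOICES) for _ in range(NUM_LAYERS)]
    baseline = 0.0
    for _ in range(steps):
        arch = sample_architecture(logits, rng)
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for layer, action in enumerate(arch):
            probs = softmax(logits[layer])
            for k in range(len(CHOICES)):
                grad = (1.0 if k == action else 0.0) - probs[k]
                logits[layer][k] += lr * advantage * grad
    return logits


def best_architecture(logits):
    """Return the most probable width for each layer."""
    return [CHOICES[max(range(len(CHOICES)), key=row.__getitem__)]
            for row in logits]


if __name__ == "__main__":
    print(best_architecture(train_generator()))
```

In this toy setting the generator's probability mass is expected to shift toward the widths that `toy_reward` favors. In a parameter-sharing scheme, the expensive step hidden behind `toy_reward` would reuse one shared set of child-network weights across sampled architectures rather than training each candidate from scratch.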