Authors: Zhaolei Wang, Jun Zhang, Yue Li, Qinghai Gong, Wuyi Luo, Jikang Zhao
Venue: 2021 6th International Conference on Robotics and Automation Engineering (ICRAE)
Published: 2021-11-19
DOI: 10.1109/ICRAE53653.2021.9657793 (https://doi.org/10.1109/ICRAE53653.2021.9657793)
Automated Reinforcement Learning Based on Parameter Sharing Network Architecture Search
The performance of machine learning depends to a great extent on the choice of hyperparameters; only with appropriate hyperparameters can the desired results be learned. End-to-end learning algorithms currently attract wide attention in academia. They enable agile design from the requirements end to the execution end at the design-task level, which can dramatically reduce design complexity. However, a large number of hyperparameters still have to be tuned manually, which makes machine learning harder to apply. With the continuing development of high-performance parallel computing, automated machine learning methods have emerged in response. This paper addresses the automatic design of one such hyperparameter: the neural network architecture used by deep reinforcement learning in the field of motion control. It combines three components: an LSTM recurrent neural network algorithm that generates network topologies, a fast reinforcement learning and evaluation mechanism based on parameter sharing, and a policy-gradient algorithm that learns the parameters of the graph generator. The result is an automated search and optimization framework for neural network architectures in deep reinforcement learning, which generates network architectures automatically. Finally, the effectiveness of the proposed approach is verified on the lunar lander landing control problem.
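The interplay of the three components can be illustrated with a small sketch. It is not the authors' implementation, and several parts are simplifying assumptions: the generator is a table of independent softmax logits rather than an LSTM, the search space (`CHOICES`, `NUM_LAYERS`) is hypothetical, and `toy_reward` is a synthetic stand-in for a short training and evaluation run, not a lunar lander environment. What it does show is the loop structure: sample an architecture, reuse shared weights for the chosen blocks, score it, and update the generator with a policy-gradient (REINFORCE) step.

```python
import math
import random

CHOICES = [16, 32, 64]   # hypothetical candidate hidden-layer widths
NUM_LAYERS = 2           # hypothetical depth of the policy network


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(logits, rng):
    """Sample one width index per layer from the generator's distributions."""
    arch = []
    for layer_logits in logits:
        probs = softmax(layer_logits)
        arch.append(rng.choices(range(len(CHOICES)), weights=probs)[0])
    return arch


def get_shared_weights(shared, arch):
    """Fetch (or lazily create) the weight entry for each chosen block.

    Every architecture that picks the same (layer, choice) pair reuses the
    same entry, which is the parameter-sharing idea.
    """
    return [shared.setdefault((layer, c), {"updates": 0})
            for layer, c in enumerate(arch)]


def toy_reward(arch, blocks):
    """Synthetic stand-in for a short RL training and evaluation run."""
    for block in blocks:
        block["updates"] += 1   # pretend the shared weights were trained a little
    # Assumed toy preference: reward is the fraction of layers with width 32.
    return sum(1.0 if CHOICES[c] == 32 else 0.0 for c in arch) / len(arch)


def reinforce_step(logits, arch, advantage, lr):
    """Policy-gradient update; grad of log-softmax is onehot(choice) - probs."""
    for layer, choice in enumerate(arch):
        probs = softmax(logits[layer])
        for k in range(len(probs)):
            grad = (1.0 if k == choice else 0.0) - probs[k]
            logits[layer][k] += lr * advantage * grad


def search(steps=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(CHOICES) for _ in range(NUM_LAYERS)]
    shared = {}      # weights shared across all sampled architectures
    baseline = 0.0   # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        arch = sample_architecture(logits, rng)
        reward = toy_reward(arch, get_shared_weights(shared, arch))
        reinforce_step(logits, arch, reward - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward
    best = [CHOICES[max(range(len(CHOICES)), key=lambda k: row[k])]
            for row in logits]
    return best, shared, logits


if __name__ == "__main__":
    best, shared, _ = search()
    print("most likely widths per layer:", best)
    print("shared blocks created:", len(shared))
```

Because every sampled architecture draws its blocks from the same `shared` table, no candidate is trained from scratch, which is what makes the evaluation step fast. In a real setting the `{"updates": 0}` placeholder would hold actual weight tensors, and `toy_reward` would be replaced by the episode return of the agent built from those weights.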