Rocket Self-learning Control based on Lightweight Neural Network Architecture Search
Zhaolei Wang, Kunfeng Lu, Chunmei Yu, Na Yao, Ludi Wang, Jikang Zhao
2022 IEEE International Conference on Unmanned Systems (ICUS), 2022-10-28. DOI: 10.1109/ICUS55513.2022.9986957
Aiming at the problem that the traditional control law design process is complex and relies heavily on accurate mathematical models, this paper uses Deep Deterministic Policy Gradient (DDPG) reinforcement learning to realize self-learning of a continuous motion control law. However, the performance of the DDPG algorithm depends heavily on its hyper-parameters, and there is no clear design basis for the neural network architecture of its Actor-Critic framework. Because reinforcement learning already requires a large amount of computation, repetitive manual trial and error over the hyper-parameters greatly reduces design efficiency and increases labor costs. After converting the network architecture design problem into a graph topology generation problem, this paper presents an automatic search and optimization framework for deep reinforcement learning network structures that combines a graph topology generation algorithm based on an LSTM recurrent neural network, a lightweight weight-sharing training and evaluation mechanism for the deep reinforcement network parameters, and a policy-gradient-based learning algorithm for the graph topology generator parameters. In this way, the neural network hyper-parameters of the DDPG algorithm are optimized automatically, and the control law is obtained through self-learning training. Finally, the effectiveness of the proposed method is verified on a rocket vertical recovery control example.
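The abstract combines three ingredients: an LSTM-based controller that generates the actor/critic topology, a weight-sharing mechanism for evaluating candidate networks, and a policy-gradient update of the topology generator. The sketch below is a minimal, hypothetical illustration of that outer loop, not the authors' implementation: a PyTorch LSTM controller samples hidden-layer widths for the DDPG actor/critic MLPs, a placeholder evaluate_architecture function stands in for the weight-sharing DDPG training and evaluation step, and the controller is updated with REINFORCE using a moving-average baseline. The candidate widths, layer count, and evaluation function are all assumptions.

```python
# Hypothetical sketch of LSTM-controller-based architecture search for DDPG
# (not the paper's code). Assumed search space: width of each hidden layer.
import torch
import torch.nn as nn
from torch.distributions import Categorical

CANDIDATE_WIDTHS = [32, 64, 128, 256]   # assumed per-layer width choices
NUM_LAYERS = 3                          # assumed number of hidden layers

class TopologyController(nn.Module):
    """LSTM that emits one categorical decision (a layer width) per step."""
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(len(CANDIDATE_WIDTHS), hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, len(CANDIDATE_WIDTHS))
        self.hidden = hidden

    def sample(self):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        x = torch.zeros(1, self.hidden)          # start token
        log_probs, widths = [], []
        for _ in range(NUM_LAYERS):
            h, c = self.cell(x, (h, c))
            dist = Categorical(logits=self.head(h))
            idx = dist.sample()
            log_probs.append(dist.log_prob(idx))
            widths.append(CANDIDATE_WIDTHS[idx.item()])
            x = self.embed(idx)                  # feed choice back as next input
        return widths, torch.stack(log_probs).sum()

def build_mlp(in_dim, widths, out_dim):
    """Build a DDPG actor or critic MLP from the sampled layer widths."""
    layers, d = [], in_dim
    for w in widths:
        layers += [nn.Linear(d, w), nn.ReLU()]
        d = w
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

def evaluate_architecture(widths):
    """Placeholder for the paper's weight-sharing DDPG training/evaluation step:
    it should briefly train (or fine-tune shared weights of) a DDPG agent with
    this topology and return its average episode return. A dummy score is
    returned here so the sketch runs end to end."""
    return float(-sum(abs(w - 128) for w in widths)) / 100.0

controller = TopologyController()
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
baseline = 0.0
for step in range(50):                           # controller (generator) training loop
    widths, log_prob = controller.sample()
    reward = evaluate_architecture(widths)
    baseline = 0.9 * baseline + 0.1 * reward     # moving-average baseline
    loss = -(reward - baseline) * log_prob       # REINFORCE policy-gradient loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the framework described by the abstract, the evaluation step would reuse shared network weights so each sampled topology needs only a short training run rather than full DDPG training from scratch; the dummy reward above merely keeps the sketch self-contained.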