Rocket Self-learning Control Based on Lightweight Neural Network Architecture Search

Zhaolei Wang, Kunfeng Lu, Chunmei Yu, Na Yao, Ludi Wang, Jikang Zhao
DOI: 10.1109/ICUS55513.2022.9986957
Published in: 2022 IEEE International Conference on Unmanned Systems (ICUS), 2022-10-28
Citations: 0

Abstract

The traditional control-law design process is complex and relies heavily on accurate mathematical models. To address this problem, this paper uses Deep Deterministic Policy Gradient (DDPG) reinforcement learning to realize self-learning of a continuous-motion control law. However, the performance of the DDPG algorithm depends heavily on its hyper-parameters, and there is no clear design basis for the Actor-Critic neural network architecture. Because reinforcement learning is computationally expensive, repetitive manual trial and error on hyper-parameters greatly reduces design efficiency and increases labor cost. By converting the network-architecture design problem into a graph-topology generation problem, this paper presents an automatic search and optimization framework for deep-reinforcement-learning network structures, which innovatively combines a graph-topology generation algorithm based on an LSTM recurrent neural network, a weight-sharing-based lightweight training and evaluation mechanism for the deep reinforcement network parameters, and a policy-gradient-based learning algorithm for the graph-topology generator parameters. The neural-network hyper-parameters of the DDPG algorithm are thus optimized automatically, and the control law is obtained through self-learning training. Finally, taking rocket vertical-recovery control as an example, the effectiveness of the proposed method is verified.
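The core idea of the search framework — a controller samples candidate architectures, each candidate is scored, and the controller's parameters are updated by policy gradient so that better-scoring architectures become more probable — can be illustrated with a deliberately minimal sketch. The search space here (a single categorical choice of actor hidden-layer width), the proxy reward, and all names are illustrative assumptions: the paper's actual method uses an LSTM controller that generates full graph topologies and evaluates them with weight sharing inside DDPG training, which this toy omits.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical one-dimensional search space: candidate hidden-layer widths.
CHOICES = [32, 64, 128, 256]

def toy_reward(width):
    # Stand-in for the true objective (training the DDPG agent with this
    # architecture and measuring control performance). This proxy simply
    # peaks at width 128 so the search has a well-defined optimum.
    return 1.0 - abs(width - 128) / 256.0

def search(iterations=2000, lr=0.1, seed=0):
    """REINFORCE-style search over CHOICES with a running-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(CHOICES)   # controller parameters
    baseline = 0.0
    for _ in range(iterations):
        probs = softmax(logits)
        # Controller samples an architecture from its current distribution.
        idx = rng.choices(range(len(CHOICES)), weights=probs)[0]
        reward = toy_reward(CHOICES[idx])
        baseline += 0.05 * (reward - baseline)   # variance-reduction baseline
        advantage = reward - baseline
        # Gradient of log pi(idx) w.r.t. the logits is (one-hot - probs);
        # ascend it scaled by the advantage.
        for k in range(len(logits)):
            grad = (1.0 if k == idx else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)

probs = search()
best = CHOICES[max(range(len(CHOICES)), key=lambda k: probs[k])]
```

After enough iterations the distribution concentrates on the reward-maximizing width, which is exactly the mechanism by which the LSTM topology generator in the paper is steered toward better DDPG architectures, only at the scale of whole graph topologies rather than a single scalar choice.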