Balance Controller Design for Inverted Pendulum Considering Detail Reward Function and Two-Phase Learning Protocol

Symmetry, Pub Date: 2024-09-18, DOI: 10.3390/sym16091227
Xiaochen Liu, Sipeng Wang, Xingxing Li, Ze Cui
{"title":"考虑详细奖励函数和两阶段学习协议的倒立摆平衡控制器设计","authors":"Xiaochen Liu, Sipeng Wang, Xingxing Li, Ze Cui","doi":"10.3390/sym16091227","DOIUrl":null,"url":null,"abstract":"As a complex nonlinear system, the inverted pendulum (IP) system has the characteristics of asymmetry and instability. In this paper, the IP system is controlled by a learned deep neural network (DNN) that directly maps the system states to control commands in an end-to-end style. On the basis of deep reinforcement learning (DRL), the detail reward function (DRF) is designed to guide the DNN learning control strategy, which greatly enhances the pertinence and flexibility of the control. Moreover, a two-phase learning protocol (offline learning phase and online learning phase) is proposed to solve the “real gap” problem of the IP system. Firstly, the DNN learns the offline control strategy based on a simplified IP dynamic model and DRF. Then, a security controller is designed and used on the IP platform to optimize the DNN online. The experimental results demonstrate that the DNN has good robustness to model errors after secondary learning on the platform. When the length of the pendulum is reduced by 25% or increased by 25%, the steady-state error of the pendulum angle is less than 0.05 rad. The error is within the allowable range. The DNN is robust to changes in the length of the pendulum. The DRF and the two-phase learning protocol improve the adaptability of the controller to the complex and variable characteristics of the real platform and provide reference for other learning-based robot control problems.","PeriodicalId":501198,"journal":{"name":"Symmetry","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Balance Controller Design for Inverted Pendulum Considering Detail Reward Function and Two-Phase Learning Protocol\",\"authors\":\"Xiaochen Liu, Sipeng Wang, Xingxing Li, Ze Cui\",\"doi\":\"10.3390/sym16091227\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As a complex nonlinear system, the inverted pendulum (IP) system has the characteristics of asymmetry and instability. In this paper, the IP system is controlled by a learned deep neural network (DNN) that directly maps the system states to control commands in an end-to-end style. On the basis of deep reinforcement learning (DRL), the detail reward function (DRF) is designed to guide the DNN learning control strategy, which greatly enhances the pertinence and flexibility of the control. Moreover, a two-phase learning protocol (offline learning phase and online learning phase) is proposed to solve the “real gap” problem of the IP system. Firstly, the DNN learns the offline control strategy based on a simplified IP dynamic model and DRF. Then, a security controller is designed and used on the IP platform to optimize the DNN online. The experimental results demonstrate that the DNN has good robustness to model errors after secondary learning on the platform. When the length of the pendulum is reduced by 25% or increased by 25%, the steady-state error of the pendulum angle is less than 0.05 rad. The error is within the allowable range. The DNN is robust to changes in the length of the pendulum. 
The DRF and the two-phase learning protocol improve the adaptability of the controller to the complex and variable characteristics of the real platform and provide reference for other learning-based robot control problems.\",\"PeriodicalId\":501198,\"journal\":{\"name\":\"Symmetry\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Symmetry\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/sym16091227\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Symmetry","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/sym16091227","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

As a complex nonlinear system, the inverted pendulum (IP) system is both asymmetric and unstable. In this paper, the IP system is controlled by a learned deep neural network (DNN) that maps the system states directly to control commands in an end-to-end fashion. On the basis of deep reinforcement learning (DRL), a detail reward function (DRF) is designed to guide the DNN in learning the control strategy, which greatly enhances the pertinence and flexibility of the control. Moreover, a two-phase learning protocol (an offline learning phase and an online learning phase) is proposed to address the "real gap" (sim-to-real) problem of the IP system. First, the DNN learns an offline control strategy from a simplified IP dynamic model and the DRF. A security controller is then designed and deployed on the IP platform to optimize the DNN online. The experimental results demonstrate that, after this second round of learning on the platform, the DNN is robust to model errors: when the pendulum length is reduced or increased by 25%, the steady-state error of the pendulum angle stays below 0.05 rad, which is within the allowable range, so the controller tolerates changes in pendulum length. The DRF and the two-phase learning protocol improve the adaptability of the controller to the complex and variable characteristics of the real platform and provide a reference for other learning-based robot control problems.