A New Optimal Design of Stable Feedback Control of Two-Wheel System Based on Reinforcement Learning

Zhenghong Yu, Xuebin Zhu
{"title":"A New Optimal Design of Stable Feedback Control of Two-Wheel System Based on Reinforcement Learning","authors":"Zhenghong Yu, Xuebin Zhu","doi":"10.4271/13-05-01-0004","DOIUrl":null,"url":null,"abstract":"The two-wheel system design is widely used in various mobile tools, such as remote-control vehicles and robots, due to its simplicity and stability. However, the specific wheel and body models in the real world can be complex, and the control accuracy of existing algorithms may not meet practical requirements. To address this issue, we propose a double inverted pendulum on mobile device (DIPM) model to improve control performances and reduce calculations. The model is based on the kinetic and potential energy of the DIPM system, known as the Euler-Lagrange equation, and is composed of three second-order nonlinear differential equations derived by specifying Lagrange. We also propose a stable feedback control method for mobile device drive systems. Our experiments compare several mainstream reinforcement learning (RL) methods, including linear quadratic regulator (LQR) and iterative linear quadratic regulator (ILQR), as well as Q-learning, SARSA, DQN (Deep Q Network), and AC. The simulation results demonstrate that the DQN and AC methods are superior to ILQR in our designed nonlinear system. In all aspects of the test, the performance of Q-learning and SARSA is comparable to that of ILQR, with some slight improvements. However, ILQR shows its advantages at 10 deg and 20 deg. In the small deflection (between 5 and 10 deg), the DQN and AC methods perform 2% better than the traditional ILQR, and in the large deflection (10–30 deg), the DQN and AC methods perform 15% better than the traditional ILQR. Overall, RL not only has the advantages of strong versatility, wide application range, and parameter customization but also greatly reduces the difficulty of control system design and human investment, making it a promising field for future research.","PeriodicalId":181105,"journal":{"name":"SAE International Journal of Sustainable Transportation, Energy, Environment, & Policy","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SAE International Journal of Sustainable Transportation, Energy, Environment, & Policy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4271/13-05-01-0004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The two-wheel system design is widely used in mobile platforms such as remote-control vehicles and robots because of its simplicity and stability. However, real-world wheel and body models can be complex, and the control accuracy of existing algorithms may not meet practical requirements. To address this, we propose a double inverted pendulum on a mobile device (DIPM) model that improves control performance and reduces computation. The model is built from the kinetic and potential energy of the DIPM system via the Euler-Lagrange formulation and consists of three second-order nonlinear differential equations obtained from the Lagrangian. We also propose a stable feedback control method for mobile-device drive systems. Our experiments compare two classical optimal-control baselines, the linear quadratic regulator (LQR) and the iterative linear quadratic regulator (ILQR), against mainstream reinforcement learning (RL) methods: Q-learning, SARSA, Deep Q-Network (DQN), and actor-critic (AC). Simulation results show that DQN and AC outperform ILQR on our nonlinear system. Across the tests, Q-learning and SARSA perform comparably to ILQR, with slight improvements, although ILQR retains an advantage at deflections of 10 deg and 20 deg. For small deflections (5–10 deg), DQN and AC perform about 2% better than ILQR; for large deflections (10–30 deg), they perform about 15% better. Overall, RL offers strong versatility, a wide application range, and customizable parameters, and it greatly reduces the difficulty of control-system design and the human effort required, making it a promising direction for future research.
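The abstract only names the modeling route. As a reading aid, the sketch below shows the standard Lagrangian pattern the text appears to describe: form the Lagrangian from kinetic and potential energy, apply the Euler-Lagrange equations to each generalized coordinate (for a DIPM, plausibly the base position and the two pendulum angles), and collect the result in the usual manipulator form. The specific matrices of the authors' DIPM are not reproduced here; the coordinates and input structure below are assumptions.

```latex
% General Lagrangian derivation pattern (not the paper's specific DIPM equations).
% q = (x, \theta_1, \theta_2): base position and the two pendulum angles (assumed).
\begin{align}
  L(q,\dot q) &= T(q,\dot q) - V(q), \\
  \frac{d}{dt}\,\frac{\partial L}{\partial \dot q_i}
    - \frac{\partial L}{\partial q_i} &= Q_i, \qquad i = 1,2,3,
\end{align}
% which yields three coupled second-order nonlinear ODEs, written compactly as
\begin{equation}
  M(q)\,\ddot q + C(q,\dot q)\,\dot q + G(q) = B\,u,
\end{equation}
% where u is the drive input and Q_i are the generalized forces.
```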
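Of the RL methods listed, tabular Q-learning is the simplest to make concrete. The Python sketch below illustrates the Q-learning update on a deliberately simplified single-pendulum stand-in with a discretized state space; the dynamics, discretization grid, reward, and hyperparameters are illustrative assumptions, not the authors' DIPM setup or reported results.

```python
import numpy as np

# Toy single-pendulum stand-in (NOT the paper's three-equation DIPM model),
# used only to make the tabular Q-learning loop concrete.
G, L_POLE, M, DT = 9.81, 0.5, 0.1, 0.02            # gravity, pole length, mass, time step
N_THETA, N_OMEGA, N_ACTIONS = 21, 21, 3            # discretization sizes
FORCES = np.array([-10.0, 0.0, 10.0])              # candidate control inputs

def step(theta, omega, force):
    """One Euler step of theta'' = (g/l) sin(theta) + u / (m l^2), theta from upright."""
    alpha = (G / L_POLE) * np.sin(theta) + force / (M * L_POLE ** 2)
    omega = omega + alpha * DT
    theta = theta + omega * DT
    return theta, omega

def discretize(theta, omega):
    """Map continuous (theta, omega) to table indices, clipping to the grid."""
    ti = int(np.clip((theta + np.pi / 6) / (np.pi / 3) * (N_THETA - 1), 0, N_THETA - 1))
    oi = int(np.clip((omega + 2.0) / 4.0 * (N_OMEGA - 1), 0, N_OMEGA - 1))
    return ti, oi

Q = np.zeros((N_THETA, N_OMEGA, N_ACTIONS))
lr, gamma, eps = 0.1, 0.99, 0.1                    # learning rate, discount, exploration

for episode in range(2000):
    theta, omega = np.random.uniform(-0.1, 0.1), 0.0      # start near upright
    for t in range(500):
        s = discretize(theta, omega)
        a = np.random.randint(N_ACTIONS) if np.random.rand() < eps else int(np.argmax(Q[s]))
        theta, omega = step(theta, omega, FORCES[a])
        done = abs(theta) > np.pi / 6                      # fell past ~30 deg
        reward = -1.0 if done else 1.0 - abs(theta)        # reward staying near upright
        s2 = discretize(theta, omega)
        target = reward if done else reward + gamma * np.max(Q[s2])
        Q[s][a] += lr * (target - Q[s][a])                 # standard Q-learning update
        if done:
            break
```

DQN and actor-critic methods replace the lookup table with function approximators and stochastic gradient updates, which is what allows them to handle finer or continuous state descriptions than a table like the one above.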