Proximal policy optimization learning based control of congested freeway traffic

Shurong Mo, Nailong Wu, Jie Qi, Anqi Pan, Zhiguang Feng, Huaicheng Yan, Yueying Wang
{"title":"Proximal policy optimization learning based control of congested freeway traffic","authors":"Shurong Mo, Nailong Wu, Jie Qi, Anqi Pan, Zhiguang Feng, Huaicheng Yan, Yueying Wang","doi":"10.1002/oca.3068","DOIUrl":null,"url":null,"abstract":"Abstract In this paper, a delay compensation feedback controller based on reinforcement learning is proposed to adjust the time interval of the adaptive cruise control (ACC) vehicle agents in the traffic congestion by introducing the proximal policy optimization (PPO) scheme. The high‐speed traffic flow is characterized by a two‐by‐two Aw Rasle Zhang nonlinear first‐order partial differential equations (PDEs). Unlike the backstepping delay compensation control, 23 the PPO controller proposed in this paper consists of the current traffic flow velocity, the current traffic flow density and the previous one step control input. Since the system dynamics of the traffic flow are difficult to be expressed mathematically, the control gains of the three feedback can be determined via learning from the interaction between the PPO and the digital simulator of the traffic system. The performance of Lyapunov control, backstepping control and PPO control are compared with numerical simulation. The results demonstrate that PPO control is superior to Lyapunov control in terms of the convergence rate and control efforts for the traffic system without delay. As for the traffic system with unstable input delay value, the performance of PPO controller is also equivalent to that of backstepping controller. Besides, PPO is more robust than backstepping controller when the parameter is sensitive to Gaussian noise.","PeriodicalId":105945,"journal":{"name":"Optimal Control Applications and Methods","volume":"20 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optimal Control Applications and Methods","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/oca.3068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this paper, a delay-compensating feedback controller based on reinforcement learning is proposed to adjust the time gap of adaptive cruise control (ACC) vehicles in congested traffic by introducing the proximal policy optimization (PPO) scheme. The congested freeway traffic flow is characterized by a 2×2 system of Aw-Rascle-Zhang (ARZ) nonlinear first-order partial differential equations (PDEs). Unlike backstepping delay compensation control [23], the PPO controller proposed in this paper is built from three feedback terms: the current traffic flow velocity, the current traffic flow density, and the control input of the previous step. Since the system dynamics of the traffic flow are difficult to express mathematically, the gains of the three feedback terms are determined by learning from the interaction between the PPO agent and a digital simulator of the traffic system. The performance of Lyapunov control, backstepping control, and PPO control is compared in numerical simulation. The results demonstrate that PPO control is superior to Lyapunov control in terms of convergence rate and control effort for the delay-free traffic system. For the traffic system with a destabilizing input delay, the performance of the PPO controller is comparable to that of the backstepping controller. Moreover, the PPO controller is more robust than the backstepping controller when the model parameters are perturbed by Gaussian noise.
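For context, the Aw-Rascle-Zhang model referenced above is commonly written as the following 2×2 hyperbolic system (a textbook statement of the ARZ equations; the paper's exact pressure function, boundary conditions, and parameter values are not reproduced here):

\[
\begin{aligned}
\partial_t \rho + \partial_x (\rho v) &= 0, \\
\partial_t \bigl(v + p(\rho)\bigr) + v\,\partial_x \bigl(v + p(\rho)\bigr) &= \frac{V(\rho) - v}{\tau},
\end{aligned}
\]

where \(\rho(x,t)\) is the traffic density, \(v(x,t)\) the velocity, \(p(\rho)\) the traffic pressure, \(V(\rho)\) the equilibrium speed-density relation, and \(\tau\) the relaxation time.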

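A minimal sketch of the controller structure described in the abstract, assuming the control input is a linear combination of the three feedback signals with scalar gains learned by PPO; the class and variable names are illustrative assumptions, not the paper's formulation:

```python
class PPOFeedbackController:
    """Delay-compensating feedback law built from three terms:
    current traffic velocity, current traffic density, and the
    control input applied one step earlier. The gains are assumed
    to be produced offline by PPO training (see sketch below)."""

    def __init__(self, k_v, k_rho, k_u, u_init=0.0):
        self.k_v, self.k_rho, self.k_u = k_v, k_rho, k_u
        self.u_prev = u_init  # previous-step control input

    def step(self, v_meas, rho_meas):
        """Return the next control input from current measurements."""
        u = self.k_v * v_meas + self.k_rho * rho_meas + self.k_u * self.u_prev
        self.u_prev = u
        return u


# Placeholder gains; in the paper they are learned by PPO from
# interaction with the digital traffic simulator.
ctrl = PPOFeedbackController(k_v=0.5, k_rho=-0.1, k_u=0.3)
u = ctrl.step(v_meas=20.0, rho_meas=0.12)
```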

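The paper determines these feedback gains by letting PPO interact with a digital simulator of the traffic system. A hedged sketch of that loop, using the open-source stable-baselines3 PPO implementation and a Gymnasium environment; the toy lumped (space-averaged) dynamics below stand in for the paper's full ARZ PDE simulator and are an assumption for illustration only:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyTrafficEnv(gym.Env):
    """Toy stand-in for the digital traffic simulator: a lumped
    surrogate of the ARZ dynamics, NOT the paper's PDE model. It
    exposes the three feedback signals as the observation."""

    def __init__(self, rho_star=0.12, v_star=20.0, tau=0.5, dt=0.1):
        self.rho_star, self.v_star = rho_star, v_star  # equilibrium point
        self.tau, self.dt = tau, dt                    # relaxation time, step size
        self.observation_space = spaces.Box(-np.inf, np.inf, (3,), np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, (1,), np.float32)

    def _obs(self):
        return np.array([self.v, self.rho, self.u_prev], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Start away from equilibrium to mimic a congestion disturbance.
        self.rho = self.rho_star * (1 + 0.3 * self.np_random.uniform(-1, 1))
        self.v = self.v_star * (1 + 0.3 * self.np_random.uniform(-1, 1))
        self.u_prev, self.t = 0.0, 0
        return self._obs(), {}

    def step(self, action):
        u = float(action[0])
        # Crude relaxation dynamics driven by the control input.
        self.v += self.dt * ((self.v_star - self.v) / self.tau + u)
        self.rho += self.dt * (-(self.rho - self.rho_star) * self.v / self.v_star)
        self.u_prev = u
        self.t += 1
        # Reward: drive (rho, v) to equilibrium with small control effort.
        err = ((self.v - self.v_star) / self.v_star) ** 2 \
            + ((self.rho - self.rho_star) / self.rho_star) ** 2
        reward = -(err + 0.01 * u ** 2)
        return self._obs(), reward, False, self.t >= 200, {}


model = PPO("MlpPolicy", ToyTrafficEnv(), verbose=0)
model.learn(total_timesteps=10_000)  # PPO learns the feedback policy
```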