Proximal policy optimization learning based control of congested freeway traffic

Shurong Mo, Nailong Wu, Jie Qi, Anqi Pan, Zhiguang Feng, Huaicheng Yan, Yueying Wang
{"title":"Proximal policy optimization learning based control of congested freeway traffic","authors":"Shurong Mo, Nailong Wu, Jie Qi, Anqi Pan, Zhiguang Feng, Huaicheng Yan, Yueying Wang","doi":"10.1002/oca.3068","DOIUrl":null,"url":null,"abstract":"Abstract In this paper, a delay compensation feedback controller based on reinforcement learning is proposed to adjust the time interval of the adaptive cruise control (ACC) vehicle agents in the traffic congestion by introducing the proximal policy optimization (PPO) scheme. The high‐speed traffic flow is characterized by a two‐by‐two Aw Rasle Zhang nonlinear first‐order partial differential equations (PDEs). Unlike the backstepping delay compensation control, 23 the PPO controller proposed in this paper consists of the current traffic flow velocity, the current traffic flow density and the previous one step control input. Since the system dynamics of the traffic flow are difficult to be expressed mathematically, the control gains of the three feedback can be determined via learning from the interaction between the PPO and the digital simulator of the traffic system. The performance of Lyapunov control, backstepping control and PPO control are compared with numerical simulation. The results demonstrate that PPO control is superior to Lyapunov control in terms of the convergence rate and control efforts for the traffic system without delay. As for the traffic system with unstable input delay value, the performance of PPO controller is also equivalent to that of backstepping controller. Besides, PPO is more robust than backstepping controller when the parameter is sensitive to Gaussian noise.","PeriodicalId":105945,"journal":{"name":"Optimal Control Applications and Methods","volume":"20 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optimal Control Applications and Methods","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/oca.3068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this paper, a delay-compensating feedback controller based on reinforcement learning is proposed to adjust the time gap of adaptive cruise control (ACC) vehicles in congested traffic by introducing the proximal policy optimization (PPO) scheme. The congested freeway traffic flow is characterized by a 2×2 system of Aw-Rascle-Zhang (ARZ) nonlinear first-order partial differential equations (PDEs). Unlike backstepping delay compensation control [23], the PPO controller proposed in this paper is built from three feedback terms: the current traffic flow velocity, the current traffic flow density, and the control input of the previous step. Since the system dynamics of the traffic flow are difficult to express mathematically, the gains of the three feedback terms are determined by learning from the interaction between the PPO agent and a digital simulator of the traffic system. The performance of Lyapunov control, backstepping control, and PPO control is compared in numerical simulation. The results demonstrate that PPO control is superior to Lyapunov control in terms of convergence rate and control effort for the delay-free traffic system. For the traffic system with a destabilizing input delay, the performance of the PPO controller is comparable to that of the backstepping controller. Moreover, the PPO controller is more robust than the backstepping controller when the model parameters are perturbed by Gaussian noise.
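For context, the Aw-Rascle-Zhang model referenced above is commonly written as the following 2×2 hyperbolic system (a textbook statement of the ARZ equations; the paper's exact pressure function, boundary conditions, and parameter values are not reproduced here):

\[
\begin{aligned}
\partial_t \rho + \partial_x (\rho v) &= 0, \\
\partial_t \bigl(v + p(\rho)\bigr) + v\,\partial_x \bigl(v + p(\rho)\bigr) &= \frac{V(\rho) - v}{\tau},
\end{aligned}
\]

where \(\rho(x,t)\) is the traffic density, \(v(x,t)\) the velocity, \(p(\rho)\) the traffic pressure, \(V(\rho)\) the equilibrium speed-density relation, and \(\tau\) the relaxation time.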

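A minimal sketch of the controller structure described in the abstract, assuming the control input is a linear combination of the three feedback signals with scalar gains learned by PPO; the class and variable names are illustrative assumptions, not the paper's formulation:

```python
class PPOFeedbackController:
    """Delay-compensating feedback law built from three terms:
    current traffic velocity, current traffic density, and the
    control input applied one step earlier. The gains are assumed
    to be produced offline by PPO training (see sketch below)."""

    def __init__(self, k_v, k_rho, k_u, u_init=0.0):
        self.k_v, self.k_rho, self.k_u = k_v, k_rho, k_u
        self.u_prev = u_init  # previous-step control input

    def step(self, v_meas, rho_meas):
        """Return the next control input from current measurements."""
        u = self.k_v * v_meas + self.k_rho * rho_meas + self.k_u * self.u_prev
        self.u_prev = u
        return u


# Placeholder gains; in the paper they are learned by PPO from
# interaction with the digital traffic simulator.
ctrl = PPOFeedbackController(k_v=0.5, k_rho=-0.1, k_u=0.3)
u = ctrl.step(v_meas=20.0, rho_meas=0.12)
```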

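The paper determines these feedback gains by letting PPO interact with a digital simulator of the traffic system. A hedged sketch of that loop, using the open-source stable-baselines3 PPO implementation and a Gymnasium environment; the toy lumped (space-averaged) dynamics below stand in for the paper's full ARZ PDE simulator and are an assumption for illustration only:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyTrafficEnv(gym.Env):
    """Toy stand-in for the digital traffic simulator: a lumped
    surrogate of the ARZ dynamics, NOT the paper's PDE model. It
    exposes the three feedback signals as the observation."""

    def __init__(self, rho_star=0.12, v_star=20.0, tau=0.5, dt=0.1):
        self.rho_star, self.v_star = rho_star, v_star  # equilibrium point
        self.tau, self.dt = tau, dt                    # relaxation time, step size
        self.observation_space = spaces.Box(-np.inf, np.inf, (3,), np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, (1,), np.float32)

    def _obs(self):
        return np.array([self.v, self.rho, self.u_prev], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Start away from equilibrium to mimic a congestion disturbance.
        self.rho = self.rho_star * (1 + 0.3 * self.np_random.uniform(-1, 1))
        self.v = self.v_star * (1 + 0.3 * self.np_random.uniform(-1, 1))
        self.u_prev, self.t = 0.0, 0
        return self._obs(), {}

    def step(self, action):
        u = float(action[0])
        # Crude relaxation dynamics driven by the control input.
        self.v += self.dt * ((self.v_star - self.v) / self.tau + u)
        self.rho += self.dt * (-(self.rho - self.rho_star) * self.v / self.v_star)
        self.u_prev = u
        self.t += 1
        # Reward: drive (rho, v) to equilibrium with small control effort.
        err = ((self.v - self.v_star) / self.v_star) ** 2 \
            + ((self.rho - self.rho_star) / self.rho_star) ** 2
        reward = -(err + 0.01 * u ** 2)
        return self._obs(), reward, False, self.t >= 200, {}


model = PPO("MlpPolicy", ToyTrafficEnv(), verbose=0)
model.learn(total_timesteps=10_000)  # PPO learns the feedback policy
```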