Improving String Stability in Cooperative Adaptive Cruise Control Through Multiagent Reinforcement Learning With Potential-Driven Motivation

Kun Jiang;Min Hua;Xu He;Lu Dong;Quan Zhou;Hongming Xu;Changyin Sun
{"title":"Improving String Stability in Cooperative Adaptive Cruise Control Through Multiagent Reinforcement Learning With Potential-Driven Motivation","authors":"Kun Jiang;Min Hua;Xu He;Lu Dong;Quan Zhou;Hongming Xu;Changyin Sun","doi":"10.1109/TAI.2024.3511513","DOIUrl":null,"url":null,"abstract":"Cooperative adaptive cruise control (CACC) is regarded as a promising technology for achieving efficient and safe collaboration among connected and automated vehicles (CAVs) in a platoon, and multiagent reinforcement learning (MARL) methods are emerging as an effective approach to implementing the CACC technology. However, most MARL methods do not sufficiently tackle the prevalent string stability problem, even when integrating communication mechanisms to improve agents’ understanding of CACC scenarios. This limitation arises because these methods typically learn communication mechanisms based solely on the information directly observable by the agents, neglecting potentially valuable information present in the environment. In this article, we propose a multiagent actor–critic with a potential-driven motivation (MAACPM) approach, which utilizes variational inference theory to infer the potential motivation representation space in the CACC task, providing a more favorable opportunity for adjusting driving behavior within the platoon. Furthermore, we quantify the specific impact of potential motivation on each vehicle by measuring the difference between policies with and without potential motivation. We then utilize this difference as a potential reward signal to incentivize the agent to grasp effective potential motivation. The proposed method was validated in two typical CACC scenarios, where we compared the performance of our MAACPM algorithm with other state-of-the-art MARL methods to demonstrate its effectiveness. 
Furthermore, we illustrate potential real-world applications of our method by comparing it with actual vehicle driving data.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1114-1127"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10778266/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Cooperative adaptive cruise control (CACC) is regarded as a promising technology for achieving efficient and safe collaboration among connected and automated vehicles (CAVs) in a platoon, and multiagent reinforcement learning (MARL) methods are emerging as an effective approach to implementing the CACC technology. However, most MARL methods do not sufficiently tackle the prevalent string stability problem, even when integrating communication mechanisms to improve agents’ understanding of CACC scenarios. This limitation arises because these methods typically learn communication mechanisms based solely on the information directly observable by the agents, neglecting potentially valuable information present in the environment. In this article, we propose a multiagent actor–critic with a potential-driven motivation (MAACPM) approach, which utilizes variational inference theory to infer the potential motivation representation space in the CACC task, providing a more favorable opportunity for adjusting driving behavior within the platoon. Furthermore, we quantify the specific impact of potential motivation on each vehicle by measuring the difference between policies with and without potential motivation. We then utilize this difference as a potential reward signal to incentivize the agent to grasp effective potential motivation. The proposed method was validated in two typical CACC scenarios, where we compared the performance of our MAACPM algorithm with other state-of-the-art MARL methods to demonstrate its effectiveness. Furthermore, we illustrate potential real-world applications of our method by comparing it with actual vehicle driving data.
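The abstract describes using the difference between an agent's policy with and without the inferred potential motivation as an intrinsic reward signal. The paper does not give the exact formulation here, but a common way to measure such a policy difference is a KL divergence between the two action distributions, scaled into a bonus reward. The sketch below is an illustrative assumption of that idea, not the paper's actual implementation; the function names, the discrete action set, and the `beta` scaling are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete action distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def potential_reward(policy_with_motivation, policy_without, beta=0.1):
    """Intrinsic bonus proportional to how much the latent motivation
    shifts the agent's action distribution (hypothetical formulation)."""
    return beta * kl_divergence(policy_with_motivation, policy_without)

# Hypothetical example: three longitudinal actions for one vehicle
# in the platoon -- decelerate / hold speed / accelerate.
base_policy = [0.25, 0.50, 0.25]       # without potential motivation
motivated_policy = [0.10, 0.30, 0.60]  # with potential motivation

r_intrinsic = potential_reward(motivated_policy, base_policy)
```

An identical pair of policies yields a zero bonus, so the signal only rewards the agent when the inferred motivation actually changes its driving behavior, which matches the abstract's stated intent of incentivizing the agent to exploit effective potential motivation.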