Improving String Stability in Cooperative Adaptive Cruise Control Through Multiagent Reinforcement Learning With Potential-Driven Motivation

Kun Jiang; Min Hua; Xu He; Lu Dong; Quan Zhou; Hongming Xu; Changyin Sun

IEEE Transactions on Artificial Intelligence, vol. 6, no. 5, pp. 1114-1127. Published online: 5 December 2024. DOI: 10.1109/TAI.2024.3511513. https://ieeexplore.ieee.org/document/10778266/
Abstract
Cooperative adaptive cruise control (CACC) is regarded as a promising technology for achieving efficient and safe collaboration among connected and automated vehicles (CAVs) in a platoon, and multiagent reinforcement learning (MARL) is emerging as an effective approach to implementing it. However, most MARL methods do not adequately address the prevalent string stability problem, even when they integrate communication mechanisms to improve the agents' understanding of CACC scenarios. This limitation arises because such methods typically learn communication mechanisms solely from the information directly observable by the agents, neglecting potentially valuable information present in the environment. In this article, we propose a multiagent actor–critic with potential-driven motivation (MAACPM) approach, which uses variational inference to infer a potential motivation representation space for the CACC task, providing a more favorable basis for adjusting driving behavior within the platoon. Furthermore, we quantify the specific impact of the potential motivation on each vehicle by measuring the difference between the policies learned with and without it, and we use this difference as a potential reward signal that incentivizes each agent to capture effective potential motivation. The proposed method is validated in two typical CACC scenarios, where we compare the performance of MAACPM with other state-of-the-art MARL methods to demonstrate its effectiveness. We further illustrate potential real-world applications of our method by comparing it with actual vehicle driving data.
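The abstract does not give implementation details, but the core mechanism it describes (a variational encoder that infers a latent "potential motivation," plus an intrinsic reward defined as the divergence between the policy conditioned on that motivation and the same policy without it) can be sketched in PyTorch. Everything below is a hypothetical reading of the abstract: the class names, the zero-vector stand-in for "without motivation," the KL choice of divergence, and the coefficient beta are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn
import torch.distributions as D


class MotivationEncoder(nn.Module):
    """Variational encoder: infers a latent motivation z from a
    vehicle's local observation (hypothetical architecture)."""
    def __init__(self, obs_dim: int, z_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.log_std = nn.Linear(hidden, z_dim)

    def forward(self, obs: torch.Tensor) -> D.Normal:
        h = self.net(obs)
        return D.Normal(self.mu(h), self.log_std(h).exp())


class Actor(nn.Module):
    """Categorical policy over discrete accelerations, conditioned on
    the observation and the latent motivation z."""
    def __init__(self, obs_dim: int, z_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, z: torch.Tensor) -> D.Categorical:
        return D.Categorical(logits=self.net(torch.cat([obs, z], dim=-1)))


def potential_reward(actor: Actor, obs: torch.Tensor, z: torch.Tensor,
                     beta: float = 0.1) -> torch.Tensor:
    """Intrinsic reward: divergence between the policy with the inferred
    motivation and the same policy with a neutral (zero) motivation.
    A large divergence means z materially changed the agent's behavior."""
    pi_with = actor(obs, z)
    pi_without = actor(obs, torch.zeros_like(z))
    return beta * D.kl_divergence(pi_with, pi_without)


# Usage sketch: one vehicle's observation (gap, relative speed, own speed, ...)
obs_dim, z_dim, n_actions = 8, 4, 5
encoder = MotivationEncoder(obs_dim, z_dim)
actor = Actor(obs_dim, z_dim, n_actions)

obs = torch.randn(1, obs_dim)
z = encoder(obs).rsample()               # reparameterized sample of the motivation
r_pot = potential_reward(actor, obs, z)  # added to the extrinsic CACC reward
```

In this reading, r_pot is summed with the usual extrinsic platoon reward (gap-keeping, comfort, safety) during training, so the agent is rewarded only when the inferred motivation actually alters its driving policy; how MAACPM combines the two terms and trains the encoder is specified in the paper itself.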