Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization

Jiamin Shi, Tangyike Zhang, Junxiang Zhan, Shi-tao Chen, J. Xin, Nanning Zheng
{"title":"基于模仿学习初始化强化学习的高效变道行为规划","authors":"Jiamin Shi, Tangyike Zhang, Junxiang Zhan, Shi-tao Chen, J. Xin, Nanning Zheng","doi":"10.1109/IV55152.2023.10186577","DOIUrl":null,"url":null,"abstract":"Robust lane-changing behavior planning is critical to ensuring the safety and comfort of autonomous vehicles. In this paper, we proposed an efficient and robust vehicle lane-changing behavior decision-making method based on reinforcement learning (RL) and imitation learning (IL) initialization which learns the potential lane-changing driving mechanisms from driving mechanism from the interactions between vehicle and environment, so as to simplify the manual driving modeling and have good adaptability to the dynamic changes of lane-changing scene. Our method further makes the following improvements on the basis of the Proximal Policy Optimization (PPO) algorithm: (1) A dynamic hybrid reward mechanism for lane-changing tasks is adopted; (2) A state space construction method based on fuzzy logic and deformation pose is presented to enable behavior planning to learn more refined tactical decision-making; (3) An RL initialization method based on imitation learning which only requires a small amount of scene data is introduced to solve the low efficiency of RL learning under sparse reward. Experiments on the SUMO show the effectiveness of the proposed method, and the test on the CARLA simulator also verifies the generalization ability of the method.","PeriodicalId":195148,"journal":{"name":"2023 IEEE Intelligent Vehicles Symposium (IV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization\",\"authors\":\"Jiamin Shi, Tangyike Zhang, Junxiang Zhan, Shi-tao Chen, J. Xin, Nanning Zheng\",\"doi\":\"10.1109/IV55152.2023.10186577\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Robust lane-changing behavior planning is critical to ensuring the safety and comfort of autonomous vehicles. In this paper, we proposed an efficient and robust vehicle lane-changing behavior decision-making method based on reinforcement learning (RL) and imitation learning (IL) initialization which learns the potential lane-changing driving mechanisms from driving mechanism from the interactions between vehicle and environment, so as to simplify the manual driving modeling and have good adaptability to the dynamic changes of lane-changing scene. Our method further makes the following improvements on the basis of the Proximal Policy Optimization (PPO) algorithm: (1) A dynamic hybrid reward mechanism for lane-changing tasks is adopted; (2) A state space construction method based on fuzzy logic and deformation pose is presented to enable behavior planning to learn more refined tactical decision-making; (3) An RL initialization method based on imitation learning which only requires a small amount of scene data is introduced to solve the low efficiency of RL learning under sparse reward. 
Experiments on the SUMO show the effectiveness of the proposed method, and the test on the CARLA simulator also verifies the generalization ability of the method.\",\"PeriodicalId\":195148,\"journal\":{\"name\":\"2023 IEEE Intelligent Vehicles Symposium (IV)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Intelligent Vehicles Symposium (IV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IV55152.2023.10186577\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Intelligent Vehicles Symposium (IV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IV55152.2023.10186577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Robust lane-changing behavior planning is critical to ensuring the safety and comfort of autonomous vehicles. In this paper, we propose an efficient and robust lane-changing behavior decision-making method based on reinforcement learning (RL) with imitation learning (IL) initialization, which learns latent lane-changing driving mechanisms from the interactions between the vehicle and its environment, thereby simplifying manual driving-behavior modeling and adapting well to dynamic changes in the lane-changing scene. Our method makes the following improvements on the basis of the Proximal Policy Optimization (PPO) algorithm: (1) a dynamic hybrid reward mechanism for lane-changing tasks is adopted; (2) a state-space construction method based on fuzzy logic and deformation pose is presented, enabling the behavior planner to learn more refined tactical decision-making; (3) an RL initialization method based on imitation learning, which requires only a small amount of scene data, is introduced to address the low learning efficiency of RL under sparse rewards. Experiments on the SUMO simulator show the effectiveness of the proposed method, and tests on the CARLA simulator also verify its generalization ability.
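As a rough illustration of the IL-initialization idea the abstract describes (a minimal sketch, not the authors' implementation: the state dimension, discrete action set, network sizes, and function names below are all assumptions), the following behavior-clones a policy network on a small batch of expert lane-change demonstrations before it would be handed to a standard PPO loop:

```python
# Minimal sketch of imitation-learning initialization for PPO (PyTorch).
# All dimensions and data here are hypothetical placeholders.
import torch
import torch.nn as nn

STATE_DIM = 12   # assumed size of the ego/neighbor-vehicle state vector
N_ACTIONS = 3    # assumed discrete actions: keep lane / change left / change right

class PolicyNet(nn.Module):
    """Small MLP policy; returns action logits for a given state batch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, states):
        return self.net(states)

def pretrain_with_demos(policy, demo_states, demo_actions, epochs=50, lr=1e-3):
    """Behavior cloning on a small demonstration set: supervised
    cross-entropy between policy logits and expert discrete actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(demo_states), demo_actions)
        loss.backward()
        opt.step()
    return policy

# Placeholder demonstration data standing in for a small set of expert
# lane-change transitions; the pretrained policy would then initialize PPO.
demo_states = torch.randn(256, STATE_DIM)
demo_actions = torch.randint(0, N_ACTIONS, (256,))
policy = pretrain_with_demos(PolicyNet(), demo_states, demo_actions)
```

The point of such a pretraining step is that PPO then starts from a policy that already reaches the sparse lane-change reward occasionally, rather than exploring from scratch.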