Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization

Jiamin Shi, Tangyike Zhang, Junxiang Zhan, Shi-tao Chen, J. Xin, Nanning Zheng
{"title":"基于模仿学习初始化强化学习的高效变道行为规划","authors":"Jiamin Shi, Tangyike Zhang, Junxiang Zhan, Shi-tao Chen, J. Xin, Nanning Zheng","doi":"10.1109/IV55152.2023.10186577","DOIUrl":null,"url":null,"abstract":"Robust lane-changing behavior planning is critical to ensuring the safety and comfort of autonomous vehicles. In this paper, we proposed an efficient and robust vehicle lane-changing behavior decision-making method based on reinforcement learning (RL) and imitation learning (IL) initialization which learns the potential lane-changing driving mechanisms from driving mechanism from the interactions between vehicle and environment, so as to simplify the manual driving modeling and have good adaptability to the dynamic changes of lane-changing scene. Our method further makes the following improvements on the basis of the Proximal Policy Optimization (PPO) algorithm: (1) A dynamic hybrid reward mechanism for lane-changing tasks is adopted; (2) A state space construction method based on fuzzy logic and deformation pose is presented to enable behavior planning to learn more refined tactical decision-making; (3) An RL initialization method based on imitation learning which only requires a small amount of scene data is introduced to solve the low efficiency of RL learning under sparse reward. Experiments on the SUMO show the effectiveness of the proposed method, and the test on the CARLA simulator also verifies the generalization ability of the method.","PeriodicalId":195148,"journal":{"name":"2023 IEEE Intelligent Vehicles Symposium (IV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization\",\"authors\":\"Jiamin Shi, Tangyike Zhang, Junxiang Zhan, Shi-tao Chen, J. Xin, Nanning Zheng\",\"doi\":\"10.1109/IV55152.2023.10186577\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Robust lane-changing behavior planning is critical to ensuring the safety and comfort of autonomous vehicles. In this paper, we proposed an efficient and robust vehicle lane-changing behavior decision-making method based on reinforcement learning (RL) and imitation learning (IL) initialization which learns the potential lane-changing driving mechanisms from driving mechanism from the interactions between vehicle and environment, so as to simplify the manual driving modeling and have good adaptability to the dynamic changes of lane-changing scene. Our method further makes the following improvements on the basis of the Proximal Policy Optimization (PPO) algorithm: (1) A dynamic hybrid reward mechanism for lane-changing tasks is adopted; (2) A state space construction method based on fuzzy logic and deformation pose is presented to enable behavior planning to learn more refined tactical decision-making; (3) An RL initialization method based on imitation learning which only requires a small amount of scene data is introduced to solve the low efficiency of RL learning under sparse reward. 
Experiments on the SUMO show the effectiveness of the proposed method, and the test on the CARLA simulator also verifies the generalization ability of the method.\",\"PeriodicalId\":195148,\"journal\":{\"name\":\"2023 IEEE Intelligent Vehicles Symposium (IV)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Intelligent Vehicles Symposium (IV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IV55152.2023.10186577\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Intelligent Vehicles Symposium (IV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IV55152.2023.10186577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Robust lane-changing behavior planning is critical to ensuring the safety and comfort of autonomous vehicles. In this paper, we propose an efficient and robust lane-changing behavior decision-making method based on reinforcement learning (RL) with imitation learning (IL) initialization, which learns latent lane-changing driving mechanisms from the interactions between the vehicle and its environment, thereby simplifying manual driving-behavior modeling and adapting well to dynamic changes in the lane-changing scene. Our method makes the following improvements on the basis of the Proximal Policy Optimization (PPO) algorithm: (1) a dynamic hybrid reward mechanism for lane-changing tasks is adopted; (2) a state-space construction method based on fuzzy logic and deformation pose is presented, enabling the behavior planner to learn more refined tactical decision-making; (3) an RL initialization method based on imitation learning, which requires only a small amount of scene data, is introduced to address the low learning efficiency of RL under sparse rewards. Experiments on the SUMO simulator show the effectiveness of the proposed method, and tests on the CARLA simulator also verify its generalization ability.
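As a rough illustration of the IL-initialization idea the abstract describes (a minimal sketch, not the authors' implementation: the state dimension, discrete action set, network sizes, and function names below are all assumptions), the following behavior-clones a policy network on a small batch of expert lane-change demonstrations before it would be handed to a standard PPO loop:

```python
# Minimal sketch of imitation-learning initialization for PPO (PyTorch).
# All dimensions and data here are hypothetical placeholders.
import torch
import torch.nn as nn

STATE_DIM = 12   # assumed size of the ego/neighbor-vehicle state vector
N_ACTIONS = 3    # assumed discrete actions: keep lane / change left / change right

class PolicyNet(nn.Module):
    """Small MLP policy; returns action logits for a given state batch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, states):
        return self.net(states)

def pretrain_with_demos(policy, demo_states, demo_actions, epochs=50, lr=1e-3):
    """Behavior cloning on a small demonstration set: supervised
    cross-entropy between policy logits and expert discrete actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(demo_states), demo_actions)
        loss.backward()
        opt.step()
    return policy

# Placeholder demonstration data standing in for a small set of expert
# lane-change transitions; the pretrained policy would then initialize PPO.
demo_states = torch.randn(256, STATE_DIM)
demo_actions = torch.randint(0, N_ACTIONS, (256,))
policy = pretrain_with_demos(PolicyNet(), demo_states, demo_actions)
```

The point of such a pretraining step is that PPO then starts from a policy that already reaches the sparse lane-change reward occasionally, rather than exploring from scratch.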