Lane-changing policy offline reinforcement learning of autonomous vehicles based on BEAR algorithm with support set constraints

Authors: Caixia Huang, Yuxiang Wang, Zhiyong Zhang, Wenming Feng, Dayang Huang
DOI: 10.1177/09544070241265752
Journal: Proceedings of the Institution of Mechanical Engineers Part D-Journal of Automobile Engineering (JCR Q3, Engineering, Mechanical; Impact Factor 1.5)
Published: 2024-08-03 (Journal Article)
Abstract: Imitation learning struggles to learn an optimal policy from datasets that mix expert and non-expert samples because it cannot discern the quality differences between them. Standard online reinforcement learning (RL) methods, meanwhile, incur significant exploration costs and safety risks when interacting with the environment. To address these challenges, this study develops a lane-changing model for autonomous vehicles using the bootstrapping error accumulation reduction (BEAR) algorithm. The study first examines the distributional shift between the behavior policy and the target policy in offline RL, then incorporates the BEAR algorithm, enhanced with support set constraints, to mitigate it. On this basis, it proposes a lane-changing policy learning method that designs the state space, action set, and reward function; the reward function guides the autonomous vehicle in executing lane changes while balancing safety, ride comfort, and traffic efficiency. Finally, the lane-changing policy is learned from a dataset containing both expert and non-expert samples. Test results indicate that the resulting policy achieves higher success rates and safety levels than policies derived via imitation learning.
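The support set constraint referenced in the abstract is, in BEAR-style methods, typically enforced by penalizing a sampled maximum mean discrepancy (MMD) between actions drawn from the learned policy and actions drawn from the behavior policy, keeping the learned policy within the support of the dataset. The paper's exact formulation is not reproduced here; the following is a minimal NumPy sketch of a Gaussian-kernel MMD estimator (function names, kernel choice, and bandwidth `sigma` are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between two batches of action vectors (n, d) and (m, d)."""
    diff = x[:, None, :] - y[None, :, :]                      # (n, m, d)
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))

def mmd_squared(pi_actions, beta_actions, sigma=1.0):
    """Squared MMD between learned-policy samples and behavior-policy samples.

    Near zero when the two sets of actions come from the same distribution;
    large when the learned policy leaves the support of the dataset. A
    BEAR-style learner would penalize (or constrain) this quantity during
    policy improvement.
    """
    k_pp = gaussian_kernel(pi_actions, pi_actions, sigma).mean()
    k_bb = gaussian_kernel(beta_actions, beta_actions, sigma).mean()
    k_pb = gaussian_kernel(pi_actions, beta_actions, sigma).mean()
    return k_pp + k_bb - 2.0 * k_pb
```

In a full implementation this estimate would be computed per training batch and folded into the actor loss via a Lagrange multiplier, but that machinery is omitted here.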
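The abstract states that the reward function balances safety, ride comfort, and traffic efficiency, but does not give its terms. As a purely hypothetical illustration of such a trade-off (the signal names, weights, and thresholds below are assumptions, not the paper's design), a weighted-sum reward over time-to-collision, jerk, and speed deviation might look like:

```python
def lane_change_reward(ttc, jerk, speed, target_speed,
                       w_safe=1.0, w_comfort=0.1, w_eff=0.5,
                       ttc_min=2.0):
    """Hypothetical per-step reward for a lane-changing policy.

    Penalizes low time-to-collision (safety), high jerk (ride comfort),
    and deviation from a target speed (traffic efficiency).
    """
    r_safe = -w_safe if ttc < ttc_min else 0.0           # safety penalty
    r_comfort = -w_comfort * abs(jerk)                    # comfort penalty
    r_eff = -w_eff * abs(speed - target_speed) / max(target_speed, 1e-6)
    return r_safe + r_comfort + r_eff
```

A smooth, safe lane change at the target speed scores 0.0; any unsafe gap, harsh jerk, or speed loss pulls the reward negative, which is the general shape such multi-objective driving rewards take.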
Journal introduction:
The Journal of Automobile Engineering is an established, high-quality, multidisciplinary journal that publishes the best peer-reviewed science and engineering in the field.