Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic

arXiv - CS - Computational Engineering, Finance, and Science. Pub Date: 2024-08-08. DOI: arxiv-2408.04447

Yuting Wang, Lu Liu, Maonan Wang, Xi Xiong



Abstract

The burgeoning field of autonomous driving necessitates the seamless integration of autonomous vehicles (AVs) with human-driven vehicles, calling for more predictable AV behavior and enhanced interaction with human drivers. Human-like driving, particularly during lane-changing maneuvers on highways, is a critical area of research due to its significant impact on safety and traffic flow. Traditional rule-based decision-making approaches often fail to encapsulate the nuanced boundaries of human behavior in diverse driving scenarios, while crafting reward functions for learning-based methods introduces its own set of complexities. This study investigates the application of Reinforcement Learning from Human Feedback (RLHF) to emulate human-like lane-changing decisions in AVs. An initial RL policy is pre-trained to ensure safe lane changes. Subsequently, this policy is employed to gather data, which is then annotated by humans to train a reward model that discerns lane changes aligning with human preferences. This human-informed reward model supersedes the original, guiding the refinement of the policy to reflect human-like preferences. The effectiveness of RLHF in producing human-like lane changes is demonstrated through the development and evaluation of conservative and aggressive lane-changing models within obstacle-rich environments and mixed autonomy traffic scenarios. The experimental outcomes underscore the potential of RLHF to diversify lane-changing behaviors in AVs, suggesting its viability for enhancing the integration of AVs into the fabric of human-driven traffic.
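The reward-model step of the pipeline can be illustrated with a small sketch. The abstract does not say how the human annotations are structured or what form the reward model takes, so everything below is an assumption made for illustration: annotators are assumed to compare lane changes in pairs, the reward is assumed to be linear in two hypothetical features (gap to the lead vehicle and lateral speed), and preferences are fitted with a Bradley-Terry model, a common choice in RLHF work. The data is synthetic and stands in for a "conservative" annotator.

```python
import numpy as np


def train_reward_model(feats_a, feats_b, prefs, lr=0.5, steps=2000):
    """Fit a linear reward r(x) = w @ x from pairwise preferences.

    feats_a, feats_b: (n, d) feature arrays for the two lane changes in each pair.
    prefs: (n,) array, 1.0 if the annotator preferred A, 0.0 if B.
    Bradley-Terry assumption: P(A preferred) = sigmoid(r(A) - r(B)).
    """
    w = np.zeros(feats_a.shape[1])
    diff = feats_a - feats_b
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))      # predicted P(A preferred)
        grad = diff.T @ (p - prefs) / len(prefs)   # gradient of mean cross-entropy
        w -= lr * grad
    return w


def make_synthetic_pairs(n=500, seed=0):
    """Synthetic stand-in for human annotations (not data from the paper)."""
    rng = np.random.default_rng(seed)
    # Hypothetical features per lane change: [gap to lead vehicle, lateral speed]
    a = rng.normal(size=(n, 2))
    b = rng.normal(size=(n, 2))
    # Hypothetical conservative annotator: prefers larger gaps, lower lateral speed
    true_w = np.array([1.0, -1.0])
    prefs = ((a - b) @ true_w > 0).astype(float)
    return a, b, prefs


a, b, prefs = make_synthetic_pairs()
w = train_reward_model(a, b, prefs)
print("learned reward weights:", w)
```

In a full pipeline, the learned `w` would replace the hand-crafted reward when the pre-trained policy is fine-tuned. Fitting a second model on labels from an annotator with the opposite taste would, under the same assumptions, yield the aggressive variant.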
