Hybrid lane change strategy of autonomous vehicles based on SOAR cognitive architecture and deep reinforcement learning

IF 5.5; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing Pub Date : 2024-10-05 DOI:10.1016/j.neucom.2024.128669

Rongliang Zhou, Haotian Cao, Jiakun Huang, Xiaolin Song, Jing Huang, Zhi Huang


Citations: 0

Abstract

Research on lane change strategies for autonomous vehicles is important for optimizing traffic flow efficiency, enhancing driving safety, and adapting to complex traffic environments. Although numerous rule-based and machine-learning approaches have been explored for lane changes on highways, they frequently exhibit limited performance owing to the complexity of driving environments. This study proposes a novel lane change strategy for autonomous vehicles that uses a hybrid framework integrating the SOAR cognitive architecture and deep reinforcement learning (DRL) to address the lane change challenge on highways. First, we introduce a rule extraction algorithm, RuleCOSI+, which is based on tree ensemble algorithms and designed to extract concise lane change rules from large-scale human driving data. These straightforward rules, together with traffic regulations and safety rules, constitute the long-term memory of the SOAR cognitive architecture, enabling transparent decision-making. Next, by analyzing the clipping mechanism of the proximal policy optimization (PPO) algorithm, we propose an Adaptive Clipping PPO (ACPPO) algorithm based on the importance of samples. During training, this algorithm adopts different clipping strategies for SOAR samples and ACPPO samples, which lets it use samples with different levels of importance more effectively. Then, we propose a hybrid decision-making algorithm, SOAR-ACPPO, which combines the SOAR cognitive architecture with the ACPPO algorithm and leverages SOAR’s prior knowledge to guide agent learning effectively and safely. Finally, by selecting an appropriate intervention probability and weaning strategy, the system avoids inappropriate knowledge intervention and ensures adequate exploration of the environment. Simulation experiments conducted in the CARLA simulator show that the proposed strategy not only improves model learning efficiency but also enhances driving efficiency and safety. It also demonstrates a certain degree of human-like characteristics and interpretability.
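The abstract does not specify how ACPPO varies its clipping between sample sources or how the intervention probability is weaned, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "different clipping strategies" can be approximated by a different clip range per sample source, and that weaning is a linear decay of the intervention probability. All function names and the values 0.2, 0.3, 0.5, and 1000 are hypothetical.

```python
from typing import Sequence


def clipped_surrogate(ratio: float, advantage: float, eps: float) -> float:
    """Standard PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps is the clip range.
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def clip_range(from_soar: bool, eps_policy: float = 0.2,
               eps_soar: float = 0.3) -> float:
    """ASSUMPTION: SOAR-guided samples get their own clip range.

    The paper's actual adaptive rule is not given in the abstract.
    """
    return eps_soar if from_soar else eps_policy


def batch_objective(ratios: Sequence[float], advantages: Sequence[float],
                    from_soar: Sequence[bool]) -> float:
    """Mean clipped surrogate, choosing the clip range per sample source."""
    terms = [
        clipped_surrogate(r, a, clip_range(s))
        for r, a, s in zip(ratios, advantages, from_soar)
    ]
    return sum(terms) / len(terms)


def intervention_probability(step: int, p0: float = 0.5,
                             wean_steps: int = 1000) -> float:
    """ASSUMPTION: linear weaning of the rule-based intervention.

    Starts at p0 and reaches 0 after wean_steps training steps, so the
    agent explores on its own once the prior knowledge is phased out.
    """
    return p0 * max(0.0, 1.0 - step / wean_steps)


if __name__ == "__main__":
    # Two samples with the same ratio and advantage but different sources.
    print(batch_objective([1.5, 1.5], [1.0, 1.0], [True, False]))
    print([intervention_probability(s) for s in (0, 500, 2000)])
```

In this sketch, the wider clip range lets a SOAR-guided sample move the policy further per update than a self-collected one. Whether the actual algorithm widens, narrows, or otherwise reshapes the range for each source is something only the full paper can confirm.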


Source journal

Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.