Intelligent control of self-driving vehicles based on adaptive sampling supervised actor-critic and human driving experience.

IF 2.6, Zone 4 (Engineering Technology), Q1 (Mathematics)

Mathematical Biosciences and Engineering Pub Date : 2024-05-24 DOI:10.3934/mbe.2024267

Jin Zhang, Nan Ma, Zhixuan Wu, Cheng Wang, Yongqiang Yao

Volume 21, Issue 5, Pages 6077-6096

Citations: 0

Abstract


Intelligent control of self-driving vehicles based on adaptive sampling supervised actor-critic and human driving experience.

Due to the complexity of the driving environment and the dynamics of the behavior of traffic participants, self-driving in dense traffic flow is very challenging. Traditional methods usually rely on predefined rules, which are difficult to adapt to various driving scenarios. Deep reinforcement learning (DRL) shows advantages over rule-based methods in complex self-driving environments, demonstrating the great potential of intelligent decision-making. However, one of the problems of DRL is the inefficiency of exploration; typically, it requires a lot of trial and error to learn the optimal policy, which leads to its slow learning rate and makes it difficult for the agent to learn well-performing decision-making policies in self-driving scenarios. Inspired by the outstanding performance of supervised learning in classification tasks, we propose a self-driving intelligent control method that combines human driving experience and adaptive sampling supervised actor-critic algorithm. Unlike traditional DRL, we modified the learning process of the policy network by combining supervised learning and DRL and adding human driving experience to the learning samples to better guide the self-driving vehicle to learn the optimal policy through human driving experience and real-time human guidance. In addition, in order to make the agent learn more efficiently, we introduced real-time human guidance in its learning process, and an adaptive balanced sampling method was designed for improving the sampling performance. We also designed the reward function in detail for different evaluation indexes such as traffic efficiency, which further guides the agent to learn the self-driving intelligent control policy in a better way. The experimental results show that the method is able to control vehicles in complex traffic environments for self-driving tasks and exhibits better performance than other DRL methods.
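The abstract does not spell out how the adaptive balanced sampling or the combined supervised and DRL policy update work, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical rule in which the share of human-driving samples in each batch shrinks linearly as the agent's recent mean return approaches a positive `target_return`, and a hypothetical policy loss that blends an actor-critic loss with a mean squared imitation error. All names (`AdaptiveBalancedSampler`, `human_share`, `blended_policy_loss`) and all default values are assumptions made for illustration.

```python
import random
from collections import deque


class AdaptiveBalancedSampler:
    """Mixes human demonstrations with agent experience in each batch.

    Hypothetical rule (not taken from the paper): the human share falls
    linearly from `max_share` to `min_share` as the agent's recent mean
    return approaches `target_return`, which is assumed to be positive.
    """

    def __init__(self, target_return, min_share=0.1, max_share=0.9,
                 capacity=10_000, window=20, seed=0):
        self.human = deque(maxlen=capacity)   # human driving transitions
        self.agent = deque(maxlen=capacity)   # agent-collected transitions
        self.returns = deque(maxlen=window)   # recent episode returns
        self.target_return = target_return
        self.min_share = min_share
        self.max_share = max_share
        self.rng = random.Random(seed)

    def human_share(self):
        """Fraction of the next batch to draw from human data."""
        if not self.returns:
            return self.max_share
        mean_return = sum(self.returns) / len(self.returns)
        # Clamp progress toward the target into [0, 1].
        progress = min(max(mean_return / self.target_return, 0.0), 1.0)
        return self.max_share - progress * (self.max_share - self.min_share)

    def sample(self, batch_size):
        """Return a list of (transition, is_human) pairs."""
        n_human = min(round(batch_size * self.human_share()), len(self.human))
        n_agent = min(batch_size - n_human, len(self.agent))
        batch = [(t, True) for t in self.rng.sample(list(self.human), n_human)]
        batch += [(t, False) for t in self.rng.sample(list(self.agent), n_agent)]
        return batch


def blended_policy_loss(rl_loss, policy_actions, human_actions, weight):
    """(1 - weight) * RL loss + weight * mean squared imitation error."""
    if not human_actions:
        return rl_loss  # no human labels in this batch: pure RL loss
    mse = sum((p - h) ** 2
              for p, h in zip(policy_actions, human_actions)) / len(human_actions)
    return (1.0 - weight) * rl_loss + weight * mse
```

In this sketch, a caller would append transitions to `sampler.human` and `sampler.agent`, append each finished episode's return to `sampler.returns`, and call `sampler.sample(batch_size)` before every update. Early in training the batches are dominated by human data, and the balance shifts toward the agent's own experience as its returns improve.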


Source journal

Mathematical Biosciences and Engineering (Engineering Technology: Mathematics, Interdisciplinary Applications)

CiteScore

3.90

Self-citation rate

7.70%

Articles published

586

Review time

>12 weeks

About the journal: Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing. MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (descriptions of new technologies and products); Announcements and Industrial Progress and News (announcements and even advertisements, including major conferences).