Multi-Reward Architecture based Reinforcement Learning for Highway Driving Policies

Wei Yuan, Ming Yang, Yuesheng He, Chunxiang Wang, B. Wang
{"title":"基于多奖励体系结构的公路驾驶策略强化学习","authors":"Wei Yuan, Ming Yang, Yuesheng He, Chunxiang Wang, B. Wang","doi":"10.1109/ITSC.2019.8917304","DOIUrl":null,"url":null,"abstract":"A safe and efficient driving policy is essential for the future autonomous highway driving. However, driving policies are hard for modeling because of the diversity of scenes and uncertainties of the interaction with surrounding vehicles. The state-of-the-art deep reinforcement learning method is unable to learn good domain knowledge for highway driving policies using single reward architecture. This paper proposes a Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. A single reward function is decomposed to multi-reward functions for better representation of multi-dimensional driving policies. Besides the big penalty for collision, the overall reward is decomposed to three dimensional rewards: the reward for speed, the reward for overtake, and the reward for lane-change. Then, each reward trains a branch of Q-network for corresponding domain knowledge. The Q-network is divided into two parts: low-level network is shared by three branches of high-level networks, which approximate the corresponding Q-value for the different reward functions respectively. The agent car chooses the action based on the sum of Q vectors from three branches. Experiments are conducted in a simulation platform, which performs the highway driving process and the agent car is able to provide the commonly used sensor data: the image and the point cloud. Experiment results show that the proposed method performs better than the DQN method on single reward architecture with three evaluations: higher speed, lower frequency of lane-change, more quantity of overtaking, which is more efficient and safer for the future autonomous highway driving.","PeriodicalId":6717,"journal":{"name":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","volume":"24 1","pages":"3810-3815"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Multi-Reward Architecture based Reinforcement Learning for Highway Driving Policies\",\"authors\":\"Wei Yuan, Ming Yang, Yuesheng He, Chunxiang Wang, B. Wang\",\"doi\":\"10.1109/ITSC.2019.8917304\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A safe and efficient driving policy is essential for the future autonomous highway driving. However, driving policies are hard for modeling because of the diversity of scenes and uncertainties of the interaction with surrounding vehicles. The state-of-the-art deep reinforcement learning method is unable to learn good domain knowledge for highway driving policies using single reward architecture. This paper proposes a Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. A single reward function is decomposed to multi-reward functions for better representation of multi-dimensional driving policies. Besides the big penalty for collision, the overall reward is decomposed to three dimensional rewards: the reward for speed, the reward for overtake, and the reward for lane-change. Then, each reward trains a branch of Q-network for corresponding domain knowledge. The Q-network is divided into two parts: low-level network is shared by three branches of high-level networks, which approximate the corresponding Q-value for the different reward functions respectively. 
The agent car chooses the action based on the sum of Q vectors from three branches. Experiments are conducted in a simulation platform, which performs the highway driving process and the agent car is able to provide the commonly used sensor data: the image and the point cloud. Experiment results show that the proposed method performs better than the DQN method on single reward architecture with three evaluations: higher speed, lower frequency of lane-change, more quantity of overtaking, which is more efficient and safer for the future autonomous highway driving.\",\"PeriodicalId\":6717,\"journal\":{\"name\":\"2019 IEEE Intelligent Transportation Systems Conference (ITSC)\",\"volume\":\"24 1\",\"pages\":\"3810-3815\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Intelligent Transportation Systems Conference (ITSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITSC.2019.8917304\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITSC.2019.8917304","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

A safe and efficient driving policy is essential for future autonomous highway driving. However, driving policies are hard to model because of the diversity of scenes and the uncertainty of interactions with surrounding vehicles. State-of-the-art deep reinforcement learning methods cannot learn good domain knowledge for highway driving policies with a single-reward architecture. This paper proposes Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. A single reward function is decomposed into multiple reward functions to better represent multi-dimensional driving policies. Besides a large penalty for collision, the overall reward is decomposed into three rewards: one for speed, one for overtaking, and one for lane-changing. Each reward then trains one branch of the Q-network on the corresponding domain knowledge. The Q-network is divided into two parts: a low-level network shared by three high-level branches, each of which approximates the Q-values for its own reward function. The agent car chooses its action based on the sum of the Q vectors from the three branches. Experiments are conducted on a simulation platform that reproduces the highway driving process, in which the agent car provides commonly used sensor data: images and point clouds. The results show that the proposed method outperforms a DQN with a single-reward architecture on three evaluations: higher speed, lower lane-change frequency, and more overtaking maneuvers, making it more efficient and safer for future autonomous highway driving.
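The branched Q-network the abstract describes is easy to sketch. Below is a minimal, illustrative PyTorch version of the MRA idea: a shared low-level network feeds three high-level heads (speed, overtaking, lane-change), and the action is the argmax of the summed branch Q vectors. All dimensions, layer sizes, and names (STATE_DIM, N_ACTIONS, MRAQNetwork) are assumptions for illustration; the paper does not publish its architecture details here, and the collision penalty term is not shown.

```python
import torch
import torch.nn as nn

# Illustrative dimensions -- the paper does not specify them.
STATE_DIM = 64   # flattened state features (hypothetical)
N_ACTIONS = 5    # e.g. keep lane, change left/right, speed up, slow down
HIDDEN = 128

class MRAQNetwork(nn.Module):
    """Sketch of the Multi-Reward Architecture Q-network: a low-level
    network shared by three high-level branches, one per decomposed
    reward (speed, overtaking, lane-change)."""

    def __init__(self):
        super().__init__()
        # Low-level network shared by all branches.
        self.shared = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        # One high-level branch per reward dimension; each approximates
        # the Q-values for its own reward function.
        self.heads = nn.ModuleDict({
            name: nn.Linear(HIDDEN, N_ACTIONS)
            for name in ("speed", "overtake", "lane_change")
        })

    def forward(self, state):
        h = self.shared(state)
        # Dict of per-reward Q vectors, each of shape (batch, N_ACTIONS).
        return {name: head(h) for name, head in self.heads.items()}

    @torch.no_grad()
    def act(self, state):
        """Choose the action maximizing the sum of the branch Q vectors,
        as the abstract describes."""
        q_branches = self.forward(state)
        q_total = torch.stack(list(q_branches.values())).sum(dim=0)
        return q_total.argmax(dim=-1)

net = MRAQNetwork()
action = net.act(torch.randn(1, STATE_DIM))  # batch of one state
```

One plausible way to train the branches, in the spirit of "each reward trains a branch of Q-network", is a per-branch TD loss against the decomposed rewards. The replay-batch layout and reward keys below are likewise hypothetical; the paper may differ in details such as target computation or loss weighting.

```python
def mra_loss(net, target_net, batch, gamma=0.99):
    """Sketch of a per-branch DQN update: each head regresses a TD target
    built from its own decomposed reward."""
    states, actions, rewards, next_states, done = batch  # rewards: dict of tensors
    q = net(states)  # per-branch Q vectors for the current states
    with torch.no_grad():
        # Select the next action with the target network's *summed* Q,
        # so every branch bootstraps along the same behavior policy.
        q_next = target_net(next_states)
        q_next_sum = torch.stack(list(q_next.values())).sum(dim=0)
        next_a = q_next_sum.argmax(dim=-1, keepdim=True)
    loss = 0.0
    for name, q_branch in q.items():
        q_sa = q_branch.gather(1, actions.unsqueeze(1)).squeeze(1)
        next_q = q_next[name].gather(1, next_a).squeeze(1)
        target = rewards[name] + gamma * (1.0 - done) * next_q
        loss = loss + nn.functional.mse_loss(q_sa, target)
    return loss
```

Summing the branch Q vectors for both action selection and bootstrapping keeps each head consistent with the single policy the agent actually executes, which is the point of sharing the low-level network.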