Human as AI mentor: Enhanced human-in-the-loop reinforcement learning for safe and efficient autonomous driving

IF 14.5 Q1 TRANSPORTATION

Communications in Transportation Research, vol. 4, Article 100127 · Pub Date: 2024-05-08 · DOI: 10.1016/j.commtr.2024.100127

Zilin Huang, Zihao Sheng, Chengyuan Ma, Sikai Chen

Citation count: 0

Abstract

Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoon. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. On the other hand, the agent could be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agents’ policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor’s cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios.
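The abstract names the ingredients of HAIM-DRL (mentor takeover in dangerous situations, two training sources, and reward-free proxy state-action values) without giving formulas. The following is a minimal, hypothetical Python sketch of how such a loop could route data; it is not the authors' implementation. Assumptions: a toy one-dimensional car-following state (the gap to the leading vehicle), a scripted mentor that brakes whenever the gap falls below 5.0 while the agent is accelerating, and a +1/-1 proxy labeling of the human and agent actions at takeover steps, borrowed from proxy-value methods such as Proxy Value Propagation rather than confirmed by this paper. Every name (`Transition`, `MentorBuffers`, `step_with_mentor`, `proxy_value_targets`, `toy_policy`, `toy_mentor`, `run_episode`) is illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[float, ...]


@dataclass
class Transition:
    """One environment step; field names are hypothetical."""
    state: State
    agent_action: float
    human_action: Optional[float]  # None when the mentor did not take over
    next_state: State


@dataclass
class MentorBuffers:
    """The two training sources: free exploration and partial human demonstrations."""
    exploration: List[Transition] = field(default_factory=list)
    demonstration: List[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        # Steps without a takeover count as free exploration.
        if t.human_action is None:
            self.exploration.append(t)
        else:
            self.demonstration.append(t)


def step_with_mentor(
    state: State,
    policy: Callable[[State], float],
    mentor: Callable[[State, float], Optional[float]],
) -> Tuple[float, Optional[float], float]:
    """Agent proposes an action; the mentor may override it. Returns (agent, human, executed)."""
    agent_action = policy(state)
    human_action = mentor(state, agent_action)
    executed = agent_action if human_action is None else human_action
    return agent_action, human_action, executed


def proxy_value_targets(
    t: Transition, high: float = 1.0, low: float = -1.0
) -> List[Tuple[State, float, float]]:
    """ASSUMED reward-free labeling: at a takeover step, the human action gets
    proxy value `high` and the overridden agent action gets `low`."""
    if t.human_action is None:
        return []
    return [(t.state, t.human_action, high), (t.state, t.agent_action, low)]


def toy_policy(state: State) -> float:
    return 1.0  # hypothetical, deliberately aggressive: always accelerate


def toy_mentor(state: State, agent_action: float, min_gap: float = 5.0) -> Optional[float]:
    # Hypothetical mentor: brake only when the gap is short and the agent accelerates.
    if state[0] < min_gap and agent_action > 0:
        return -2.0
    return None


def run_episode(initial_gap: float = 8.0, steps: int = 6) -> MentorBuffers:
    buffers = MentorBuffers()
    state: State = (initial_gap,)
    for _ in range(steps):
        agent, human, executed = step_with_mentor(state, toy_policy, toy_mentor)
        next_state = (state[0] - executed,)  # toy dynamics: accelerating shrinks the gap
        buffers.add(Transition(state, agent, human, next_state))
        state = next_state
    return buffers


if __name__ == "__main__":
    buffers = run_episode()
    for t in buffers.demonstration:
        print(t.state, proxy_value_targets(t))
```

With the defaults, `run_episode()` sends five free-exploration steps to `exploration` and the single takeover step (gap 4.0, agent action 1.0, mentor action -2.0) to `demonstration`; only that step produces proxy labels, and no reward function appears anywhere. In an actual learner such labels would serve as regression targets for a Q-network; the sketch stops at producing them.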

Source journal

Communications in Transportation Research

CiteScore

15.20

Self-citation rate

0.00%

Articles published