A distributed multi-vehicle pursuit scheme: generative multi-adversarial reinforcement learning

Biomimetic Intelligence and Robotics Pub Date : 2023-09-13 DOI:10.20517/ir.2023.25

Xinhang Li, Yiying Yang, Qinwen Wang, Zheng Yuan, Chen Xu, Lei Li, Lin Zhang


Citations: 0

Abstract

Multi-vehicle pursuit (MVP) is one of the most challenging problems for intelligent traffic management systems because of its multi-source heterogeneous data and the nature of the mission. Many reinforcement learning (RL) algorithms have shown promise for MVP on structured grid-pattern roads, but their lack of dynamic and effective traffic awareness limits pursuit efficiency, and the sparse reward of pursuit tasks still hinders their optimization. This paper therefore proposes a distributed generative multi-adversarial RL method for MVP (DGMARL-MVP) in urban traffic scenes. In DGMARL-MVP, a generative multi-adversarial network improves the Bellman equation by generating a potential dense reward, which guides the strategy optimization of distributed multi-agent RL. In addition, a graph neural network-based intersecting cognition module extracts integrated features of the traffic situation and of the relationships among agents from the multi-source heterogeneous data. These integrated traffic features assist RL decision-making and improve pursuit efficiency. Extensive experiments show that DGMARL-MVP reduces pursuit time by 5.47% compared with proximal policy optimization and raises the average pursuit success rate to as much as 85.67%. The code is open-sourced on GitHub.
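The abstract does not state how the generated dense reward enters the Bellman equation. As a rough illustration only, the sketch below uses the standard potential-based shaping form, in which the term gamma * phi(s') - phi(s) is added to the sparse reward inside a one-step TD target. The function name `shaped_td_target`, the state labels, and the hand-written distance-based potential are all hypothetical; in the paper the potential would come from the generative multi-adversarial network, not from a lookup table.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_td_target(
    reward: float,
    next_value: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """One-step TD target with a potential-based shaping term added to the reward."""
    # On terminal transitions the next potential and the bootstrap value are zero.
    next_phi = 0.0 if done else potential(next_state)
    bootstrap = 0.0 if done else next_value
    # Dense shaping signal: discounted change in potential along the transition.
    shaping = gamma * next_phi - potential(state)
    return reward + shaping + gamma * bootstrap


# Hypothetical potential: negative distance (in road segments) to the evader.
# This is a stand-in for a learned potential, chosen only to make the example concrete.
distance_to_evader = {"far": 5.0, "near": 1.0}


def phi(s: State) -> float:
    return -distance_to_evader[s]


# The environment reward is sparse (0.0 here), yet moving closer still yields
# a positive learning signal through the shaping term.
target = shaped_td_target(0.0, 0.0, "far", "near", phi, gamma=1.0)
print(target)  # 4.0
```

With a zero environment reward, the pursuer still receives a positive target for closing the distance from 5 segments to 1, which is the kind of dense guidance the abstract attributes to the generated reward.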

Source journal

Biomimetic Intelligence and Robotics

CiteScore

1.80

Self-citation rate

0.00%

Articles published