Vidyasagar Sadhu, Chuanneng Sun, Arman Karimian, Roberto Tron, D. Pompili
2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), December 2020. DOI: 10.1109/MASS50613.2020.00030
Citations: 7
Aerial-DeepSearch: Distributed Multi-Agent Deep Reinforcement Learning for Search Missions

Abstract
Search and Rescue (SAR) is an important part of several applications of national and social interest. Existing solutions for search missions in both terrestrial and aerial domains are mostly limited to a single agent and to specific environments; however, search missions can benefit significantly from the use of multiple agents that can quickly adapt to new environments. In this paper, we propose a framework based on Multi-Agent Deep Reinforcement Learning (MADRL) that realizes the actor-critic framework in a distributed manner for coordinating multiple Unmanned Aerial Vehicles (UAVs) in the exploration of unknown regions. One of the original aspects of our work is that the actors represent simulated or actual UAVs exploring the environment in parallel, instead of traditional computer threads. We also propose the addition of Long Short-Term Memory (LSTM) neural network layers to the actor and critic architectures to handle imperfect-communication and partial-observability scenarios. The proposed approach has been evaluated in a grid world and compared against competing algorithms such as Multi-Agent Q-Learning and Multi-Agent Deep Q-Learning to show its advantages. More generally, our approach could be extended to image-based/continuous action space environments as well.
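The abstract does not detail the algorithms, but one of the baselines it names, Multi-Agent Q-Learning, can be illustrated concretely. Below is a minimal sketch of independent tabular Q-learners on a toy exploration grid world; the grid size, reward scheme (reward for entering an unvisited cell), and hyperparameters are illustrative assumptions for this sketch, not the paper's actual experimental settings.

```python
import random

GRID = 4                                       # 4x4 toy grid world (assumption)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

def step(pos, action, visited):
    """Move one agent; reward +1 for entering an unvisited cell, else 0."""
    r = max(0, min(GRID - 1, pos[0] + action[0]))
    c = max(0, min(GRID - 1, pos[1] + action[1]))
    reward = 0.0 if (r, c) in visited else 1.0
    visited.add((r, c))
    return (r, c), reward

def train(n_agents=2, episodes=500, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Independent Q-learning: each agent keeps its own Q-table and treats
    the other agents as part of the (non-stationary) environment."""
    rng = random.Random(seed)
    Q = [dict() for _ in range(n_agents)]      # Q[i][(state, action_idx)]
    for _ in range(episodes):
        positions = [(0, 0)] * n_agents
        visited = {(0, 0)}
        for _ in range(30):                    # fixed episode length
            for i in range(n_agents):
                s = positions[i]
                if rng.random() < eps:         # epsilon-greedy exploration
                    a = rng.randrange(len(ACTIONS))
                else:
                    a = max(range(len(ACTIONS)),
                            key=lambda k: Q[i].get((s, k), 0.0))
                s2, r = step(s, ACTIONS[a], visited)
                best_next = max(Q[i].get((s2, k), 0.0)
                                for k in range(len(ACTIONS)))
                old = Q[i].get((s, a), 0.0)
                Q[i][(s, a)] = old + alpha * (r + gamma * best_next - old)
                positions[i] = s2
    return Q

def coverage(Q):
    """Greedy rollout of the trained agents; fraction of cells visited."""
    positions = [(0, 0)] * len(Q)
    visited = {(0, 0)}
    for _ in range(30):
        for i, s in enumerate(positions):
            a = max(range(len(ACTIONS)), key=lambda k: Q[i].get((s, k), 0.0))
            positions[i], _ = step(s, ACTIONS[a], visited)
    return len(visited) / (GRID * GRID)
```

This baseline highlights the limitation the paper targets: each tabular learner sees only its own state, so coordination between agents is implicit and fragile, which motivates the distributed actor-critic formulation (and the LSTM layers for partial observability) proposed in the paper.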