On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration

Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
arXiv - CS - Multiagent Systems, published 2024-09-17. DOI: arxiv-2409.11058 (https://doi.org/arxiv-2409.11058)

Abstract

Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses that challenge by using on-policy reinforcement learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other and carry out the exploration in a distributed manner. The proposed solution includes actor-critic networks that use deep convolutional neural networks (CNN) and long short-term memory (LSTM) to identify the UAVs and the areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the superiority of the proposed PPO approach. The results also show that combining LSTM with CNN in the critic can improve exploration. Because the proposed exploration has to work in unknown environments, it was also evaluated on new maps that differ from the training maps; the results show that the proposed setup can still complete the coverage. Finally, we show how tuning hyperparameters may affect overall performance.
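
The abstract names PPO but does not state its objective. As a minimal sketch, the standard clipped surrogate objective from the PPO literature could be computed as follows. The function name, the argument names, and the default `clip_eps = 0.2` are illustrative assumptions and are not taken from the paper; the paper's actual networks, rewards, and hyperparameters are not reproduced here.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    All names and the default clip_eps are hypothetical, chosen for illustration.
    new_log_probs / old_log_probs: log-probabilities of the taken actions under
    the current policy and under the policy that collected the data.
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

When the current and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Clipping removes the incentive to move the ratio beyond the interval in the direction that would increase the objective, which is what keeps each on-policy update close to the policy that gathered the data.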