End-to-End control of USV swarm using graph centric Multi-Agent Reinforcement Learning

Kanghoon Lee, Kyuree Ahn, Jinkyoo Park
DOI: 10.23919/ICCAS52745.2021.9649839 (https://doi.org/10.23919/ICCAS52745.2021.9649839)
Venue: 2021 21st International Conference on Control, Automation and Systems (ICCAS)
Published: 2021-10-12
Citations: 3

Abstract

Unmanned Surface Vehicles (USVs), which operate on the water's surface without an onboard crew, are used in various naval defense missions. Such missions can be conducted efficiently when a swarm of USVs is operated simultaneously. However, it is challenging to establish a decentralised control strategy for all USVs. In addition, the strategy must account for various external factors, such as the ocean topography and the number of enemy forces. These difficulties necessitate a scalable and transferable decision-making module. This study proposes an algorithm to derive a decentralised, cooperative control strategy for the USV swarm using graph-centric multi-agent reinforcement learning (MARL). The model first expresses the mission situation as a graph that accounts for the various sensor ranges. Each USV agent encodes its observed information into a localized embedding and then derives a coordinated action through communication with surrounding agents. To derive a cooperative policy, we trained each agent's policy to maximize the team reward. Using a modified predator-prey environment from OpenAI Gym, we analyzed the effect of each component of the proposed model (state embedding, communication, and team reward). The ablation study shows that the proposed model can derive a scalable and transferable control policy for USVs, consistently achieving the highest win ratio.
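The pipeline the abstract describes (a sensor-range graph, per-agent local embedding, one communication step with neighbors, then per-agent action outputs) can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's actual architecture: the network sizes, the tanh encoder, and the mean-aggregation communication rule are all assumptions for the sake of a runnable example, and training against the team reward is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_graph(positions, sensor_range):
    """Adjacency matrix: two agents are connected when within sensor range."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= sensor_range) & ~np.eye(len(positions), dtype=bool)
    return adj.astype(float)

class GraphPolicy:
    """Encode local observations, mix with neighbor messages, output actions."""
    def __init__(self, obs_dim, embed_dim, n_actions):
        self.W_enc = rng.normal(scale=0.1, size=(obs_dim, embed_dim))
        self.W_msg = rng.normal(scale=0.1, size=(embed_dim, embed_dim))
        self.W_act = rng.normal(scale=0.1, size=(embed_dim, n_actions))

    def forward(self, obs, adj):
        h = np.tanh(obs @ self.W_enc)                  # localized embedding
        deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
        msg = (adj @ h) / deg                          # mean over neighbors
        h = np.tanh(h + msg @ self.W_msg)              # communication step
        return h @ self.W_act                          # per-agent action logits

# Five USV agents with 8-dimensional local observations.
positions = rng.uniform(0.0, 10.0, size=(5, 2))
obs = rng.normal(size=(5, 8))
adj = build_graph(positions, sensor_range=4.0)
policy = GraphPolicy(obs_dim=8, embed_dim=16, n_actions=5)
logits = policy.forward(obs, adj)
print(logits.shape)  # one action-logit vector per agent
```

Because the same weights are shared across agents and messages are averaged over whoever happens to be in sensor range, the forward pass works unchanged for any swarm size, which is the property the abstract refers to as scalability and transferability.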