Formation control and intention compensating of AUVs using multi-agent reinforcement learning and predict network

IF 5.5 · CAS Tier 2 (Engineering & Technology) · JCR Q1, ENGINEERING, CIVIL
Mengqi Wang, Rongshun Juan, Zezhong Li, Zhongke Gao
DOI: 10.1016/j.oceaneng.2025.122854
Journal: Ocean Engineering, Volume 342, Article 122854
Published: 2025-10-03
URL: https://www.sciencedirect.com/science/article/pii/S0029801825025375
Citations: 0

Abstract

Autonomous Underwater Vehicles (AUVs) have played an important role in numerous marine tasks, such as resource exploration, hydrological data acquisition, rescue operations, and military missions. In contrast to single AUV deployment, multi-AUV formations exhibit higher efficiency and improved task completion rates. Recently, multi-agent reinforcement learning (MARL) has emerged as a promising technique for AUV formation control. Nevertheless, conventional MARL approaches often suffer from instability in formation shapes, especially when managing a large number of AUVs. Additionally, communication delay and information dropout can further compromise formation performance. In this paper, we propose a novel method called Policy Compensate Multi-agent Twin Delayed Deep Deterministic Policy Gradient (PC-MATD3), which integrates imitation learning (IL) with MARL to improve formation stability. The proposed framework is designed to alleviate adverse effects caused by communication interruptions or information delays. We define distance and angular errors as key performance metrics and evaluate our method through two distinct simulation scenarios. Experimental results show that, under ideal communication conditions, our approach substantially reduces formation errors and improves overall stability. Additionally, in scenarios involving communication dropouts, the proposed method effectively predicts the positions of neighboring AUVs, enabling the restoration of the desired formation geometry.
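The abstract names distance and angular errors as the key performance metrics but does not give their formulas. A minimal sketch, assuming the common pairwise definition (error = |actual − desired| for both range and bearing, with the bearing error wrapped into [−π, π]); the function name and signature are hypothetical, not from the paper:

```python
import math

def formation_errors(pos_i, pos_j, desired_dist, desired_angle):
    """Distance and angular error for one AUV pair in the plane.

    Hypothetical definitions: the paper evaluates distance and angular
    errors, but the exact formulation is not given in the abstract, so
    this assumes absolute deviation from the desired range and bearing.
    """
    dx = pos_j[0] - pos_i[0]
    dy = pos_j[1] - pos_i[1]
    dist = math.hypot(dx, dy)          # actual inter-AUV distance
    angle = math.atan2(dy, dx)         # actual bearing from AUV i to AUV j
    dist_err = abs(dist - desired_dist)
    # wrap the angular difference into [-pi, pi] before taking magnitude
    ang_err = abs((angle - desired_angle + math.pi) % (2 * math.pi) - math.pi)
    return dist_err, ang_err
```

Summing these errors over all neighbor pairs would give a scalar measure of how far the formation has drifted from its desired geometry.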
Source Journal
Ocean Engineering
CiteScore: 7.30
Self-citation rate: 34.00%
Annual articles: 2379
Review time: 8.1 months
Journal description: Ocean Engineering provides a medium for the publication of original research and development work in the field of ocean engineering.