Formation control and intention compensating of AUVs using multi-agent reinforcement learning and predict network

IF 5.5 · CAS Tier 2 (Engineering & Technology) · JCR Q1, ENGINEERING, CIVIL
Mengqi Wang, Rongshun Juan, Zezhong Li, Zhongke Gao
DOI: 10.1016/j.oceaneng.2025.122854
Journal: Ocean Engineering, Volume 342, Article 122854
Published: 2025-10-03
URL: https://www.sciencedirect.com/science/article/pii/S0029801825025375
Citations: 0

Abstract

Autonomous Underwater Vehicles (AUVs) have played an important role in numerous marine tasks, such as resource exploration, hydrological data acquisition, rescue operations, and military missions. In contrast to single AUV deployment, multi-AUV formations exhibit higher efficiency and improved task completion rates. Recently, multi-agent reinforcement learning (MARL) has emerged as a promising technique for AUV formation control. Nevertheless, conventional MARL approaches often suffer from instability in formation shapes, especially when managing a large number of AUVs. Additionally, communication delay and information dropout can further compromise formation performance. In this paper, we propose a novel method called Policy Compensate Multi-agent Twin Delayed Deep Deterministic Policy Gradient (PC-MATD3), which integrates imitation learning (IL) with MARL to improve formation stability. The proposed framework is designed to alleviate adverse effects caused by communication interruptions or information delays. We define distance and angular errors as key performance metrics and evaluate our method through two distinct simulation scenarios. Experimental results show that, under ideal communication conditions, our approach substantially reduces formation errors and improves overall stability. Additionally, in scenarios involving communication dropouts, the proposed method effectively predicts the positions of neighboring AUVs, enabling the restoration of the desired formation geometry.
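The abstract names distance and angular errors as the key performance metrics but does not give their formulas. A minimal sketch, assuming the common pairwise definition (error = |actual − desired| for both range and bearing, with the bearing error wrapped into [−π, π]); the function name and signature are hypothetical, not from the paper:

```python
import math

def formation_errors(pos_i, pos_j, desired_dist, desired_angle):
    """Distance and angular error for one AUV pair in the plane.

    Hypothetical definitions: the paper evaluates distance and angular
    errors, but the exact formulation is not given in the abstract, so
    this assumes absolute deviation from the desired range and bearing.
    """
    dx = pos_j[0] - pos_i[0]
    dy = pos_j[1] - pos_i[1]
    dist = math.hypot(dx, dy)          # actual inter-AUV distance
    angle = math.atan2(dy, dx)         # actual bearing from AUV i to AUV j
    dist_err = abs(dist - desired_dist)
    # wrap the angular difference into [-pi, pi] before taking magnitude
    ang_err = abs((angle - desired_angle + math.pi) % (2 * math.pi) - math.pi)
    return dist_err, ang_err
```

Summing these errors over all neighbor pairs would give a scalar measure of how far the formation has drifted from its desired geometry.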
Source Journal
Ocean Engineering
CiteScore: 7.30
Self-citation rate: 34.00%
Annual articles: 2379
Review time: 8.1 months
Journal description: Ocean Engineering provides a medium for the publication of original research and development work in the field of ocean engineering.