Model-Free Deep Reinforcement Learning with Multiple Line-of-Sight Guidance Laws for Autonomous Underwater Vehicles Full-Attitude and Velocity Control

IF 6.1 Q1 AUTOMATION & CONTROL SYSTEMS
Chengren Yuan, Changgeng Shuai, Zhanshuo Zhang, Jianguo Ma, Yuan Fang, YuChen Sun
{"title":"Model-Free Deep Reinforcement Learning with Multiple Line-of-Sight Guidance Laws for Autonomous Underwater Vehicles Full-Attitude and Velocity Control","authors":"Chengren Yuan,&nbsp;Changgeng Shuai,&nbsp;Zhanshuo Zhang,&nbsp;Jianguo Ma,&nbsp;Yuan Fang,&nbsp;YuChen Sun","doi":"10.1002/aisy.202400991","DOIUrl":null,"url":null,"abstract":"<p>Autonomous underwater vehicles (AUVs) are increasingly utilized, driving the need for enhanced autonomy. Conventional proportional–integral–derivative (PID) algorithms require frequent control parameter adjustments under varying voyage conditions, which increases operational and experimental costs. To address this issue, a multiple line-of-sight guidance law integrated with a deep reinforcement learning control framework is proposed. This framework enables seamless switching among guidance modes, such as waypoint following, path following, and trajectory tracking, to achieve optimal attitude control. For comprehensive control of roll, pitch, yaw, and longitudinal velocity, an augmented-twin delayed deep deterministic policy gradient (A-TD3) algorithm streamlines the training of the control agent. It enables adaptation to large-range attitude variations using small-scale training data, thereby reducing computational costs for diverse missions. Simulations demonstrate the efficacy of the proposed approach: A-TD3 improves training speed by 30.8% while mitigating issues such as excessive rudder motion, poor operability, and high energy consumption across different missions. 
The attitude control experiments on the X-AUV prototype validate that A-TD3's control performance with PID method.</p>","PeriodicalId":93858,"journal":{"name":"Advanced intelligent systems (Weinheim an der Bergstrasse, Germany)","volume":"7 8","pages":""},"PeriodicalIF":6.1000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://advanced.onlinelibrary.wiley.com/doi/epdf/10.1002/aisy.202400991","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced intelligent systems (Weinheim an der Bergstrasse, Germany)","FirstCategoryId":"1085","ListUrlMain":"https://advanced.onlinelibrary.wiley.com/doi/10.1002/aisy.202400991","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

Abstract

Autonomous underwater vehicles (AUVs) are increasingly utilized, driving the need for enhanced autonomy. Conventional proportional–integral–derivative (PID) algorithms require frequent control parameter adjustments under varying voyage conditions, which increases operational and experimental costs. To address this issue, a multiple line-of-sight guidance law integrated with a deep reinforcement learning control framework is proposed. This framework enables seamless switching among guidance modes, such as waypoint following, path following, and trajectory tracking, to achieve optimal attitude control. For comprehensive control of roll, pitch, yaw, and longitudinal velocity, an augmented-twin delayed deep deterministic policy gradient (A-TD3) algorithm streamlines the training of the control agent. It enables adaptation to large-range attitude variations using small-scale training data, thereby reducing computational costs for diverse missions. Simulations demonstrate the efficacy of the proposed approach: A-TD3 improves training speed by 30.8% while mitigating issues such as excessive rudder motion, poor operability, and high energy consumption across different missions. Attitude control experiments on the X-AUV prototype validate A-TD3's control performance against the PID method.
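The abstract does not specify the form of the multiple line-of-sight guidance law, but classic lookahead-based LOS guidance for path following is well established. As background, a minimal sketch of that standard formulation (not the paper's exact multi-mode law; function and parameter names are illustrative) might look like:

```python
import math

def los_heading(x, y, wp_prev, wp_next, lookahead=5.0):
    """Classic lookahead-based line-of-sight (LOS) guidance.

    Returns the desired yaw angle that steers a vehicle at (x, y)
    toward the straight path segment from wp_prev to wp_next.
    This is a generic sketch, not the paper's A-TD3-integrated law.
    """
    # Path-tangential angle of the current segment
    alpha = math.atan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    # Cross-track error: signed lateral distance from vehicle to path
    e = (-(x - wp_prev[0]) * math.sin(alpha)
         + (y - wp_prev[1]) * math.cos(alpha))
    # Desired heading: path angle plus a lookahead-based correction
    # that rotates the vehicle back toward the path.
    return alpha + math.atan2(-e, lookahead)
```

A vehicle already on the path receives the path-tangential angle unchanged; one displaced laterally is steered back toward the path, with the lookahead distance trading convergence speed against heading aggressiveness. Mode switching (waypoint/path/trajectory) would change how `wp_prev` and `wp_next` are selected, which is what the proposed framework automates.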

