Model-Free Deep Reinforcement Learning with Multiple Line-of-Sight Guidance Laws for Autonomous Underwater Vehicles Full-Attitude and Velocity Control

IF 6.1 Q1 AUTOMATION & CONTROL SYSTEMS
Chengren Yuan, Changgeng Shuai, Zhanshuo Zhang, Jianguo Ma, Yuan Fang, YuChen Sun
{"title":"Model-Free Deep Reinforcement Learning with Multiple Line-of-Sight Guidance Laws for Autonomous Underwater Vehicles Full-Attitude and Velocity Control","authors":"Chengren Yuan,&nbsp;Changgeng Shuai,&nbsp;Zhanshuo Zhang,&nbsp;Jianguo Ma,&nbsp;Yuan Fang,&nbsp;YuChen Sun","doi":"10.1002/aisy.202400991","DOIUrl":null,"url":null,"abstract":"<p>Autonomous underwater vehicles (AUVs) are increasingly utilized, driving the need for enhanced autonomy. Conventional proportional–integral–derivative (PID) algorithms require frequent control parameter adjustments under varying voyage conditions, which increases operational and experimental costs. To address this issue, a multiple line-of-sight guidance law integrated with a deep reinforcement learning control framework is proposed. This framework enables seamless switching among guidance modes, such as waypoint following, path following, and trajectory tracking, to achieve optimal attitude control. For comprehensive control of roll, pitch, yaw, and longitudinal velocity, an augmented-twin delayed deep deterministic policy gradient (A-TD3) algorithm streamlines the training of the control agent. It enables adaptation to large-range attitude variations using small-scale training data, thereby reducing computational costs for diverse missions. Simulations demonstrate the efficacy of the proposed approach: A-TD3 improves training speed by 30.8% while mitigating issues such as excessive rudder motion, poor operability, and high energy consumption across different missions. 
The attitude control experiments on the X-AUV prototype validate that A-TD3's control performance with PID method.</p>","PeriodicalId":93858,"journal":{"name":"Advanced intelligent systems (Weinheim an der Bergstrasse, Germany)","volume":"7 8","pages":""},"PeriodicalIF":6.1000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://advanced.onlinelibrary.wiley.com/doi/epdf/10.1002/aisy.202400991","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced intelligent systems (Weinheim an der Bergstrasse, Germany)","FirstCategoryId":"1085","ListUrlMain":"https://advanced.onlinelibrary.wiley.com/doi/10.1002/aisy.202400991","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

Abstract

Autonomous underwater vehicles (AUVs) are increasingly utilized, driving the need for enhanced autonomy. Conventional proportional–integral–derivative (PID) algorithms require frequent control parameter adjustments under varying voyage conditions, which increases operational and experimental costs. To address this issue, a multiple line-of-sight guidance law integrated with a deep reinforcement learning control framework is proposed. This framework enables seamless switching among guidance modes, such as waypoint following, path following, and trajectory tracking, to achieve optimal attitude control. For comprehensive control of roll, pitch, yaw, and longitudinal velocity, an augmented-twin delayed deep deterministic policy gradient (A-TD3) algorithm streamlines the training of the control agent. It enables adaptation to large-range attitude variations using small-scale training data, thereby reducing computational costs for diverse missions. Simulations demonstrate the efficacy of the proposed approach: A-TD3 improves training speed by 30.8% while mitigating issues such as excessive rudder motion, poor operability, and high energy consumption across different missions. Attitude control experiments on the X-AUV prototype validate A-TD3's control performance against the PID method.
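The abstract does not specify the form of the multiple line-of-sight guidance law, but classic lookahead-based LOS guidance for path following is well established. As background, a minimal sketch of that standard formulation (not the paper's exact multi-mode law; function and parameter names are illustrative) might look like:

```python
import math

def los_heading(x, y, wp_prev, wp_next, lookahead=5.0):
    """Classic lookahead-based line-of-sight (LOS) guidance.

    Returns the desired yaw angle that steers a vehicle at (x, y)
    toward the straight path segment from wp_prev to wp_next.
    This is a generic sketch, not the paper's A-TD3-integrated law.
    """
    # Path-tangential angle of the current segment
    alpha = math.atan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    # Cross-track error: signed lateral distance from vehicle to path
    e = (-(x - wp_prev[0]) * math.sin(alpha)
         + (y - wp_prev[1]) * math.cos(alpha))
    # Desired heading: path angle plus a lookahead-based correction
    # that rotates the vehicle back toward the path.
    return alpha + math.atan2(-e, lookahead)
```

A vehicle already on the path receives the path-tangential angle unchanged; one displaced laterally is steered back toward the path, with the lookahead distance trading convergence speed against heading aggressiveness. Mode switching (waypoint/path/trajectory) would change how `wp_prev` and `wp_next` are selected, which is what the proposed framework automates.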

