Reinforcement learning-based trajectory tracking optimal control for underactuated unmanned surface vehicles under asymmetric input saturation

Impact factor: 8.0; Zone 2 (Computer Science); JCR Q1 (Automation & Control Systems)

Engineering Applications of Artificial Intelligence Pub Date : 2025-10-04 DOI:10.1016/j.engappai.2025.112307

Ziping Wei, Jialu Du

Volume 162, Article 112307. Not open access. Full text: https://www.sciencedirect.com/science/article/pii/S0952197625023152

Citations: 0

Abstract

For underactuated unmanned surface vehicles (USVs) under asymmetric input saturation caused by thrust-limit characteristics, as well as unknown dynamics and ocean environmental disturbances, a trajectory tracking optimal control (TTOC) scheme is proposed using the reinforcement learning (RL) method. Through coordinate transformations and mathematical derivation, an underactuated USV motion model is transformed into the standard affine nonlinear form. To address the asymmetric input saturation of underactuated USVs, a new inverse hyperbolic tangent-type penalty function is designed for control inputs, relaxing the assumption of input saturation limits being symmetric. Based on RL methods and adaptive neural networks (NNs), an actor-critic NN framework is developed, with weight update laws designed for NNs. This framework learns the TTOC law for underactuated USVs through the online interaction of actor and critic NNs while adapting to unknown dynamics and disturbances. In particular, a robustifying term is designed and added to the output of an actor NN to compensate for the adverse effects of a lumped residual term, which enhances the robustness of the TTOC law and thereby achieves asymptotic regulation of trajectory tracking errors. Theoretical analyses and simulation results indicate that the proposed TTOC scheme enables underactuated USVs to asymptotically track the desired trajectory.
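The abstract does not state the penalty function itself, so the following is only a minimal sketch of one common way to build an inverse hyperbolic tangent-type input penalty for asymmetric bounds: shift the classical symmetric penalty to the midpoint of the saturation interval. The names `saturation_params`, `input_penalty`, and `bounded_control`, the scalar weight `r`, and the shifted form are all assumptions made for illustration, not the authors' design. The sketch shows that minimizing such a penalty plus a linear term yields a control that stays within the asymmetric limits.

```python
import math


def saturation_params(u_min, u_max):
    """Return (center, half_range) of the saturation interval [u_min, u_max]."""
    return 0.5 * (u_max + u_min), 0.5 * (u_max - u_min)


def input_penalty(u, u_min, u_max, r=1.0):
    """Assumed shifted atanh-type penalty (illustrative, not from the paper).

    Closed form of 2*r * integral from c to u of lam*atanh((v - c)/lam) dv,
    where c is the interval midpoint and lam its half-range.
    """
    c, lam = saturation_params(u_min, u_max)
    s = u - c
    if abs(s) >= lam:
        raise ValueError("u must lie strictly inside (u_min, u_max)")
    return (2.0 * r * lam * s * math.atanh(s / lam)
            + r * lam ** 2 * math.log(1.0 - (s / lam) ** 2))


def bounded_control(grad_term, u_min, u_max, r=1.0):
    """Minimizer of input_penalty(u) + grad_term * u over u.

    grad_term stands in for the product of the input gain and the value
    function gradient; the result is c - lam*tanh(grad_term / (2*r*lam)).
    """
    c, lam = saturation_params(u_min, u_max)
    return c - lam * math.tanh(grad_term / (2.0 * r * lam))


if __name__ == "__main__":
    # Hypothetical asymmetric thrust limits: -1 (reverse) to 3 (forward).
    for g in (-50.0, -1.5, 0.0, 1.5, 50.0):
        print(g, bounded_control(g, -1.0, 3.0))
```

With a zero gradient term the control sits at the midpoint of the interval rather than at zero, which is one consequence of this particular shifted form; other asymmetric constructions are possible.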



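Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Publication volume: 505

Review time: 68 days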

About the journal: Artificial intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advances across a range of machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research on the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI used in real-world engineering applications, validated on publicly available datasets to ensure that the results can be replicated. Join us in exploring the transformative potential of AI in engineering.