Zhiyao Li, Yiming Zhu, Yiting Wang, Yong Zhang, Lei Wang
Path following control of under-actuated autonomous surface vehicle based on random motion trajectory dataset and offline reinforcement learning
DOI: 10.1016/j.joes.2024.11.001
Journal of Ocean Engineering and Science, Vol. 10, No. 5, pp. 724-744
Published: 2024-11-16 (Journal Article, JCR Q1, Engineering, Marine)
URL: https://www.sciencedirect.com/science/article/pii/S2468013324000639
Citations: 0
Abstract
To solve the path following problem in navigation tasks for under-actuated autonomous surface vehicles (ASVs), this paper proposes a path following control method that combines a random ship motion trajectory dataset with offline reinforcement learning (RM-ORL). The method does not require the reinforcement learning (RL) agent to interact with the environment while training the policy, so training datasets can be obtained at lower cost. In RM-ORL, irregular motion data of the ASV in open water is first collected. The desired path is then reconstructed from path points along the motion trajectories using a B-spline function, and the offline dataset is augmented with the motion data and the reconstructed path. Finally, the conservative Q-learning algorithm is used to train the path following controller. The path deviation on simulation maps, rudder data, and ship motion parameters of RM-ORL are compared with those of online RL and of other offline RL policies trained on different datasets. The simulation results show that RM-ORL achieves path following accuracy comparable to that of the online RL agent and the offline RL agent trained on expert data, while surpassing the agent trained on the online agent's replay buffer data. The rudder steering amplitude of RM-ORL is also smaller than that of the other policies, which verifies the effectiveness of the method for path following control of under-actuated ASVs.
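The path-reconstruction step described in the abstract, fitting a smooth desired path through points sampled along a recorded trajectory with a B-spline, can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes generic noisy (x, y) trajectory points and uses SciPy's parametric smoothing spline (`splprep`/`splev`) as a stand-in for the B-spline function the authors employ.

```python
# Minimal sketch: reconstruct a smooth desired path from noisy trajectory
# points with a parametric smoothing B-spline (illustrative, not the paper's code).
import numpy as np
from scipy.interpolate import splprep, splev

# Assumed example data: (x, y) points along a roughly circular random-motion
# trajectory, perturbed by measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
x = np.cos(t) + 0.05 * rng.normal(size=50)
y = np.sin(t) + 0.05 * rng.normal(size=50)

# Fit a smoothing B-spline through the points; the smoothing factor s trades
# off fidelity to the recorded points against path smoothness.
tck, _ = splprep([x, y], s=0.1)

# Sample the reconstructed desired path densely; these waypoints would form
# the "new path" used to augment the offline dataset.
u = np.linspace(0, 1, 200)
path_x, path_y = splev(u, tck)
```

In an offline-RL pipeline of this kind, each logged state-action transition would then be relabeled with a path-deviation reward computed against the reconstructed path, turning random motion data into a path-following dataset.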
About the journal:
The Journal of Ocean Engineering and Science (JOES) serves as a platform for disseminating original research and advancements in ocean engineering and science.
JOES encourages the submission of papers covering various aspects of ocean engineering and science.