Path following control of under-actuated autonomous surface vehicle based on random motion trajectory dataset and offline reinforcement learning

IF 11.8 · CAS Tier 1 (Engineering & Technology) · JCR Q1, ENGINEERING, MARINE
Zhiyao Li, Yiming Zhu, Yiting Wang, Yong Zhang, Lei Wang
{"title":"基于随机运动轨迹数据集和离线强化学习的欠驱动自主地面车辆路径跟踪控制","authors":"Zhiyao Li ,&nbsp;Yiming Zhu ,&nbsp;Yiting Wang ,&nbsp;Yong Zhang ,&nbsp;Lei Wang","doi":"10.1016/j.joes.2024.11.001","DOIUrl":null,"url":null,"abstract":"<div><div>To solve the path following problem in navigation tasks for under-actuated autonomous surface vehicles (ASVs), this paper proposed a path following control method which combines trajectory dataset of random ship motion and offline reinforcement learning (RM-ORL). The method does not require the reinforcement learning (RL) agent to interact with the environment while training the policy, and it can obtain training datasets with a lower cost. In RM-ORL, the irregular motion data of the ASV in open water is first collected. Then the desired path is reconstructed using the B-spline function and the path points along the motion trajectories. Thus the offline dataset will be enhanced with the motion data and the new path. Finally, the conservative Q-learning algorithm is utilized to train the path following controller. The path deviation in simulation maps, rudder data and ship motion parameters of RM-ORL, online RL and other offline RL policies trained on different datasets are compared. The simulation results illustrate that the RM-ORL achieves comparable path following accuracy to that of online RL agent and offline RL agent trained on expert data, while surpassing the one trained on online agent replay buffer data. The rudder steering amplitude of RM-ORL is also smaller than that of other policies, which verifies the effectiveness of our method applied to the path following control of under-actuated ASV.</div></div>","PeriodicalId":48514,"journal":{"name":"Journal of Ocean Engineering and Science","volume":"10 5","pages":"Pages 724-744"},"PeriodicalIF":11.8000,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Path following control of under-actuated autonomous surface vehicle based on random motion trajectory dataset and offline reinforcement learning\",\"authors\":\"Zhiyao Li ,&nbsp;Yiming Zhu ,&nbsp;Yiting Wang ,&nbsp;Yong Zhang ,&nbsp;Lei Wang\",\"doi\":\"10.1016/j.joes.2024.11.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>To solve the path following problem in navigation tasks for under-actuated autonomous surface vehicles (ASVs), this paper proposed a path following control method which combines trajectory dataset of random ship motion and offline reinforcement learning (RM-ORL). The method does not require the reinforcement learning (RL) agent to interact with the environment while training the policy, and it can obtain training datasets with a lower cost. In RM-ORL, the irregular motion data of the ASV in open water is first collected. Then the desired path is reconstructed using the B-spline function and the path points along the motion trajectories. Thus the offline dataset will be enhanced with the motion data and the new path. Finally, the conservative Q-learning algorithm is utilized to train the path following controller. The path deviation in simulation maps, rudder data and ship motion parameters of RM-ORL, online RL and other offline RL policies trained on different datasets are compared. The simulation results illustrate that the RM-ORL achieves comparable path following accuracy to that of online RL agent and offline RL agent trained on expert data, while surpassing the one trained on online agent replay buffer data. 
The rudder steering amplitude of RM-ORL is also smaller than that of other policies, which verifies the effectiveness of our method applied to the path following control of under-actuated ASV.</div></div>\",\"PeriodicalId\":48514,\"journal\":{\"name\":\"Journal of Ocean Engineering and Science\",\"volume\":\"10 5\",\"pages\":\"Pages 724-744\"},\"PeriodicalIF\":11.8000,\"publicationDate\":\"2024-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Ocean Engineering and Science\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2468013324000639\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MARINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Ocean Engineering and Science","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468013324000639","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MARINE","Score":null,"Total":0}
Citations: 0

Abstract

To solve the path following problem in navigation tasks for under-actuated autonomous surface vehicles (ASVs), this paper proposes a path following control method that combines a random ship motion trajectory dataset with offline reinforcement learning (RM-ORL). The method does not require the reinforcement learning (RL) agent to interact with the environment while training the policy, so training datasets can be obtained at lower cost. In RM-ORL, irregular motion data of the ASV in open water is first collected. The desired path is then reconstructed from path points along the motion trajectories using a B-spline function, and the offline dataset is augmented with the motion data and the new path. Finally, the conservative Q-learning algorithm is used to train the path following controller. The path deviation on simulation maps, the rudder data, and the ship motion parameters of RM-ORL, online RL, and other offline RL policies trained on different datasets are compared. The simulation results show that RM-ORL achieves path following accuracy comparable to that of an online RL agent and of an offline RL agent trained on expert data, while surpassing the policy trained on the online agent's replay-buffer data. The rudder steering amplitude of RM-ORL is also smaller than that of the other policies, which verifies the effectiveness of the method for path following control of under-actuated ASVs.
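
The desired-path reconstruction step can be illustrated with a short sketch. The version below assumes SciPy's parametric spline routines; the waypoint values and smoothing factor are hypothetical stand-ins for points sampled along one recorded trajectory, not the paper's data:

```python
# Minimal sketch: fit a smooth cubic B-spline through waypoints taken from a
# recorded random-motion trajectory, then resample it as the desired path.
# The waypoints and the smoothing factor s are illustrative assumptions.
import numpy as np
from scipy.interpolate import splprep, splev

waypoints = np.array([
    [0.0, 0.0], [12.0, 4.0], [25.0, 5.5], [38.0, 3.0],
    [50.0, -2.0], [63.0, -4.5], [75.0, -3.0], [90.0, 1.0],
])

# k=3 gives a cubic spline; s > 0 smooths out jitter in the raw trajectory.
tck, _ = splprep([waypoints[:, 0], waypoints[:, 1]], k=3, s=2.0)

# Densely resample the spline to obtain the desired path for the controller.
u = np.linspace(0.0, 1.0, 400)
path_x, path_y = splev(u, tck)
desired_path = np.stack([path_x, path_y], axis=1)  # shape (400, 2)
```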
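The training step, conservative Q-learning (CQL), can likewise be sketched. The PyTorch fragment below is a minimal discrete-action variant, assuming the rudder command is discretized and the state is a hypothetical six-dimensional vector; the network sizes and the conservatism weight ALPHA are illustrative assumptions, not the paper's configuration:

```python
# Minimal sketch of a discrete-action CQL objective for an offline path
# following controller. All dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn

N_ACTIONS = 21            # e.g. rudder angles discretized over [-35°, +35°]
STATE_DIM = 6             # e.g. cross-track error, heading error, u, v, r, rudder
GAMMA, ALPHA = 0.99, 1.0  # discount factor and conservatism weight

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                      nn.Linear(128, N_ACTIONS))
q_target = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, N_ACTIONS))
q_target.load_state_dict(q_net.state_dict())

def cql_loss(s, a, r, s_next, done):
    """Bellman error plus the CQL penalty; `a` holds integer action indices."""
    q_all = q_net(s)                                      # (B, N_ACTIONS)
    q_taken = q_all.gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) from dataset
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * q_target(s_next).max(dim=1).values
    bellman = ((q_taken - target) ** 2).mean()
    # Conservative term: push Q down on all actions, up on dataset actions.
    conservative = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
    return bellman + ALPHA * conservative
```

The logsumexp term is what makes the method conservative: it penalizes high Q-values on actions absent from the offline dataset, which is what allows the policy to be trained without any environment interaction.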
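Finally, the reported path deviation can be read as a cross-track error. The nearest-point definition below is an assumption made for illustration; the paper may define the deviation differently:

```python
# Sketch: path deviation as distance to the nearest point of a densely
# sampled desired path. The straight-line test path is synthetic.
import numpy as np

def cross_track_error(position, path_points):
    """Distance from a 2-D position to the nearest row of path_points (N, 2)."""
    return np.linalg.norm(path_points - position, axis=1).min()

path = np.stack([np.linspace(0.0, 100.0, 500), np.zeros(500)], axis=1)
print(cross_track_error(np.array([40.0, 1.5]), path))  # ≈ 1.5
```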
Source journal
Journal of Ocean Engineering and Science
CiteScore: 11.50 · Self-citation rate: 19.70% · Annual output: 224 articles · Review time: 29 days
About the journal: The Journal of Ocean Engineering and Science (JOES) serves as a platform for disseminating original research and advancements in the realm of ocean engineering and science. JOES encourages the submission of papers covering various aspects of ocean engineering and science.