DWAS-RL: A safety-efficiency balanced reinforcement learning approach for path planning of Unmanned Surface Vehicles in complex marine environments

IF 4.6, Zone 2 (Engineering & Technology), Q1 ENGINEERING, CIVIL

Ocean Engineering, Pub Date: 2025-02-01, DOI: 10.1016/j.oceaneng.2024.119641

Tianci Qu, Gang Xiong, Hub Ali, Xisong Dong, Yunjun Han, Zhen Shen, Fei-Yue Wang

Ocean Engineering, Volume 317, Article 119641. Full text: https://www.sciencedirect.com/science/article/pii/S0029801824029792

Citations: 0

Abstract

Navigating autonomous surface vehicles in dynamic marine environments is a formidable challenge, given uncertainties and disturbances such as static and moving obstacles, ocean currents, and waves. Recent advances in Deep Reinforcement Learning (DRL) have shown promising adaptivity and timeliness through interaction with the environment. However, achieving zero safety violations together with sample efficiency remains a dual challenge in practical applications. In this paper, we aim to ensure both safety and learning efficiency by combining the advantages of the Dynamic Window Approach (DWA) and safe reinforcement learning. First, we develop a customized simulator for diverse marine conditions, in which the algorithms are trained and tested across various marine scenarios. Then, we formulate the problem as a constrained Markov decision process and propose the DWA-based safe RL (DWAS-RL) approach. Specifically, to guarantee safety during exploration, we use DWA to observe the environment and generate prudent actions by predicting potential near-future hazards, and then use the safe RL framework for exploration and training. To improve sample efficiency, Hindsight Experience Replay is used to accelerate training. Simulation experiments demonstrate the effectiveness of our approach in terms of kinematic performance, safety, and sample efficiency compared with state-of-the-art DRL algorithms. These findings highlight the robustness and superiority of our approach and suggest that it holds promise for addressing challenges in complex marine environments.
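The abstract does not give implementation details, so the following is only a minimal sketch of how a DWA-style check could screen an action proposed by an RL policy before it is executed. The unicycle motion model, the circular obstacle representation, all parameter values, and the names `rollout`, `is_safe`, `dynamic_window`, and `shield_action` are illustrative assumptions, not the authors' method.

```python
import math

def rollout(state, v, w, horizon=2.0, dt=0.1):
    """Forward-simulate an assumed unicycle model; return the visited (x, y) points."""
    x, y, theta = state
    points = []
    for _ in range(int(round(horizon / dt))):
        theta += w * dt
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        points.append((x, y))
    return points

def is_safe(state, v, w, obstacles, safety_radius=1.0):
    """True if the predicted path keeps clear of every circular obstacle (ox, oy, r)."""
    for px, py in rollout(state, v, w):
        for ox, oy, r in obstacles:
            if math.hypot(px - ox, py - oy) <= r + safety_radius:
                return False
    return True

def dynamic_window(v, w, v_max=2.0, w_max=1.0, a_max=1.0, alpha_max=2.0, dt=0.1, n=5):
    """Sample (v, w) pairs reachable within one control step (hypothetical limits)."""
    v_lo, v_hi = max(0.0, v - a_max * dt), min(v_max, v + a_max * dt)
    w_lo, w_hi = max(-w_max, w - alpha_max * dt), min(w_max, w + alpha_max * dt)
    vs = [v_lo + (v_hi - v_lo) * i / (n - 1) for i in range(n)]
    ws = [w_lo + (w_hi - w_lo) * i / (n - 1) for i in range(n)]
    return [(a, b) for a in vs for b in ws]

def shield_action(state, current_vel, proposed, obstacles):
    """Keep the proposed action if its rollout is safe; otherwise pick the closest
    safe candidate in the dynamic window. Stopping is an assumed last resort."""
    if is_safe(state, proposed[0], proposed[1], obstacles):
        return proposed
    candidates = [c for c in dynamic_window(*current_vel)
                  if is_safe(state, c[0], c[1], obstacles)]
    if not candidates:
        return (0.0, 0.0)  # assumption: braking is acceptable as a fallback
    return min(candidates,
               key=lambda c: (c[0] - proposed[0]) ** 2 + (c[1] - proposed[1]) ** 2)
```

In a training loop, `proposed` would come from the policy and the returned action would be the one sent to the simulator, so unsafe exploratory actions are replaced before they can cause a violation.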


Source journal

Ocean Engineering (Engineering & Technology, Engineering: Ocean)

CiteScore

7.30

Self-citation rate

34.00%

Articles published

2379

Review time

8.1 months

About the journal: Ocean Engineering provides a medium for the publication of original research and development work in the field of ocean engineering. Ocean Engineering seeks papers in the following topics.