{"title":"LSTM-DPPO based deep reinforcement learning controller for path following optimization of unmanned surface vehicle","authors":"Xia Jiawei;Zhu Xufang;Liu Zhong;Xia Qingtao","doi":"10.23919/JSEE.2023.000113","DOIUrl":null,"url":null,"abstract":"To solve the path following control problem for unmanned surface vehicles (USVs), a control method based on deep reinforcement learning (DRL) with long short-term memory (LSTM) networks is proposed. A distributed proximal policy optimization (DPPO) algorithm, which is a modified actorcritic-based type of reinforcement learning algorithm, is adapted to improve the controller performance in repeated trials. The LSTM network structure is introduced to solve the strong temporal correlation USV control problem. In addition, a specially designed path dataset, including straight and curved paths, is established to simulate various sailing scenarios so that the reinforcement learning controller can obtain as much handling experience as possible. Extensive numerical simulation results demonstrate that the proposed method has better control performance under missions involving complex maneuvers than trained with limited scenarios and can potentially be applied in practice.","PeriodicalId":50030,"journal":{"name":"Journal of Systems Engineering and Electronics","volume":"34 5","pages":"1343-1358"},"PeriodicalIF":1.9000,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Engineering and Electronics","FirstCategoryId":"1087","ListUrlMain":"https://ieeexplore.ieee.org/document/10308771/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
To solve the path following control problem for unmanned surface vehicles (USVs), a control method based on deep reinforcement learning (DRL) with long short-term memory (LSTM) networks is proposed. A distributed proximal policy optimization (DPPO) algorithm, which is a modified actorcritic-based type of reinforcement learning algorithm, is adapted to improve the controller performance in repeated trials. The LSTM network structure is introduced to solve the strong temporal correlation USV control problem. In addition, a specially designed path dataset, including straight and curved paths, is established to simulate various sailing scenarios so that the reinforcement learning controller can obtain as much handling experience as possible. Extensive numerical simulation results demonstrate that the proposed method has better control performance under missions involving complex maneuvers than trained with limited scenarios and can potentially be applied in practice.