Learning-based tracking control of AUV: Mixed policy improvement and game-based disturbance rejection

IF 7.3 · Zone 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CAAI Transactions on Intelligence Technology · Pub Date: 2024-10-21 · DOI: 10.1049/cit2.12372

Jun Ye, Hongbo Gao, Manjiang Hu, Yougang Bian, Qingjia Cui, Xiaohui Qin, Rongjun Ding

Volume 10(2), pages 510-528 · Publication type: Journal Article

Publisher PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12372

Article page: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.12372

Citations: 0

Abstract


A mixed adaptive dynamic programming (ADP) scheme based on zero-sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior knowledge of the dynamics with actual sampled data, the proposed approach mitigates the defects caused by an inaccurate dynamic model and significantly improves the training speed of the ADP algorithm. First, the dataset is populated with reference data collected from a nominal model that does not account for modelling bias. In addition, the controlled vehicle interacts with the real environment and continuously adds sampled data to the dataset. To leverage the advantages of both model-based and model-free methods during training, an adaptive tuning factor is introduced. It is based on the dataset, which contains model-referenced information and conforms to the distribution of the real-world environment, and it balances the influence of the model-based control law and the data-driven policy gradient on the direction of policy improvement. As a result, the proposed approach learns faster than data-driven methods and achieves better tracking performance than model-based control methods. Moreover, the optimal control problem under disturbances is formulated as a zero-sum game, and an actor-critic-disturbance framework is introduced to approximate the optimal control input, cost function, and disturbance policy, respectively. Furthermore, the convergence of the proposed algorithm, which is based on value iteration, is analysed. Finally, an example of AUV path following based on improved line-of-sight guidance is presented to demonstrate the effectiveness of the proposed method.
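To make the zero-sum formulation and the value-iteration idea concrete, the following is a minimal sketch on a linear-quadratic toy problem. It is not the paper's algorithm: the abstract describes neural actor, critic, and disturbance approximators for an AUV, whereas this sketch assumes linear dynamics x[k+1] = A x[k] + B u[k] + E d[k], a stage cost x'Qx + u'Ru - gamma^2 d'd, and a quadratic value function x'Px. The function name `zero_sum_value_iteration`, the attenuation level `gamma`, and all numerical values are illustrative assumptions.

```python
import numpy as np


def zero_sum_value_iteration(A, B, E, Q, R, gamma, iters=200):
    """Value iteration for V_j(x) = x^T P_j x in a linear-quadratic zero-sum game.

    Assumed toy setting (not the AUV model from the paper):
      dynamics   x[k+1] = A x[k] + B u[k] + E d[k]
      stage cost x^T Q x + u^T R u - gamma^2 d^T d
    The control u minimizes the cost and the disturbance d maximizes it.
    A saddle point requires E^T P E - gamma^2 I to stay negative definite.
    """
    n, m = B.shape
    p = E.shape[1]
    G = np.hstack([B, E])              # stacked input matrix [B, E]
    W = np.zeros((m + p, m + p))       # block-diagonal weight diag(R, -gamma^2 I)
    W[:m, :m] = R
    W[m:, m:] = -gamma**2 * np.eye(p)

    def gains(P):
        # Stacked saddle-point gains K such that [u; d] = -K x for the value x^T P x.
        return np.linalg.solve(W + G.T @ P @ G, G.T @ P @ A)

    P = np.zeros((n, n))               # value iteration starts from V_0 = 0
    for _ in range(iters):
        # Bellman backup of the quadratic value under both optimal players.
        P = Q + A.T @ P @ A - A.T @ P @ G @ gains(P)

    K = gains(P)
    return P, K[:m], K[m:]             # value matrix, control gain, disturbance gain


if __name__ == "__main__":
    A = np.array([[0.9]])
    B = np.array([[1.0]])
    E = np.array([[0.5]])
    Q = np.array([[1.0]])
    R = np.array([[1.0]])
    P, Ku, Kd = zero_sum_value_iteration(A, B, E, Q, R, gamma=5.0)
    print("P =", P, "Ku =", Ku, "Kd =", Kd)
```

In this toy setting the control policy is u = -Ku x and the worst-case disturbance policy is d = -Kd x, so the three returned quantities play roughly the roles that the critic, actor, and disturbance approximators play in the framework named in the abstract.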


Source journal

CAAI Transactions on Intelligence Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 11.00

Self-citation rate: 3.90%

Articles published: 134

Review time: 35 weeks

About the journal: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), and its research is openly accessible to read and share worldwide.