Adaptive Dynamic Programming

J. Murray, C. Cox, G. Lendaris, R. Saeks
{"title":"自适应动态规划","authors":"J. Murray, C. Cox, G. Lendaris, R. Saeks","doi":"10.1109/TSMCC.2002.801727","DOIUrl":null,"url":null,"abstract":"Unlike the many soft computing applications where it suffices to achieve a \"good approximation most of the time,\" a control system must be stable all of the time. As such, if one desires to learn a control law in real-time, a fusion of soft computing techniques to learn the appropriate control law with hard computing techniques to maintain the stability constraint and guarantee convergence is required. The objective of the paper is to describe an adaptive dynamic programming algorithm (ADPA) which fuses soft computing techniques to learn the optimal cost (or return) functional for a stabilizable nonlinear system with unknown dynamics and hard computing techniques to verify the stability and convergence of the algorithm. Specifically, the algorithm is initialized with a (stabilizing) cost functional and the system is run with the corresponding control law (defined by the Hamilton-Jacobi-Bellman equation), with the resultant state trajectories used to update the cost functional in a soft computing mode. Hard computing techniques are then used to show that this process is globally convergent with stepwise stability to the optimal cost functional/control law pair for an (unknown) input affine system with an input quadratic performance measure (modulo the appropriate technical conditions). 
Three specific implementations of the ADPA are developed for 1) the linear case, 2) for the nonlinear case using a locally quadratic approximation to the cost functional, and 3) the nonlinear case using a radial basis function approximation of the cost functional; illustrated by applications to flight control.","PeriodicalId":55005,"journal":{"name":"IEEE Transactions on Systems Man and Cybernetics Part C-Applications and Re","volume":"11 1","pages":"140-153"},"PeriodicalIF":0.0000,"publicationDate":"2002-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"620","resultStr":"{\"title\":\"Adaptive dynamic programming\",\"authors\":\"J. Murray, C. Cox, G. Lendaris, R. Saeks\",\"doi\":\"10.1109/TSMCC.2002.801727\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unlike the many soft computing applications where it suffices to achieve a \\\"good approximation most of the time,\\\" a control system must be stable all of the time. As such, if one desires to learn a control law in real-time, a fusion of soft computing techniques to learn the appropriate control law with hard computing techniques to maintain the stability constraint and guarantee convergence is required. The objective of the paper is to describe an adaptive dynamic programming algorithm (ADPA) which fuses soft computing techniques to learn the optimal cost (or return) functional for a stabilizable nonlinear system with unknown dynamics and hard computing techniques to verify the stability and convergence of the algorithm. Specifically, the algorithm is initialized with a (stabilizing) cost functional and the system is run with the corresponding control law (defined by the Hamilton-Jacobi-Bellman equation), with the resultant state trajectories used to update the cost functional in a soft computing mode. 
Hard computing techniques are then used to show that this process is globally convergent with stepwise stability to the optimal cost functional/control law pair for an (unknown) input affine system with an input quadratic performance measure (modulo the appropriate technical conditions). Three specific implementations of the ADPA are developed for 1) the linear case, 2) for the nonlinear case using a locally quadratic approximation to the cost functional, and 3) the nonlinear case using a radial basis function approximation of the cost functional; illustrated by applications to flight control.\",\"PeriodicalId\":55005,\"journal\":{\"name\":\"IEEE Transactions on Systems Man and Cybernetics Part C-Applications and Re\",\"volume\":\"11 1\",\"pages\":\"140-153\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"620\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Systems Man and Cybernetics Part C-Applications and Re\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TSMCC.2002.801727\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Systems Man and Cybernetics Part C-Applications and Re","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TSMCC.2002.801727","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 620

Abstract

Unlike the many soft computing applications where it suffices to achieve a "good approximation most of the time," a control system must be stable all of the time. As such, if one desires to learn a control law in real-time, a fusion of soft computing techniques to learn the appropriate control law with hard computing techniques to maintain the stability constraint and guarantee convergence is required. The objective of the paper is to describe an adaptive dynamic programming algorithm (ADPA) which fuses soft computing techniques to learn the optimal cost (or return) functional for a stabilizable nonlinear system with unknown dynamics and hard computing techniques to verify the stability and convergence of the algorithm. Specifically, the algorithm is initialized with a (stabilizing) cost functional and the system is run with the corresponding control law (defined by the Hamilton-Jacobi-Bellman equation), with the resultant state trajectories used to update the cost functional in a soft computing mode. Hard computing techniques are then used to show that this process is globally convergent with stepwise stability to the optimal cost functional/control law pair for an (unknown) input affine system with an input quadratic performance measure (modulo the appropriate technical conditions). Three specific implementations of the ADPA are developed for 1) the linear case, 2) the nonlinear case using a locally quadratic approximation to the cost functional, and 3) the nonlinear case using a radial basis function approximation of the cost functional; these are illustrated by applications to flight control.
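The cost-functional/control-law update cycle the abstract describes — evaluate the cost of the current stabilizing control law, then improve the law against that cost — can be sketched for the linear-quadratic special case as discrete-time policy iteration (Hewer's algorithm). This is a simplified stand-in under stated assumptions, not the paper's continuous-time implementation: the plant matrices here are hypothetical, and the plant is assumed known rather than learned from trajectories.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

def adp_policy_iteration(A, B, Q, R, K0, iters=30):
    """Policy iteration for the discrete-time LQ regulator.

    Each pass evaluates the cost matrix P of the current stabilizing
    gain K (a Lyapunov equation), then improves the control law --
    the cost-update / control-update cycle described in the abstract.
    """
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: cost functional of the current control law,
        # P = Acl' P Acl + Q + K' R K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: greedy control law w.r.t. the learned cost
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Hypothetical second-order plant; K0 = 0 is stabilizing since A is stable
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = adp_policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))

# The iterated cost matrix matches the Riccati (optimal) solution
P_star = solve_discrete_are(A, B, Q, R)
assert np.allclose(P, P_star, atol=1e-8)
```

The "stepwise stability" property in the abstract corresponds here to every intermediate gain K being stabilizing, so the Lyapunov equation remains solvable at each step of the iteration.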