Online adaptive critic designs with tensor product B-splines and incremental model techniques
Yiting Feng, Ye Zhou, Hann Woei Ho, Hongyang Dong, Xiaowei Zhao
Journal of The Franklin Institute-engineering and Applied Mathematics, vol. 361, no. 18, Article 107357, published 2024-10-29. DOI: 10.1016/j.jfranklin.2024.107357
Citations: 0
Abstract
As an effective optimal control scheme in the field of reinforcement learning, adaptive dynamic programming (ADP) has attracted extensive attention in recent decades. Neural networks (NNs) are commonly employed in ADP algorithms to realize nonlinear function approximation. However, the learning process with NN approximation typically requires a substantial amount of training data and imposes a heavy computational burden. To improve learning efficiency and alleviate the computational load, this paper develops a novel model-free ADP approach based on multivariate splines and incremental control techniques, aimed at achieving online learning of nonlinear control. Using only input and output data, an incremental linear approximation model is identified online without any prior knowledge of the system dynamics. To improve nonlinear approximation capability and lessen computational demands, tensor product B-splines, instead of NNs, are integrated into the critic module to approximate the value function, and the recursive least squares temporal difference (RLS-TD) algorithm is employed to update the weights of the spline bases. Convergence analysis of the proposed control scheme is conducted based on dual-timescale stochastic approximation theory. To illustrate the effectiveness and feasibility of the proposed control scheme, numerical simulations are performed on the online nonlinear control problem of an inverted pendulum system.
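The first ingredient of the abstract, identifying an incremental linear model online from input and output data alone, can be sketched with a recursive least squares (RLS) estimator on the regressor built from the state and input increments. This is a minimal illustration under stated assumptions, not the authors' implementation: the forgetting factor, initial covariance, and the toy linear plant used in the demonstration are all invented for the sketch.

```python
import numpy as np

class IncrementalModelRLS:
    """Online RLS identification of an incremental linear model
    dx_{k+1} ~= F dx_k + G du_k from input-output increments only.
    (Hyperparameters p0 and forget are assumptions for this sketch.)"""
    def __init__(self, n_states, n_inputs, p0=1000.0, forget=0.995):
        self.Theta = np.zeros((n_states + n_inputs, n_states))  # stacked [F; G]^T
        self.P = p0 * np.eye(n_states + n_inputs)               # inverse-covariance
        self.lam = forget                                       # forgetting factor
        self.n = n_states

    def update(self, dx, du, dx_next):
        z = np.concatenate([dx, du])            # regressor of increments
        Pz = self.P @ z
        gain = Pz / (self.lam + z @ Pz)         # RLS gain vector
        err = dx_next - self.Theta.T @ z        # one-step prediction error
        self.Theta += np.outer(gain, err)
        self.P = (self.P - np.outer(gain, Pz)) / self.lam
        return err

    @property
    def F(self):
        return self.Theta[:self.n].T

    @property
    def G(self):
        return self.Theta[self.n:].T

# Demonstration on an assumed toy linear plant x_{k+1} = A x_k + B u_k,
# excited by random inputs; the identifier never sees A or B directly.
A = np.array([[1.0, 0.05], [-0.2, 0.98]])
B = np.array([[0.0], [0.1]])
ident = IncrementalModelRLS(n_states=2, n_inputs=1)
rng = np.random.default_rng(1)
x_prev, u_prev, x = np.zeros(2), np.zeros(1), np.zeros(2)
for _ in range(300):
    u = rng.normal(size=1)
    x_next = A @ x + B @ u
    ident.update(x - x_prev, u - u_prev, x_next - x)
    x_prev, u_prev, x = x, u, x_next
```

Because the demonstration plant is exactly linear, the increments satisfy dx_{k+1} = A dx_k + B du_k identically, so the RLS estimates of F and G converge to A and B; on a nonlinear system the same recursion would instead track the local Jacobians, which is the point of the incremental formulation.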
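The critic side of the abstract, a tensor product B-spline approximator of the value function whose weights are updated by RLS-TD, can likewise be sketched. The knot vectors, discount factor, and the scalar toy problem below are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """All univariate B-spline basis functions at scalar x (Cox-de Boor
    recursion). Returns a vector of length len(knots) - degree - 1."""
    B = np.array([1.0 if knots[i] <= x < knots[i + 1] else 0.0
                  for i in range(len(knots) - 1)])
    for d in range(1, degree + 1):
        Bn = np.zeros(len(B) - 1)
        for i in range(len(Bn)):
            v = 0.0
            if knots[i + d] > knots[i]:
                v += (x - knots[i]) / (knots[i + d] - knots[i]) * B[i]
            if knots[i + d + 1] > knots[i + 1]:
                v += (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * B[i + 1]
            Bn[i] = v
        B = Bn
    return B

def tensor_features(state, knots_list, degree):
    """Tensor product of the per-dimension bases via Kronecker products."""
    phi = np.array([1.0])
    for xj, kj in zip(state, knots_list):
        phi = np.kron(phi, bspline_basis(xj, kj, degree))
    return phi

class RLSTD:
    """Recursive least squares TD(0): value estimate V(x) ~= w^T phi(x)."""
    def __init__(self, n_features, gamma=0.9, p0=100.0):
        self.w = np.zeros(n_features)
        self.P = p0 * np.eye(n_features)
        self.gamma = gamma

    def update(self, phi, reward, phi_next):
        d = phi - self.gamma * phi_next         # "difference" feature vector
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + d @ Pphi)          # Sherman-Morrison RLS gain
        delta = reward - d @ self.w             # temporal-difference error
        self.w += gain * delta
        self.P -= np.outer(gain, d @ self.P)
        return delta

# Toy check on assumed scalar dynamics x' = 0.8 x with reward x^2 and
# gamma = 0.9: the true value x^2 / (1 - 0.9 * 0.64) lies in the span of
# the cubic spline basis, so the critic can represent it exactly.
knots = np.array([0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1], dtype=float)
critic = RLSTD(n_features=len(knots) - 4, gamma=0.9)
rng = np.random.default_rng(0)
for _ in range(500):
    x = rng.uniform(0.0, 0.99)
    critic.update(bspline_basis(x, knots, 3), x**2,
                  bspline_basis(0.8 * x, knots, 3))
v_half = critic.w @ bspline_basis(0.5, knots, 3)   # true value 0.25 / 0.424
```

The B-spline bases form a partition of unity and have local support, so each update touches only a few weights; the tensor product multiplies the per-dimension basis counts, a cost that stays modest for low-dimensional states such as the pendulum's angle and angular rate.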
期刊介绍:
The Journal of The Franklin Institute has an established reputation for publishing high-quality papers in the field of engineering and applied mathematics. Its current focus is on control systems, complex networks and dynamic systems, signal processing and communications, and their applications. All submitted papers are peer-reviewed. The Journal publishes original research papers and research review papers of substance. Papers and special-focus issues are judged on their potential lasting value, which has been and continues to be the strength of the Journal of The Franklin Institute.