{"title":"On-policy Approximate Dynamic Programming for Optimal Control of non-linear systems","authors":"K. Shalini, D. Vrushabh, K. Sonam","doi":"10.1109/CoDIT49905.2020.9263879","DOIUrl":null,"url":null,"abstract":"Optimal control theory deals with finding the policy that minimizes the discounted infinite horizon quadratic cost function. For finding the optimal control policy, the solution of the Hamilton-Jacobi-Bellman (HJB) equation must be found i.e. the value function which satisfies the Bellman equation. However, the HJB is a partial differential equation that is difficult to solve for a nonlinear system. The paper employs the approximate dynamic programming method to solve the HJB equation for the deterministic nonlinear discrete-time systems in continuous state and action space. The approximate solution of the HJB is found by the policy iteration algorithm which has the framework of actor-critic architecture. The control policy and value function are approximated using function approximators such as neural network represented in the form of linearly independent basis function. The gradient descent optimization algorithm is employed to tune the weights of the actor and critic network. The control algorithm is implemented for cart pole inverted pendulum system, the effectiveness of this approach is provided in simulations.","PeriodicalId":355781,"journal":{"name":"2020 7th International Conference on Control, Decision and Information Technologies (CoDIT)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 7th International Conference on Control, Decision and Information Technologies (CoDIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoDIT49905.2020.9263879","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Optimal control theory deals with finding the policy that minimizes a discounted infinite-horizon quadratic cost function. Finding the optimal control policy requires solving the Hamilton-Jacobi-Bellman (HJB) equation, i.e., finding the value function that satisfies the Bellman equation. However, the HJB equation is a partial differential equation that is difficult to solve for nonlinear systems. This paper employs the approximate dynamic programming method to solve the HJB equation for deterministic nonlinear discrete-time systems with continuous state and action spaces. The approximate solution of the HJB equation is found by a policy iteration algorithm structured as an actor-critic architecture. The control policy and value function are approximated using function approximators, namely neural networks represented as linear combinations of linearly independent basis functions. A gradient descent optimization algorithm is employed to tune the weights of the actor and critic networks. The control algorithm is implemented for a cart-pole inverted pendulum system, and the effectiveness of the approach is demonstrated in simulations.
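To make the setting concrete, the quantities the abstract refers to can be written out in a standard discounted discrete-time form; the symbols $f$, $\gamma$, $Q$, and $R$ below are assumed notation, not taken from the paper itself:

```latex
V^{\pi}(x_0) = \sum_{k=0}^{\infty} \gamma^{k}\,\bigl(x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k\bigr),
\qquad x_{k+1} = f(x_k, u_k),
```

with the optimal value function satisfying the Bellman (discrete-time HJB) equation

```latex
V^{*}(x) = \min_{u}\Bigl[\, x^{\top} Q\, x + u^{\top} R\, u + \gamma\, V^{*}\bigl(f(x, u)\bigr)\Bigr].
```

The following is a minimal sketch of the kind of on-policy actor-critic loop the abstract describes, assuming polynomial basis functions, a quadratic stage cost, and a simple damped-pendulum model in place of the paper's cart-pole system; all function names, gains, and step sizes are illustrative, not the paper's implementation:

```python
import numpy as np

GAMMA = 0.95                      # discount factor (assumed)
Q = np.diag([1.0, 0.1])           # state cost weight (assumed)
R = 0.01                          # control cost weight (assumed)

def dynamics(x, u, dt=0.02):
    """Illustrative discrete-time damped pendulum, x = [angle, angular velocity]."""
    th, om = x
    return np.array([th + dt * om,
                     om + dt * (-9.81 * np.sin(th) - 0.1 * om + u)])

def phi(x):
    """Critic basis: linearly independent quadratic monomials."""
    th, om = x
    return np.array([th ** 2, om ** 2, th * om])

def psi(x):
    """Actor basis: linear state-feedback features."""
    return np.asarray(x)

def stage_cost(x, u):
    return x @ Q @ x + R * u ** 2

w_c = np.zeros(3)                 # critic weights
w_a = np.array([-1.0, -1.0])      # actor weights (initial admissible guess)
alpha_c, alpha_a = 0.05, 0.005    # gradient-descent step sizes (assumed)

for episode in range(200):
    x = np.random.uniform(-1.0, 1.0, size=2)
    for _ in range(100):
        u = w_a @ psi(x)                      # on-policy control
        x_next = dynamics(x, u)
        # Policy evaluation: gradient descent on the squared Bellman residual.
        delta = w_c @ phi(x) - (stage_cost(x, u) + GAMMA * w_c @ phi(x_next))
        w_c -= alpha_c * delta * (phi(x) - GAMMA * phi(x_next))
        # Policy improvement: descend the one-step-ahead cost, estimating its
        # gradient with respect to u by central finite differences.
        eps = 1e-4
        q = lambda v: stage_cost(x, v) + GAMMA * w_c @ phi(dynamics(x, v))
        dq_du = (q(u + eps) - q(u - eps)) / (2.0 * eps)
        w_a -= alpha_a * dq_du * psi(x)
        x = x_next
```

The sketch keeps the structure the abstract names (policy iteration, actor-critic, gradient descent on both sets of weights); since the paper's closed-form model gradient for the actor update is not reproduced here, a finite-difference estimate stands in for it.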