Solutions to finite horizon cost problems using actor-critic reinforcement learning

I. Grondman, Hao Xu, S. Jagannathan, Robert Babuška
{"title":"Solutions to finite horizon cost problems using actor-critic reinforcement learning","authors":"I. Grondman, Hao Xu, S. Jagannathan, Robert Babuška","doi":"10.1109/IJCNN.2013.6706755","DOIUrl":null,"url":null,"abstract":"Actor-critic reinforcement learning algorithms have shown to be a successful tool in learning the optimal control for a range of (repetitive) tasks on systems with (partially) unknown dynamics, which may or may not be nonlinear. Most of the reinforcement learning literature published up to this point only deals with modeling the task at hand as a Markov decision process with an infinite horizon cost function. In practice, however, it is sometimes desired to have a solution for the case where the cost function is defined over a finite horizon, which means that the optimal control problem will be time-varying and thus harder to solve. This paper adapts two previously introduced actor-critic algorithms from the infinite horizon setting to the finite horizon setting and applies them to learning a task on a nonlinear system, without needing any assumptions or knowledge about the system dynamics, using radial basis function networks. Simulations on a typical nonlinear motion control problem are carried out, showing that actor-critic algorithms are capable of solving the difficult problem of time-varying optimal control. Moreover, the benefit of using a model learning technique is shown.","PeriodicalId":376975,"journal":{"name":"The 2013 International Joint Conference on Neural Networks (IJCNN)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2013 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2013.6706755","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Actor-critic reinforcement learning algorithms have been shown to be a successful tool for learning the optimal control of a range of (repetitive) tasks on systems with (partially) unknown dynamics, which may or may not be nonlinear. Most of the reinforcement learning literature published to date deals only with modeling the task at hand as a Markov decision process with an infinite horizon cost function. In practice, however, a solution is sometimes desired for the case in which the cost function is defined over a finite horizon, which means that the optimal control problem becomes time-varying and thus harder to solve. This paper adapts two previously introduced actor-critic algorithms from the infinite horizon setting to the finite horizon setting and applies them, using radial basis function networks, to learning a task on a nonlinear system without requiring any assumptions or knowledge about the system dynamics. Simulations on a typical nonlinear motion control problem show that actor-critic algorithms are capable of solving the difficult problem of time-varying optimal control. Moreover, the benefit of using a model learning technique is demonstrated.
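
To see why the finite horizon makes the problem time-varying, consider the two cost functions side by side. The notation below is standard MDP notation chosen for illustration, not taken verbatim from the paper: an infinite-horizon discounted cost admits a stationary value function, whereas a finite-horizon cost forces a backward, time-indexed recursion.

```latex
% Standard MDP notation (illustrative; not taken verbatim from the paper).
% Infinite-horizon discounted cost: a single stationary value function suffices.
J_\infty = \sum_{k=0}^{\infty} \gamma^k \, c(x_k, u_k)

% Finite-horizon cost over N steps, with an optional terminal cost c_N:
J_N = \sum_{k=0}^{N-1} c(x_k, u_k) + c_N(x_N)

% The finite-horizon value function obeys a backward recursion, so both it
% and the optimal policy depend explicitly on the time step k:
V_k(x) = \min_{u}\left[\, c(x, u) + V_{k+1}\bigl(f(x, u)\bigr) \right],
\qquad V_N(x) = c_N(x)
```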
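One practical consequence of this recursion is that a function approximator must represent a different value function and policy at every time step. The sketch below illustrates the idea with a generic finite-horizon actor-critic that keeps one radial basis function weight vector per time step; it is a minimal reconstruction for illustration only, not the two algorithms adapted in the paper. The dynamics in step(), the RBF centers and width, the horizon length, and all learning rates are assumptions chosen for readability.

```python
# Minimal sketch of a finite-horizon actor-critic with RBF features.
# Illustrative only: environment, RBF layout, and gains are assumptions.
import numpy as np

N = 50                    # finite horizon length (assumption)
n_rbf = 25                # number of radial basis functions (assumption)
centers = np.random.uniform(-1.0, 1.0, size=(n_rbf, 2))  # 2-D state assumed
sigma = 0.3               # RBF width (assumption)

def phi(x):
    """Gaussian RBF feature vector for state x."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Because the finite-horizon problem is time-varying, the critic and actor
# keep a separate weight vector for every time step k = 0..N-1.
V = np.zeros((N + 1, n_rbf))   # critic weights; V[N] stays 0 (no terminal cost assumed)
pi = np.zeros((N, n_rbf))      # actor weights (scalar action assumed)

alpha_c, alpha_a = 0.1, 0.01   # critic/actor learning rates (assumptions)
expl = 0.3                     # exploration noise standard deviation

def step(x, u):
    """Placeholder pendulum-like dynamics and quadratic cost (assumption)."""
    x_next = x + 0.1 * np.array([x[1], u - np.sin(x[0])])
    cost = x @ x + 0.1 * u ** 2
    return x_next, cost

for episode in range(2000):
    x = np.random.uniform(-1.0, 1.0, size=2)
    for k in range(N):
        f = phi(x)
        du = np.random.randn() * expl          # exploratory perturbation
        u = pi[k] @ f + du
        x_next, cost = step(x, u)
        # one-step temporal-difference error of the time-indexed cost-to-go
        delta = cost + V[k + 1] @ phi(x_next) - V[k] @ f
        V[k] += alpha_c * delta * f            # critic: move V_k toward the TD target
        pi[k] -= alpha_a * delta * du * f      # actor: if cost exceeded prediction,
                                               # move away from the perturbation
        x = x_next
```

Note the time index in the TD error: the bootstrap target uses V[k + 1], not V[k], which is exactly where the finite-horizon setting departs from the stationary infinite-horizon update.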