Reinforcement learning algorithms as function optimizers

International 1989 Joint Conference on Neural Networks. Pub Date: 1989-12-01. DOI: 10.1109/IJCNN.1989.118683

Ronald J. Williams

Citations: 20

Abstract

Any nonassociative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. A description is given of the results of simulations in which the optima of several deterministic functions studied by D.H. Ackley (Ph.D. Diss., Carnegie-Mellon Univ., 1987) were sought using variants of REINFORCE algorithms. Results obtained for certain of these algorithms compare favorably to the best results found by Ackley.
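To make the abstract's central idea concrete, the following is a minimal sketch of a REINFORCE-style optimizer over bit strings. It is an illustration under stated assumptions, not the configuration used in the paper. Each bit is drawn from a Bernoulli unit with a logistic parameterization, the sampled function value serves as the reinforcement signal, and a running average of past values serves as the baseline. The objective `one_max`, the function `reinforce_optimize`, and all hyperparameters (`alpha`, `baseline_decay`, `steps`) are hypothetical names and values chosen for this example.

```python
import math
import random


def one_max(bits):
    """Illustrative objective (an assumption): the number of 1 bits."""
    return sum(bits)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reinforce_optimize(f, n_bits, steps=5000, alpha=0.1,
                       baseline_decay=0.9, seed=0):
    """Maximize f over bit strings using only sampled function values.

    Each bit i is sampled from Bernoulli(sigmoid(theta[i])). After every
    sample the parameters move along (r - baseline) * (x_i - p_i), the
    REINFORCE update for a Bernoulli-logistic unit.
    """
    rng = random.Random(seed)
    theta = [0.0] * n_bits          # start with every bit at p = 0.5
    baseline = None                 # running average of observed values
    best_x, best_r = None, float("-inf")

    for _ in range(steps):
        probs = [sigmoid(t) for t in theta]
        x = [1 if rng.random() < p else 0 for p in probs]
        r = f(x)                    # the only feedback: one function value

        if r > best_r:
            best_x, best_r = x, r
        if baseline is None:
            baseline = r            # initialize from the first sample

        # (x_i - p_i) is the derivative of the log-probability of the
        # sampled bit with respect to theta_i.
        for i in range(n_bits):
            theta[i] += alpha * (r - baseline) * (x[i] - probs[i])

        # Exponentially weighted average of past function values.
        baseline = baseline_decay * baseline + (1.0 - baseline_decay) * r

    return best_x, best_r, [sigmoid(t) for t in theta]
```

Calling `reinforce_optimize(one_max, n_bits=10)` returns the best bit string sampled, its function value, and the final per-bit probabilities. The optimizer never inspects the structure of `f`; it only observes sampled values, which is the sense in which a nonassociative reinforcement learner acts as a function optimizer.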

