Toward effective combination of off-line and on-line training in ADP framework

2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning | Pub Date: 2007-04-01 | DOI: 10.1109/ADPRL.2007.368198

D. Prokhorov

Citations: 17

Abstract

We are interested in finding the most effective combination of off-line and on-line (real-time) training in approximate dynamic programming. We introduce an approach that combines proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method (Feldkamp et al., 1998), whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate the importance of using recurrent neural networks for both the controller/actor and the critic.

