{"title":"Adaptive dynamic programming with balanced weights seeking strategy","authors":"Jian Fu, Haibo He, Zhen Ni","doi":"10.1109/ADPRL.2011.5967373","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967373","url":null,"abstract":"In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is to consider a balanced weight updating strategy with the consideration of both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach based on the triple link inverted pendulum, a popular benchmark in the community, to demonstrate online learning and control strategy. Experimental results and comparative study under different noise conditions demonstrate the effectiveness of this approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123218195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active exploration for robot parameter selection in episodic reinforcement learning","authors":"Oliver Kroemer, Jan Peters","doi":"10.1109/ADPRL.2011.5967378","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967378","url":null,"abstract":"As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multidimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122252247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fitted policy search","authors":"M. Migliavacca, A. Pecorino, Matteo Pirotta, Marcello Restelli, Andrea Bonarini","doi":"10.1109/ADPRL.2011.5967368","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967368","url":null,"abstract":"In this paper we address the combination of batch reinforcement-learning (BRL) techniques with direct policy search (DPS) algorithms in the context of robot learning. Batch value-based algorithms (such as fitted Q-iteration) have been proved to outperform online ones in many complex applications, but they share the same difficulties in solving problems with continuous action spaces, such as robotic ones. In these cases, actor-critic and DPS methods are preferable, since the optimization process is limited to a family of parameterized (usually smooth) policies. On the other hand, these methods (e.g., policy gradient and evolutionary methods) are generally very expensive, since finding the optimal parameterization may require to evaluate the performance of several policies, which in many real robotic applications is unfeasible or even dangerous. To overcome such problems, we exploit the fitted policy search (FPS) approach, in which the expected return of any policy considered during the optimization process is evaluated offline (without resorting to the robot) by reusing the data collected in the initial exploration phase. In this way, it is possible to take the advantages of both BRL and DPS algorithms, thus achieving an effective learning approach to solve robotic problems. A balancing task on a real two-wheeled robotic pendulum is used to analyze the properties and evaluate the effectiveness of the FPS approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134134198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Higher-level application of Adaptive Dynamic Programming/Reinforcement Learning - a next phase for controls and system identification?","authors":"G. Lendaris","doi":"10.1109/ADPRL.2011.5967395","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967395","url":null,"abstract":"In previous work it was shown that Adaptive-Critic-type Approximate Dynamic Programming could be applied in a “higher-level” way to create autonomous agents capable of using experience to discern context and select optimal, context-dependent control policies. Early experiments with this approach were based on full a priori knowledge of the system being monitored. The experiments reported in this paper, using small neural networks representing families of mappings, were designed to explore what happens when knowledge of the system is less precise. Results of these experiments show that agents trained with this approach perform well when subject to even large amounts of noise or when employing (slightly) imperfect models. The results also suggest that aspects of this method of context discernment are consistent with our intuition about human learning. The insights gained from these explorations can be used to guide further efforts for developing this approach into a general methodology for solving arbitrary identification and control problems.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"16 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132640456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near optimal control of mobile robot formations","authors":"T. Dierks, B. Brenner, S. Jagannathan","doi":"10.1109/ADPRL.2011.5967369","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967369","url":null,"abstract":"In this paper, the infinite horizon optimal tracking control problem is solved online and forward-in-time for leader-follower based formation control of nonholonomic mobile robots. Using the backstepping design approach, the dynamical controller inputs for the robots are approximated from nonlinear optimal control techniques in order to track the control velocities designed to keep the formation. The proposed nonlinear optimal control technique, referred to as adaptive dynamic programming, uses neural networks (NN's) to solve the optimal formation control problem in discrete-time in the presence of unknown internal dynamics and a known control coefficient matrix. All NN's are tuned online using novel weight update laws, and the stability of the entire formation is demonstrated using Lyapunov methods. Simulation results are provided to demonstrate the effectiveness of the proposed approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125918582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate reinforcement learning: An overview","authors":"L. Buşoniu, D. Ernst, B. Schutter, Robert Babuška","doi":"10.1109/ADPRL.2011.5967353","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967353","url":null,"abstract":"Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132180230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"N-step optimal time-invariant trajectory tracking control for a class of nonlinear systems","authors":"Ruizhuo Song, Huaguang Zhang","doi":"10.1109/ADPRL.2011.5967354","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967354","url":null,"abstract":"In this paper, the time-invariant trajectory tracking control problem under N-step control is solved by finite horizon approximate dynamic programming (ADP) algorithms. At first, we convert the tracking control problem for time-invariant trajectory into a output regulation problem. The cost function guarantees the energy is minimum. Secondly, the regulation control scheme is proposed using finite horizon ADP technique to obtain the N-step control. Then two theorems are used to prove the convergence of the proposed control algorithm. Finally, the simulation is given to demonstrate the effectiveness and feasibility of the control scheme.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115617930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reinforcement learning approach for sequential mastery testing","authors":"El-Sayed M. El-Alfy","doi":"10.1109/ADPRL.2011.5967390","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967390","url":null,"abstract":"This paper explores a novel application for reinforcement learning (RL) techniques to sequential mastery testing. In such systems, the goal is to classify each examined person, using the minimal number of test items, as master or non-master. Using RL, an intelligent agent autonomously learns from interactions to administer more informative and effective variable-length tests. Empirical results are also provided to evaluate the performance of the proposed approach as compared to two common approaches for variable-length testing (Bayesian decision and sequential probability ratio test) as well as to the fixed-length testing.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131956389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On learning with imperfect representations","authors":"Shivaram Kalyanakrishnan, P. Stone","doi":"10.1109/ADPRL.2011.5967379","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967379","url":null,"abstract":"In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing real-world applications, which demonstrates that the classical “tabular” representation seldom applies in practice. Specifically, several practical tasks suffer from state aliasing, and most demand some form of generalization and function approximation. Coping with these representational aspects thus becomes an important direction for furthering the advent of reinforcement learning in practice. The central thesis we present in this position paper is that in practice, learning methods specifically developed to work with imperfect representations are likely to perform better than those developed for perfect representations and then applied in imperfect-representation settings. We specify an evaluation criterion for learning methods in practice, and propose a framework for their synthesis. In particular, we highlight the degrees of “representational bias” prevalent in different learning methods. We reference a variety of relevant literature as a background for this introspective essay.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126726171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-order local dynamic programming","authors":"Yuval Tassa, E. Todorov","doi":"10.1109/ADPRL.2011.5967350","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967350","url":null,"abstract":"We describe a new local dynamic programming algorithm for solving stochastic continuous Optimal Control problems. We use cubature integration to both propagate the state distribution and perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare the classic quadratic cost-to-go/linear-feedback controller to a cubic cost-to-go/quadratic policy controller on a 10-dimensional simulated swimming robot, and find that the higher order approximation yields a more general policy with a larger basin of attraction.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"41 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126076934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}