{"title":"Adaptive dynamic programming with balanced weights seeking strategy","authors":"Jian Fu, Haibo He, Zhen Ni","doi":"10.1109/ADPRL.2011.5967373","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967373","url":null,"abstract":"In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is to consider a balanced weight updating strategy with the consideration of both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach based on the triple link inverted pendulum, a popular benchmark in the community, to demonstrate online learning and control strategy. Experimental results and comparative study under different noise conditions demonstrate the effectiveness of this approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123218195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active exploration for robot parameter selection in episodic reinforcement learning","authors":"Oliver Kroemer, Jan Peters","doi":"10.1109/ADPRL.2011.5967378","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967378","url":null,"abstract":"As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multidimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122252247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fitted policy search","authors":"M. Migliavacca, A. Pecorino, Matteo Pirotta, Marcello Restelli, Andrea Bonarini","doi":"10.1109/ADPRL.2011.5967368","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967368","url":null,"abstract":"In this paper we address the combination of batch reinforcement-learning (BRL) techniques with direct policy search (DPS) algorithms in the context of robot learning. Batch value-based algorithms (such as fitted Q-iteration) have been proved to outperform online ones in many complex applications, but they share the same difficulties in solving problems with continuous action spaces, such as robotic ones. In these cases, actor-critic and DPS methods are preferable, since the optimization process is limited to a family of parameterized (usually smooth) policies. On the other hand, these methods (e.g., policy gradient and evolutionary methods) are generally very expensive, since finding the optimal parameterization may require to evaluate the performance of several policies, which in many real robotic applications is unfeasible or even dangerous. To overcome such problems, we exploit the fitted policy search (FPS) approach, in which the expected return of any policy considered during the optimization process is evaluated offline (without resorting to the robot) by reusing the data collected in the initial exploration phase. In this way, it is possible to take the advantages of both BRL and DPS algorithms, thus achieving an effective learning approach to solve robotic problems. A balancing task on a real two-wheeled robotic pendulum is used to analyze the properties and evaluate the effectiveness of the FPS approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134134198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Higher-level application of Adaptive Dynamic Programming/Reinforcement Learning - a next phase for controls and system identification?","authors":"G. Lendaris","doi":"10.1109/ADPRL.2011.5967395","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967395","url":null,"abstract":"In previous work it was shown that Adaptive-Critic-type Approximate Dynamic Programming could be applied in a “higher-level” way to create autonomous agents capable of using experience to discern context and select optimal, context-dependent control policies. Early experiments with this approach were based on full a priori knowledge of the system being monitored. The experiments reported in this paper, using small neural networks representing families of mappings, were designed to explore what happens when knowledge of the system is less precise. Results of these experiments show that agents trained with this approach perform well when subject to even large amounts of noise or when employing (slightly) imperfect models. The results also suggest that aspects of this method of context discernment are consistent with our intuition about human learning. The insights gained from these explorations can be used to guide further efforts for developing this approach into a general methodology for solving arbitrary identification and control problems.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"16 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132640456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near optimal control of mobile robot formations","authors":"T. Dierks, B. Brenner, S. Jagannathan","doi":"10.1109/ADPRL.2011.5967369","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967369","url":null,"abstract":"In this paper, the infinite horizon optimal tracking control problem is solved online and forward-in-time for leader-follower based formation control of nonholonomic mobile robots. Using the backstepping design approach, the dynamical controller inputs for the robots are approximated from nonlinear optimal control techniques in order to track the control velocities designed to keep the formation. The proposed nonlinear optimal control technique, referred to as adaptive dynamic programming, uses neural networks (NN's) to solve the optimal formation control problem in discrete-time in the presence of unknown internal dynamics and a known control coefficient matrix. All NN's are tuned online using novel weight update laws, and the stability of the entire formation is demonstrated using Lyapunov methods. Simulation results are provided to demonstrate the effectiveness of the proposed approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125918582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate reinforcement learning: An overview","authors":"L. Buşoniu, D. Ernst, B. Schutter, Robert Babuška","doi":"10.1109/ADPRL.2011.5967353","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967353","url":null,"abstract":"Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132180230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"N-step optimal time-invariant trajectory tracking control for a class of nonlinear systems","authors":"Ruizhuo Song, Huaguang Zhang","doi":"10.1109/ADPRL.2011.5967354","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967354","url":null,"abstract":"In this paper, the time-invariant trajectory tracking control problem under N-step control is solved by finite horizon approximate dynamic programming (ADP) algorithms. At first, we convert the tracking control problem for time-invariant trajectory into a output regulation problem. The cost function guarantees the energy is minimum. Secondly, the regulation control scheme is proposed using finite horizon ADP technique to obtain the N-step control. Then two theorems are used to prove the convergence of the proposed control algorithm. Finally, the simulation is given to demonstrate the effectiveness and feasibility of the control scheme.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115617930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reinforcement learning approach for sequential mastery testing","authors":"El-Sayed M. El-Alfy","doi":"10.1109/ADPRL.2011.5967390","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967390","url":null,"abstract":"This paper explores a novel application for reinforcement learning (RL) techniques to sequential mastery testing. In such systems, the goal is to classify each examined person, using the minimal number of test items, as master or non-master. Using RL, an intelligent agent autonomously learns from interactions to administer more informative and effective variable-length tests. Empirical results are also provided to evaluate the performance of the proposed approach as compared to two common approaches for variable-length testing (Bayesian decision and sequential probability ratio test) as well as to the fixed-length testing.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131956389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On learning with imperfect representations","authors":"Shivaram Kalyanakrishnan, P. Stone","doi":"10.1109/ADPRL.2011.5967379","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967379","url":null,"abstract":"In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing real-world applications, which demonstrates that the classical “tabular” representation seldom applies in practice. Specifically, several practical tasks suffer from state aliasing, and most demand some form of generalization and function approximation. Coping with these representational aspects thus becomes an important direction for furthering the advent of reinforcement learning in practice. The central thesis we present in this position paper is that in practice, learning methods specifically developed to work with imperfect representations are likely to perform better than those developed for perfect representations and then applied in imperfect-representation settings. We specify an evaluation criterion for learning methods in practice, and propose a framework for their synthesis. In particular, we highlight the degrees of “representational bias” prevalent in different learning methods. We reference a variety of relevant literature as a background for this introspective essay.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126726171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-order local dynamic programming","authors":"Yuval Tassa, E. Todorov","doi":"10.1109/ADPRL.2011.5967350","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967350","url":null,"abstract":"We describe a new local dynamic programming algorithm for solving stochastic continuous Optimal Control problems. We use cubature integration to both propagate the state distribution and perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare the classic quadratic cost-to-go/linear-feedback controller to a cubic cost-to-go/quadratic policy controller on a 10-dimensional simulated swimming robot, and find that the higher order approximation yields a more general policy with a larger basin of attraction.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"41 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126076934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}