{"title":"Online near optimal control of unknown nonaffine systems with application to HCCI engines","authors":"H. Zargarzadeh, S. Jagannathan, J. Drallmeier","doi":"10.1109/ADPRL.2011.5967382","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967382","url":null,"abstract":"Multi-input and multi-output (MIMO) optimal control of unknown nonaffine nonlinear systems is a challenging problem due to the presence of control inputs inside the unknown nonlinearity. In this paper, the optimal control of MIMO nonlinear nonaffine discrete-time systems in input-output form is considered when the internal dynamics are unknown. First, the nonaffine nonlinear system is converted into an affine-like equivalent nonlinear system under the assumption that the higher-order terms are bounded. Next, a forward-in-time Hamilton-Jaccobi-Bellman (HJB) equation-based optimal approach is developed to control the affine-like nonlinear system using neural network (NN). To overcome the need to know the control gain matrix of the affine-like system for the optimal controller, an online identifier is introduced. Lyapunov stability of the overall system including the online identifier shows that the approximate control input approaches the optimal control with a bounded error. Finally, the optimal control approach is applied to the cycle-by-cycle discrete-time representation of the experimentally validated HCCI engine which is represented as a nonaffine nonlinear system. Simulation results are included to demonstrate the efficacy of the approach in presence of actuator disturbances.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128969400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Agent self-assessment: Determining policy quality without execution","authors":"A. Hans, S. Düll, S. Udluft","doi":"10.1109/ADPRL.2011.5967358","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967358","url":null,"abstract":"With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually required to evaluate a policy before actually applying it to ensure it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure the policy quality. In real applications, however, this might be too expensive or even impossible. Being unable to evaluate the policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, we deal with discrete MDPs in this paper. We propose to use the value function along with its uncertainty to assess a policy's quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value function's uncertainty through uncertainty propagation and evaluate the approach using a number of benchmark applications.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114632424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moving least-squares approximations for linearly-solvable MDP","authors":"Mingyuan Zhong, E. Todorov","doi":"10.1109/ADPRL.2011.5967383","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967383","url":null,"abstract":"By introducing Linearly-solvable Markov Decision Process (LMDP), a general class of nonlinear stochastic optimal control problems can be reduced to solving linear problems. However, in practice, LMDP defined on continuous state space remain difficult due to high dimensionality of the state space. Here we describe a new framework for finding this solution by using a moving least-squares approximation. We use efficient iterative solvers which do not require matrix factorization, so we could handle large numbers of bases. The basis functions are constructed based on collocation states which change over iterations of the algorithm, so as to provide higher resolution at the regions of state space that are visited more often. The shape of the bases is automatically defined given the collocation states, in a way that avoids gaps in the coverage and avoids fitting a tremendous amount of parameters. Numerical results on test problems are provided and demonstrate good behavior when scaled to problems with high dimensionality.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130270688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures","authors":"A. Witsch, R. Reichle, K. Geihs, S. Lange, Martin A. Riedmiller","doi":"10.1109/ADPRL.2011.5967352","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967352","url":null,"abstract":"Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. These problems are addressed by reinforcement learning algorithms like policy gradient methods. We describe how to stabilise the policy gradient descent by introducing a regularisation term to enhance the episodic natural actor-critic approach. This allows a more policy independent usage.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134041098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active learning for personalizing treatment","authors":"Kun Deng, Joelle Pineau, S. Murphy","doi":"10.1109/ADPRL.2011.5967348","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967348","url":null,"abstract":"The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge here is to construct individualized treatment rules (ITR), which recommend the best treatment for each of the different categories of individuals. In general, ITRs can be constructed using data from clinical trials, however these are generally very costly to run. In order to reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of the clinical trial. As an initial investigation, we focus on simple ITRs that utilize a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose the use of a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals. We contrast this policy with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115467056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive dynamic programming for optimal control of unknown nonlinear discrete-time systems","authors":"Derong Liu, Ding Wang, Dongbin Zhao","doi":"10.1109/ADPRL.2011.5967357","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967357","url":null,"abstract":"An intelligent optimal control scheme for unknown nonlinear discrete-time systems with discount factor in the cost function is proposed in this paper. An iterative adaptive dynamic programming (ADP) algorithm via globalized dual heuristic programming (GDHP) technique is developed to obtain the optimal controller with convergence analysis. Three neural networks are used as parametric structures to facilitate the implementation of the iterative algorithm, which will approximate at each iteration the cost function, the optimal control law, and the unknown nonlinear system, respectively. Two simulation examples are provided to verify the effectiveness of the presented optimal control approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"320 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115603828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised adaptive dynamic programming based adaptive cruise control","authors":"Dongbin Zhao, Zhaohui Hu","doi":"10.1109/ADPRL.2011.5967371","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967371","url":null,"abstract":"This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full range Adaptive cruise control (ACC) system. The full range ACC system considers both the ACC situation in highway system and the stop and go (SG) situation in urban street way system. It can autonomously drive the host vehicle with desired speed and distance to the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited for this problem, but it suffers from the low learning efficiency. We propose the concept of inducing range to construct the supervisor and finally formulate the SADP algorithm, which greatly speeds up the learning efficiency. Several driving scenarios are designed and tested with the trained controller compared to traditional ones by simulation results, showing that trained SADP performs very well in all the scenarios, so that it provides an effective approach for the full range ACC problem.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123472826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark","authors":"T. Gabel, C. Lutz, Martin A. Riedmiller","doi":"10.1109/ADPRL.2011.5967361","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967361","url":null,"abstract":"Neural batch reinforcement learning (RL) algorithms have recently shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm in the scope of this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates some learning approach. In RL, the suitable choice of the function c of immediate costs is crucial, and, when utilizing multi-layer perceptron neural networks for the purpose of value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky, when no a priori knowledge about the task and, hence, about optimal policies is available. To this end, we propose a simple, but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic in the context of the well-known pole swing-up benchmark as well as in the context of the novel gaming benchmark we are suggesting.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"473 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129665173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary value function approximation","authors":"M. Davarynejad, J. V. Ast, J. Vrancken, J. Berg","doi":"10.1109/ADPRL.2011.5967349","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967349","url":null,"abstract":"The standard reinforcement learning algorithms have proven to be effective tools for letting an agent learn from its experiences generated by its interaction with an environment. In this paper an evolutionary approach is proposed to accelerate learning speed in tabular reinforcement learning algorithms. In the proposed approach, in order to accelerate the learning speed of agents, the state-value is not only approximated, but through using the concept of evolutionary algorithms, they are evolved, with extra bonus of giving each agent the opportunity to exchange its knowledge. The proposed evolutionary value function approximation, helps in moving from a single isolated learning stage to cooperative exploration of the search space and accelerating learning speed. The performance of the proposed algorithm is compared with the standard SARSA algorithm and some of its properties are discussed. The experimental analysis confirms that the proposed approach has higher convergent speed with a negligible increase in computational complexity.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"21 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126944030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex object manipulation with hierarchical optimal control","authors":"Alex Simpkins, E. Todorov","doi":"10.1109/ADPRL.2011.5967393","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967393","url":null,"abstract":"This paper develops a hierarchical model predictive optimal control solution to the complex and interesting problem of object manipulation. Controlling an object through external manipulators is challenging, involving nonlinearities, redundancy, high dimensionality, contact breaking, underactuation, and more. Manipulation can be framed as essentially the same problem as locomotion (with slightly different parameters). Significant progress has recently been made on the locomotion problem. We develop a methodology to address the challenges of manipulation, extending the most current solutions to locomotion and solving the problem fast enough to run in a realtime implementation. We accomplish this by breaking up the single difficult problem into smaller more tractable problems. Results are presented supporting this method.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125527037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}