{"title":"Online near optimal control of unknown nonaffine systems with application to HCCI engines","authors":"H. Zargarzadeh, S. Jagannathan, J. Drallmeier","doi":"10.1109/ADPRL.2011.5967382","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967382","url":null,"abstract":"Multi-input and multi-output (MIMO) optimal control of unknown nonaffine nonlinear systems is a challenging problem due to the presence of control inputs inside the unknown nonlinearity. In this paper, the optimal control of MIMO nonlinear nonaffine discrete-time systems in input-output form is considered when the internal dynamics are unknown. First, the nonaffine nonlinear system is converted into an affine-like equivalent nonlinear system under the assumption that the higher-order terms are bounded. Next, a forward-in-time Hamilton-Jaccobi-Bellman (HJB) equation-based optimal approach is developed to control the affine-like nonlinear system using neural network (NN). To overcome the need to know the control gain matrix of the affine-like system for the optimal controller, an online identifier is introduced. Lyapunov stability of the overall system including the online identifier shows that the approximate control input approaches the optimal control with a bounded error. Finally, the optimal control approach is applied to the cycle-by-cycle discrete-time representation of the experimentally validated HCCI engine which is represented as a nonaffine nonlinear system. Simulation results are included to demonstrate the efficacy of the approach in presence of actuator disturbances.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128969400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Agent self-assessment: Determining policy quality without execution","authors":"A. Hans, S. Düll, S. Udluft","doi":"10.1109/ADPRL.2011.5967358","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967358","url":null,"abstract":"With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually required to evaluate a policy before actually applying it to ensure it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure the policy quality. In real applications, however, this might be too expensive or even impossible. Being unable to evaluate the policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, we deal with discrete MDPs in this paper. We propose to use the value function along with its uncertainty to assess a policy's quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value function's uncertainty through uncertainty propagation and evaluate the approach using a number of benchmark applications.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114632424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moving least-squares approximations for linearly-solvable MDP","authors":"Mingyuan Zhong, E. Todorov","doi":"10.1109/ADPRL.2011.5967383","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967383","url":null,"abstract":"By introducing Linearly-solvable Markov Decision Process (LMDP), a general class of nonlinear stochastic optimal control problems can be reduced to solving linear problems. However, in practice, LMDP defined on continuous state space remain difficult due to high dimensionality of the state space. Here we describe a new framework for finding this solution by using a moving least-squares approximation. We use efficient iterative solvers which do not require matrix factorization, so we could handle large numbers of bases. The basis functions are constructed based on collocation states which change over iterations of the algorithm, so as to provide higher resolution at the regions of state space that are visited more often. The shape of the bases is automatically defined given the collocation states, in a way that avoids gaps in the coverage and avoids fitting a tremendous amount of parameters. Numerical results on test problems are provided and demonstrate good behavior when scaled to problems with high dimensionality.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130270688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures","authors":"A. Witsch, R. Reichle, K. Geihs, S. Lange, Martin A. Riedmiller","doi":"10.1109/ADPRL.2011.5967352","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967352","url":null,"abstract":"Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. These problems are addressed by reinforcement learning algorithms like policy gradient methods. We describe how to stabilise the policy gradient descent by introducing a regularisation term to enhance the episodic natural actor-critic approach. This allows a more policy independent usage.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134041098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active learning for personalizing treatment","authors":"Kun Deng, Joelle Pineau, S. Murphy","doi":"10.1109/ADPRL.2011.5967348","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967348","url":null,"abstract":"The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge here is to construct individualized treatment rules (ITR), which recommend the best treatment for each of the different categories of individuals. In general, ITRs can be constructed using data from clinical trials, however these are generally very costly to run. In order to reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of the clinical trial. As an initial investigation, we focus on simple ITRs that utilize a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose the use of a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals. We contrast this policy with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115467056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive dynamic programming for optimal control of unknown nonlinear discrete-time systems","authors":"Derong Liu, Ding Wang, Dongbin Zhao","doi":"10.1109/ADPRL.2011.5967357","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967357","url":null,"abstract":"An intelligent optimal control scheme for unknown nonlinear discrete-time systems with discount factor in the cost function is proposed in this paper. An iterative adaptive dynamic programming (ADP) algorithm via globalized dual heuristic programming (GDHP) technique is developed to obtain the optimal controller with convergence analysis. Three neural networks are used as parametric structures to facilitate the implementation of the iterative algorithm, which will approximate at each iteration the cost function, the optimal control law, and the unknown nonlinear system, respectively. Two simulation examples are provided to verify the effectiveness of the presented optimal control approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"320 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115603828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised adaptive dynamic programming based adaptive cruise control","authors":"Dongbin Zhao, Zhaohui Hu","doi":"10.1109/ADPRL.2011.5967371","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967371","url":null,"abstract":"This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full range Adaptive cruise control (ACC) system. The full range ACC system considers both the ACC situation in highway system and the stop and go (SG) situation in urban street way system. It can autonomously drive the host vehicle with desired speed and distance to the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited for this problem, but it suffers from the low learning efficiency. We propose the concept of inducing range to construct the supervisor and finally formulate the SADP algorithm, which greatly speeds up the learning efficiency. Several driving scenarios are designed and tested with the trained controller compared to traditional ones by simulation results, showing that trained SADP performs very well in all the scenarios, so that it provides an effective approach for the full range ACC problem.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123472826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark","authors":"T. Gabel, C. Lutz, Martin A. Riedmiller","doi":"10.1109/ADPRL.2011.5967361","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967361","url":null,"abstract":"Neural batch reinforcement learning (RL) algorithms have recently shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm in the scope of this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates some learning approach. In RL, the suitable choice of the function c of immediate costs is crucial, and, when utilizing multi-layer perceptron neural networks for the purpose of value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky, when no a priori knowledge about the task and, hence, about optimal policies is available. To this end, we propose a simple, but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic in the context of the well-known pole swing-up benchmark as well as in the context of the novel gaming benchmark we are suggesting.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"473 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129665173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary value function approximation","authors":"M. Davarynejad, J. V. Ast, J. Vrancken, J. Berg","doi":"10.1109/ADPRL.2011.5967349","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967349","url":null,"abstract":"The standard reinforcement learning algorithms have proven to be effective tools for letting an agent learn from its experiences generated by its interaction with an environment. In this paper an evolutionary approach is proposed to accelerate learning speed in tabular reinforcement learning algorithms. In the proposed approach, in order to accelerate the learning speed of agents, the state-value is not only approximated, but through using the concept of evolutionary algorithms, they are evolved, with extra bonus of giving each agent the opportunity to exchange its knowledge. The proposed evolutionary value function approximation, helps in moving from a single isolated learning stage to cooperative exploration of the search space and accelerating learning speed. The performance of the proposed algorithm is compared with the standard SARSA algorithm and some of its properties are discussed. The experimental analysis confirms that the proposed approach has higher convergent speed with a negligible increase in computational complexity.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"21 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126944030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex object manipulation with hierarchical optimal control","authors":"Alex Simpkins, E. Todorov","doi":"10.1109/ADPRL.2011.5967393","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967393","url":null,"abstract":"This paper develops a hierarchical model predictive optimal control solution to the complex and interesting problem of object manipulation. Controlling an object through external manipulators is challenging, involving nonlinearities, redundancy, high dimensionality, contact breaking, underactuation, and more. Manipulation can be framed as essentially the same problem as locomotion (with slightly different parameters). Significant progress has recently been made on the locomotion problem. We develop a methodology to address the challenges of manipulation, extending the most current solutions to locomotion and solving the problem fast enough to run in a realtime implementation. We accomplish this by breaking up the single difficult problem into smaller more tractable problems. Results are presented supporting this method.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125527037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}