2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL): Latest Publications

Online near optimal control of unknown nonaffine systems with application to HCCI engines
H. Zargarzadeh, S. Jagannathan, J. Drallmeier
DOI: https://doi.org/10.1109/ADPRL.2011.5967382 (published 2011-04-11)
Abstract: Multi-input and multi-output (MIMO) optimal control of unknown nonaffine nonlinear systems is a challenging problem because the control inputs appear inside the unknown nonlinearity. In this paper, the optimal control of MIMO nonaffine nonlinear discrete-time systems in input-output form is considered when the internal dynamics are unknown. First, the nonaffine nonlinear system is converted into an affine-like equivalent nonlinear system under the assumption that the higher-order terms are bounded. Next, a forward-in-time Hamilton-Jacobi-Bellman (HJB) equation-based optimal approach is developed to control the affine-like nonlinear system using a neural network (NN). To overcome the need to know the control gain matrix of the affine-like system, an online identifier is introduced. Lyapunov stability analysis of the overall system, including the online identifier, shows that the approximate control input approaches the optimal control with a bounded error. Finally, the approach is applied to the cycle-by-cycle discrete-time representation of an experimentally validated HCCI engine, which is represented as a nonaffine nonlinear system. Simulation results demonstrate the efficacy of the approach in the presence of actuator disturbances.
Citations: 4
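The key step above is rewriting the nonaffine plant in an affine-like form. A minimal sketch of how such a rewriting can be obtained, assuming a first-order expansion about the previous input (the authors' exact construction is not reproduced here):

\[
x(k+1) = F\bigl(x(k), u(k-1)\bigr) + \frac{\partial F}{\partial u}\Big|_{(x(k),\,u(k-1))} \bigl(u(k) - u(k-1)\bigr) + r(k),
\]

which, provided the remainder \(r(k)\) stays bounded, can be grouped into the affine-like form \(x(k+1) \approx \bar f\bigl(x(k), u(k-1)\bigr) + \bar g\bigl(x(k), u(k-1)\bigr)\,u(k)\). The online identifier mentioned in the abstract supplies the estimate of the unknown gain term \(\bar g(\cdot)\) that the HJB-based controller needs.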
Agent self-assessment: Determining policy quality without execution
A. Hans, S. Düll, S. Udluft
DOI: https://doi.org/10.1109/ADPRL.2011.5967358 (published 2011-04-11)
Abstract: With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually necessary to evaluate a policy before actually applying it, to ensure that it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure policy quality; in real applications, however, this may be too expensive or even impossible. Being unable to evaluate a policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, this paper deals with discrete MDPs. We propose to use the value function along with its uncertainty to assess a policy's quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value function's uncertainty through uncertainty propagation, and we evaluate the approach on a number of benchmark applications.
Citations: 10
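A minimal illustration of why a value estimate needs an uncertainty attached to it, using a Monte-Carlo Dirichlet posterior as a simple stand-in for the analytic uncertainty propagation described in the abstract (the MDP, counts, and rewards below are made up):

    import numpy as np

    # Estimate V^pi for a fixed policy from transition counts, then gauge its
    # uncertainty by sampling plausible transition models from a Dirichlet
    # posterior. A poorly explored state yields a wide posterior spread even
    # though the point estimate looks perfectly definite.
    rng = np.random.default_rng(0)
    n_states, gamma = 3, 0.95
    counts = np.array([[8, 1, 1],          # observed transitions s -> s' under pi
                       [2, 6, 2],
                       [1, 1, 3]])         # state 2 is poorly explored
    rewards = np.array([0.0, 0.5, 1.0])

    def value(P):
        # Policy evaluation: solve (I - gamma * P) V = r.
        return np.linalg.solve(np.eye(n_states) - gamma * P, rewards)

    P_mean = counts / counts.sum(axis=1, keepdims=True)
    samples = np.array([value(np.vstack([rng.dirichlet(counts[s] + 1)
                                         for s in range(n_states)]))
                        for _ in range(2000)])
    print("point estimate:", np.round(value(P_mean), 3))
    print("posterior std :", np.round(samples.std(axis=0), 3))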
Moving least-squares approximations for linearly-solvable MDP
Mingyuan Zhong, E. Todorov
DOI: https://doi.org/10.1109/ADPRL.2011.5967383 (published 2011-04-11)
Abstract: By introducing the linearly-solvable Markov decision process (LMDP), a general class of nonlinear stochastic optimal control problems can be reduced to solving linear problems. In practice, however, LMDPs defined on continuous state spaces remain difficult to solve because of the high dimensionality of the state space. Here we describe a new framework for finding the solution using a moving least-squares approximation. We use efficient iterative solvers that do not require matrix factorization, so large numbers of bases can be handled. The basis functions are constructed from collocation states that change over iterations of the algorithm, so as to provide higher resolution in the regions of state space that are visited more often. The shape of the bases is defined automatically given the collocation states, in a way that avoids gaps in coverage and avoids fitting an excessive number of parameters. Numerical results on test problems are provided and demonstrate good behavior when scaled to high-dimensional problems.
Citations: 1
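For context, the property that makes LMDPs attractive is that the optimal value function satisfies a linear equation in the desirability z = exp(-v). A small discrete first-exit example, assuming random-walk passive dynamics (the paper's moving least-squares bases target the continuous-state case, which this toy does not attempt to reproduce):

    import numpy as np

    # Discrete first-exit LMDP: interior states satisfy z(i) = exp(-q(i)) * sum_j p(j|i) z(j),
    # terminal states satisfy z = exp(-q). Fixed-point iteration recovers z, and
    # v = -log(z) is the optimal cost-to-go.
    n = 6                                   # states 0..4 interior, state 5 terminal
    q = np.full(n, 0.5)                     # state costs
    q[5] = 0.0                              # terminal cost
    P = np.zeros((n, n))                    # passive random-walk dynamics
    for i in range(5):
        P[i, max(i - 1, 0)] += 0.5
        P[i, i + 1] += 0.5
    P[5, 5] = 1.0

    z = np.ones(n)
    for _ in range(500):
        z = np.exp(-q) * (P @ z)
        z[5] = np.exp(-q[5])
    print("optimal cost-to-go:", np.round(-np.log(z), 3))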
Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures
A. Witsch, R. Reichle, K. Geihs, S. Lange, Martin A. Riedmiller
DOI: https://doi.org/10.1109/ADPRL.2011.5967352 (published 2011-04-11)
Abstract: Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. These problems are addressed by reinforcement learning algorithms such as policy gradient methods. We describe how to stabilise the policy gradient descent by introducing a regularisation term into the episodic natural actor-critic approach, which makes its usage less dependent on the particular policy.
Citations: 0
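One simple reading of a regularisation term in this setting, given here only as an assumption about where such a term could enter: add a ridge term to the estimated Fisher matrix before computing the natural gradient, so the update stays well-conditioned when episodes are few.

    import numpy as np

    def natural_gradient_step(score_grads, returns, lam=1e-2, lr=0.1):
        """score_grads: (episodes, n_params) per-episode sums of log-policy gradients.
        returns: (episodes,) episode returns. lam: regularisation strength."""
        g = (score_grads * returns[:, None]).mean(axis=0)       # vanilla policy gradient
        fisher = score_grads.T @ score_grads / len(returns)      # Fisher matrix estimate
        fisher += lam * np.eye(fisher.shape[0])                  # regularisation term
        return lr * np.linalg.solve(fisher, g)                   # natural-gradient update

    rng = np.random.default_rng(1)
    print(natural_gradient_step(rng.normal(size=(20, 4)), rng.normal(size=20)))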
Active learning for personalizing treatment
Kun Deng, Joelle Pineau, S. Murphy
DOI: https://doi.org/10.1109/ADPRL.2011.5967348 (published 2011-04-11)
Abstract: The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge is to construct individualized treatment rules (ITRs) that recommend the best treatment for each category of individuals. In general, ITRs can be constructed using data from clinical trials, but such trials are very costly to run. To reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of a clinical trial. As an initial investigation, we focus on simple ITRs that utilize a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose the use of a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals, and contrast it with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.
Citations: 12
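An illustrative allocation rule in the spirit of minimizing the largest uncertainty across subpopulations; this greedy variance-targeting scheme is an assumption for illustration, not the paper's minimax bandit policy:

    import numpy as np

    # Recruit the next participant from the subgroup whose treatment-effect
    # estimate is currently most uncertain, and assign the under-sampled arm.
    rng = np.random.default_rng(2)
    n_groups, n_arms = 3, 2
    true_means = rng.normal(size=(n_groups, n_arms))
    sums = np.zeros((n_groups, n_arms))
    counts = np.ones((n_groups, n_arms))            # one pseudo-observation per cell

    for _ in range(300):
        effect_var = 1.0 / counts[:, 0] + 1.0 / counts[:, 1]   # variance of arm-mean difference
        g = int(np.argmax(effect_var))              # most uncertain subgroup
        a = int(np.argmin(counts[g]))               # less-sampled arm in that subgroup
        sums[g, a] += rng.normal(true_means[g, a])  # simulated clinical outcome
        counts[g, a] += 1

    print("allocations per subgroup/arm:\n", counts - 1)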
Adaptive dynamic programming for optimal control of unknown nonlinear discrete-time systems
Derong Liu, Ding Wang, Dongbin Zhao
DOI: https://doi.org/10.1109/ADPRL.2011.5967357 (published 2011-04-11)
Abstract: An intelligent optimal control scheme is proposed for unknown nonlinear discrete-time systems with a discount factor in the cost function. An iterative adaptive dynamic programming (ADP) algorithm based on the globalized dual heuristic programming (GDHP) technique is developed to obtain the optimal controller, together with a convergence analysis. Three neural networks are used as parametric structures to implement the iterative algorithm; at each iteration they approximate the cost function, the optimal control law, and the unknown nonlinear system, respectively. Two simulation examples are provided to verify the effectiveness of the presented optimal control approach.
Citations: 14
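The recursion underlying such iterative ADP schemes is V_{i+1}(x) = min_u [U(x,u) + gamma * V_i(F(x,u))]. A tabular stand-in on an assumed toy plant, given only to show the recursion converging (the paper instead approximates the cost function, control law, and unknown dynamics with three neural networks):

    import numpy as np

    gamma = 0.95
    xs = np.linspace(-1, 1, 201)                    # discretised state grid
    us = np.linspace(-0.5, 0.5, 21)                 # discretised control grid

    def f(x, u):                                    # assumed toy nonlinear plant
        return np.clip(0.9 * x + 0.5 * np.sin(x) + u, -1, 1)

    def U(x, u):                                    # stage cost (utility)
        return x ** 2 + u ** 2

    V = np.zeros_like(xs)
    X, Uu = np.meshgrid(xs, us, indexing="ij")
    next_idx = np.abs(f(X, Uu)[..., None] - xs).argmin(axis=-1)   # nearest grid state
    for _ in range(200):                            # iterative value update
        V = (U(X, Uu) + gamma * V[next_idx]).min(axis=1)

    print("V(0) ~", round(float(V[np.abs(xs).argmin()]), 4))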
Supervised adaptive dynamic programming based adaptive cruise control
Dongbin Zhao, Zhaohui Hu
DOI: https://doi.org/10.1109/ADPRL.2011.5967371 (published 2011-04-11)
Abstract: This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full-range adaptive cruise control (ACC) problem. Full-range ACC covers both the conventional ACC situation on highways and the stop-and-go (SG) situation on urban streets, autonomously driving the host vehicle at the desired speed and distance to the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited to this problem but suffers from low learning efficiency. We introduce the concept of an inducing range to construct the supervisor and formulate the SADP algorithm, which greatly improves learning efficiency. Several driving scenarios are designed and tested in simulation with the trained controller and compared against traditional controllers; the trained SADP controller performs well in all scenarios, providing an effective approach to the full-range ACC problem.
Citations: 22
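A minimal car-following setup of the kind a full-range ACC learner operates on; the state, cost, and simple proportional baseline below are assumptions for illustration, and the paper's SADP controller and its inducing-range supervisor are not reproduced:

    import numpy as np

    dt, desired_gap = 0.1, 10.0                     # seconds, metres

    def step(gap, v_rel, a_host, a_lead=0.0):
        # state: gap to preceding vehicle and relative speed (lead minus host)
        return gap + dt * v_rel, v_rel + dt * (a_lead - a_host)

    def cost(gap, v_rel, a_host):
        return (gap - desired_gap) ** 2 + v_rel ** 2 + 0.1 * a_host ** 2

    gap, v_rel = 25.0, -2.0                         # closing in on a slower vehicle
    for _ in range(200):
        a = float(np.clip(0.2 * (gap - desired_gap) + 0.8 * v_rel, -3.0, 2.0))
        gap, v_rel = step(gap, v_rel, a)
    print(round(gap, 2), round(v_rel, 2))           # should settle near the desired gap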
Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark
T. Gabel, C. Lutz, Martin A. Riedmiller
DOI: https://doi.org/10.1109/ADPRL.2011.5967361 (published 2011-04-11)
Abstract: Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm to this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates a learning approach. In RL, the suitable choice of the immediate cost function c is crucial, and, when multi-layer perceptron neural networks are used for value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky when no a priori knowledge about the task, and hence about optimal policies, is available. To this end, we propose a simple but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic on the well-known pole swing-up benchmark as well as on the novel gaming benchmark we suggest.
Citations: 18
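One plausible form of such a dynamic scaling step, stated here as an assumption rather than the authors' exact heuristic: each batch iteration, map the raw Bellman targets affinely into the range a sigmoid-output MLP represents well, and map predictions back before they are reused.

    import numpy as np

    def make_scaler(targets, lo=0.1, hi=0.9):
        # Affine map of raw targets into [lo, hi] and its inverse, recomputed per iteration.
        t_min, span = targets.min(), max(targets.max() - targets.min(), 1e-8)
        to_net = lambda t: lo + (hi - lo) * (t - t_min) / span
        from_net = lambda y: t_min + (y - lo) * span / (hi - lo)
        return to_net, from_net

    # Inside a batch Q-iteration loop (regressor `net` left abstract):
    #   q_next = from_net(net.predict(next_states))          # unscale previous net's outputs
    #   raw_targets = costs + gamma * q_next.min(axis=1)
    #   to_net, from_net = make_scaler(raw_targets)          # rescale dynamically each iteration
    #   net.fit(state_actions, to_net(raw_targets))

    raw = np.array([0.0, 3.0, 7.5, 12.0])
    to_net, from_net = make_scaler(raw)
    print(to_net(raw), from_net(to_net(raw)))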
Evolutionary value function approximation
M. Davarynejad, J. V. Ast, J. Vrancken, J. Berg
DOI: https://doi.org/10.1109/ADPRL.2011.5967349 (published 2011-04-11)
Abstract: Standard reinforcement learning algorithms have proven to be effective tools for letting an agent learn from the experience generated by its interaction with an environment. In this paper an evolutionary approach is proposed to accelerate learning in tabular reinforcement learning algorithms. In the proposed approach, the state-values are not only approximated but also evolved, using concepts from evolutionary algorithms, with the extra bonus of giving each agent the opportunity to exchange its knowledge. The proposed evolutionary value function approximation helps in moving from a single isolated learning stage to cooperative exploration of the search space, accelerating learning. The performance of the proposed algorithm is compared with the standard SARSA algorithm and some of its properties are discussed. The experimental analysis confirms that the proposed approach converges faster with a negligible increase in computational complexity.
Citations: 5
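A small sketch of the cooperative idea, assuming a population of tabular SARSA agents on a toy chain task that periodically blend their state-action tables with the best performer's; the paper's specific evolutionary operators and knowledge-exchange scheme are not reproduced:

    import numpy as np

    rng = np.random.default_rng(3)
    n_states, n_actions, goal = 10, 2, 9
    alpha, gamma, eps = 0.3, 0.95, 0.1

    def step(s, a):                                 # chain walk with a goal at the right end
        s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        return s2, (1.0 if s2 == goal else 0.0), s2 == goal

    def eps_greedy(Q, s):
        return int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())

    def episode(Q):                                 # one SARSA episode, updating Q in place
        s, total = 0, 0.0
        a = eps_greedy(Q, s)
        for _ in range(50):
            s2, r, done = step(s, a)
            a2 = eps_greedy(Q, s2)
            Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
            total += r
            s, a = s2, a2
            if done:
                break
        return total

    pop = [np.zeros((n_states, n_actions)) for _ in range(4)]
    for _ in range(30):
        scores = [np.mean([episode(Q) for _ in range(5)]) for Q in pop]
        best = pop[int(np.argmax(scores))].copy()
        for Q in pop:                               # knowledge exchange: mix with the best table
            mask = rng.random(Q.shape) < 0.5
            Q[mask] = best[mask]
    print("mean return per agent:", np.round(scores, 2))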
Complex object manipulation with hierarchical optimal control
Alex Simpkins, E. Todorov
DOI: https://doi.org/10.1109/ADPRL.2011.5967393 (published 2011-04-11)
Abstract: This paper develops a hierarchical model predictive optimal control solution to the complex and interesting problem of object manipulation. Controlling an object through external manipulators is challenging, involving nonlinearities, redundancy, high dimensionality, contact breaking, underactuation, and more. Manipulation can be framed as essentially the same problem as locomotion (with slightly different parameters), and significant progress has recently been made on the locomotion problem. We develop a methodology that addresses the challenges of manipulation, extending the most recent solutions for locomotion and solving the problem fast enough to run in a real-time implementation. We accomplish this by breaking up the single difficult problem into smaller, more tractable problems. Results supporting this method are presented.
Citations: 10
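A toy two-level decomposition for planar manipulation, given only to illustrate how splitting the problem can make it tractable; the contact layout and control laws are assumptions and are far simpler than the paper's hierarchical model predictive scheme:

    import numpy as np

    contacts = np.array([[0.05, 0.0], [-0.05, 0.0]])      # fingertip positions, object frame

    def high_level(pose, target, vel=np.zeros(3), kp=20.0, kd=5.0):
        # desired net wrench [fx, fy, torque] driving the object toward the target pose
        return kp * (target - pose) - kd * vel

    def low_level(wrench):
        # grasp map G stacks contact forces into a net wrench: w = G f
        G = np.zeros((3, 2 * len(contacts)))
        for i, (px, py) in enumerate(contacts):
            G[0, 2 * i], G[1, 2 * i + 1] = 1.0, 1.0
            G[2, 2 * i], G[2, 2 * i + 1] = -py, px
        return np.linalg.lstsq(G, wrench, rcond=None)[0]  # minimum-norm contact forces

    w = high_level(pose=np.zeros(3), target=np.array([0.1, 0.0, 0.2]))
    print("desired wrench:", np.round(w, 3))
    print("contact forces:", np.round(low_level(w), 3))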