Latest papers from the 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Reinforcement learning algorithms for solving classification problems
M. Wiering, H. V. Hasselt, Auke-Dirk Pietersma, Lambert Schomaker
{"title":"Reinforcement learning algorithms for solving classification problems","authors":"M. Wiering, H. V. Hasselt, Auke-Dirk Pietersma, Lambert Schomaker","doi":"10.1109/ADPRL.2011.5967372","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967372","url":null,"abstract":"We describe a new framework for applying reinforcement learning (RL) algorithms to solve classification tasks by letting an agent act on the inputs and learn value functions. This paper describes how classification problems can be modeled using classification Markov decision processes and introduces the Max-Min ACLA algorithm, an extension of the novel RL algorithm called actor-critic learning automaton (ACLA). Experiments are performed using 8 datasets from the UCI repository, where our RL method is combined with multi-layer perceptrons that serve as function approximators. The RL method is compared to conventional multi-layer perceptrons and support vector machines and the results show that our method slightly outperforms the multi-layer perceptron and performs equally well as the support vector machine. Finally, many possible extensions are described to our basic method, so that much future research can be done to make the proposed method even better.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130840259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47
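The classification-MDP framing in the entry above can be made concrete with a tiny sketch: each sample is treated as a one-step episode whose state is the feature vector, the action is a class label, and the reward is +1 for a correct label and -1 otherwise. The code below is not the Max-Min ACLA algorithm from the paper; it only learns a linear action-value function with an epsilon-greedy policy on synthetic two-class data, and every name and hyperparameter is an illustrative assumption.

```python
# Minimal sketch: classification cast as a one-step MDP, with a linear
# action-value function updated by a TD(0)-style rule.  NOT Max-Min ACLA;
# only the "agent picks a class label and receives +1/-1 reward" idea.
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data (stand-in for a UCI dataset).
n, d = 500, 4
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

n_actions = 2                       # one action per class label
W = np.zeros((n_actions, d))        # linear Q(s, a) = W[a] @ s
alpha, epsilon = 0.05, 0.1

for epoch in range(20):
    for i in rng.permutation(n):
        s = X[i]
        # epsilon-greedy action selection over class labels
        if rng.random() < epsilon:
            a = rng.integers(n_actions)
        else:
            a = int(np.argmax(W @ s))
        r = 1.0 if a == y[i] else -1.0      # reward for the chosen label
        td_error = r - W[a] @ s             # one-step episode: no bootstrap term
        W[a] += alpha * td_error * s        # gradient step on the Q weights

accuracy = np.mean(np.argmax(X @ W.T, axis=1) == y)
print(f"training accuracy of the greedy policy: {accuracy:.3f}")
```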
Global optimal strategies of a class of finite-horizon continuous-time nonaffine nonlinear zero-sum game using a new iteration algorithm
Xin Zhang, Huaguang Zhang, Lili Cui, Yanhong Luo
{"title":"Global optimal strategies of a class of finite-horizon continuous-time nonaffine nonlinear zero-sum game using a new iteration algorithm","authors":"Xin Zhang, Huaguang Zhang, Lili Cui, Yanhong Luo","doi":"10.1109/ADPRL.2011.5967360","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967360","url":null,"abstract":"In this paper we ami to solve the global optimal strategies of a class of finite-horizon continuous-time nonaffine nonlinear zero-sum game. The idea is to use a iterative algorithm to obtain the saddle point. The iterative algorithm is between two sequences which are a sequence of linear quadratic zero-sum game and a sequence of Riccati differential equation. The necessary conditions of global optimal strategies are established. A simulation example is given to illustrate the perfoermance of the proposed approach.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127159721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
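The iteration described above alternates with solving Riccati differential equations for linear quadratic zero-sum games. As a rough illustration of that subproblem only, not of the paper's full algorithm, the sketch below integrates a game Riccati differential equation backward in time with a simple Euler scheme; the system matrices, attenuation level, and horizon are invented values.

```python
# Sketch: backward integration of the Riccati differential equation for a
# finite-horizon linear-quadratic zero-sum game.  All matrices and gamma are
# made-up illustrative values, not taken from the paper.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])      # control input
D = np.array([[0.0], [0.2]])      # disturbance input
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 2.0                       # attenuation level of the zero-sum game
T, dt = 5.0, 1e-3

P = np.zeros((2, 2))              # terminal condition P(T) = 0
for _ in range(int(T / dt)):
    # -dP/dt = A'P + PA + Q - P (B R^-1 B' - gamma^-2 D D') P
    S = B @ np.linalg.inv(R) @ B.T - (1.0 / gamma ** 2) * D @ D.T
    dP = A.T @ P + P @ A + Q - P @ S @ P
    P = P + dt * dP               # Euler step, integrating backward in time

K = np.linalg.inv(R) @ B.T @ P    # minimizing player's feedback gain at t = 0
print("P(0) =\n", P)
print("control gain K(0) =", K)
```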
Tree-based variable selection for dimensionality reduction of large-scale control systems
A. Castelletti, S. Galelli, Marcello Restelli, R. Soncini-Sessa
{"title":"Tree-based variable selection for dimensionality reduction of large-scale control systems","authors":"A. Castelletti, S. Galelli, Marcello Restelli, R. Soncini-Sessa","doi":"10.1109/ADPRL.2011.5967387","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967387","url":null,"abstract":"This paper is about dimensionality reduction by variable selection in high-dimensional real-world control problems, where designing controllers by conventional means is either impractical or results in poor performance.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114389785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
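A hedged sketch of the general idea, ranking and pruning candidate variables with tree-based importance scores, is given below. It uses scikit-learn's Extra-Trees feature importances on synthetic data as a stand-in; the paper's own iterative variable-selection procedure and stopping rule may differ.

```python
# Sketch: ranking candidate state variables by tree-based feature importance,
# in the spirit of tree-based variable selection for dimensionality reduction.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(1)

# Synthetic "large-scale system": 30 candidate variables, only 3 matter.
n, d = 2000, 30
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.5 * X[:, 12] + 0.1 * rng.normal(size=n)

forest = ExtraTreesRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

ranking = np.argsort(forest.feature_importances_)[::-1]
print("variables ranked by importance:", ranking[:5])

# Keep only the top-k variables for the reduced control/identification model.
k = 3
X_reduced = X[:, ranking[:k]]
print("reduced design matrix shape:", X_reduced.shape)
```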
Higher order Q-Learning
Ashley D. Edwards, W. Pottenger
{"title":"Higher order Q-Learning","authors":"Ashley D. Edwards, W. Pottenger","doi":"10.1109/ADPRL.2011.5967385","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967385","url":null,"abstract":"Higher order learning is a statistical relational learning framework in which relationships between different instances of the same class are leveraged (Ganiz, Lytkin and Pottenger, 2009). Learning can be supervised or unsupervised. In contrast, reinforcement learning (Q-Learning) is a technique for learning in an unknown state space. Action selection is often based on a greedy, or epsilon greedy approach. The problem with this approach is that there is often a large amount of initial exploration before convergence. In this article we introduce a novel approach to this problem that treats a state space as a collection of data from which latent information can be extrapolated. From this data, we classify actions as leading to a high reward or low reward, and formulate behaviors based on this information. We provide experimental evidence that this technique drastically reduces the amount of exploration required in the initial stages of learning. We evaluate our algorithm in a well-known reinforcement learning domain, grid-world.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116072186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
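Below is a loose sketch in the paper's grid-world evaluation domain: tabular Q-learning in which exploratory actions are drawn from those that a crude per-state "high reward vs. low reward" score (here just running reward averages) rates best. This is only a stand-in for the paper's higher-order, statistical relational treatment of the state space; the grid, rewards, and hyperparameters are invented.

```python
# Sketch: tabular Q-learning in a small grid-world, with exploration biased
# toward actions that running reward averages rate as promising.
import numpy as np

rng = np.random.default_rng(2)
H, W = 4, 4
goal = (3, 3)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

Q = np.zeros((H, W, 4))
avg_r = np.zeros((H, W, 4))                    # running mean reward per (s, a)
counts = np.ones((H, W, 4))
alpha, gamma, epsilon = 0.1, 0.95, 0.2

def step(s, a):
    r, c = s
    dr, dc = actions[a]
    nr, nc = min(max(r + dr, 0), H - 1), min(max(c + dc, 0), W - 1)
    reward = 1.0 if (nr, nc) == goal else -0.01
    return (nr, nc), reward, (nr, nc) == goal

for episode in range(300):
    s = (0, 0)
    for t in range(100):
        if rng.random() < epsilon:
            # explore among actions "classified" as promising in this state
            scores = avg_r[s[0], s[1]]
            a = int(rng.choice(np.flatnonzero(scores >= scores.max() - 1e-9)))
        else:
            a = int(np.argmax(Q[s[0], s[1]]))
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * Q[s2[0], s2[1]].max())
        Q[s[0], s[1], a] += alpha * (target - Q[s[0], s[1], a])
        counts[s[0], s[1], a] += 1
        avg_r[s[0], s[1], a] += (r - avg_r[s[0], s[1], a]) / counts[s[0], s[1], a]
        s = s2
        if done:
            break

print("greedy action at the start state:", np.argmax(Q[0, 0]))
```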
Active exploration by searching for experiments that falsify the computed control policy
R. Fonteneau, S. Murphy, L. Wehenkel, D. Ernst
{"title":"Active exploration by searching for experiments that falsify the computed control policy","authors":"R. Fonteneau, S. Murphy, L. Wehenkel, D. Ernst","doi":"10.1109/ADPRL.2011.5967364","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967364","url":null,"abstract":"We propose a strategy for experiment selection - in the context of reinforcement learning - based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124790807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
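The selection criterion can be sketched as follows: a candidate experiment is kept if, assuming the learnt model's own prediction of its outcome, adding it to the data set and re-deriving the policy would change the policy somewhere. In the illustration below the "model" is a 1-nearest-neighbour reward predictor and the "policy" is the greedy action on a grid of states; the dynamics, model class, and grids are assumptions made for the sketch, not the paper's setting.

```python
# Sketch of falsification-driven experiment selection.
import numpy as np

states = np.linspace(-1.0, 1.0, 21)
actions = np.array([-1.0, 1.0])

def true_reward(s, a):                         # unknown to the learner
    s_next = np.clip(s + 0.2 * a, -1.0, 1.0)
    return -s_next ** 2

def nn_predict(data, s, a):
    """1-NN prediction of the reward of taking action a in state s."""
    same_a = [(abs(s - s_i), r_i) for s_i, a_i, r_i in data if a_i == a]
    return min(same_a)[1]

def greedy_policy(data):
    return tuple(actions[np.argmax([nn_predict(data, s, a) for a in actions])]
                 for s in states)

# A few initial experiments already carried out.
data = [(-1.0, 1.0, true_reward(-1.0, 1.0)),
        (1.0, -1.0, true_reward(1.0, -1.0)),
        (0.0, 1.0, true_reward(0.0, 1.0))]
policy = greedy_policy(data)

# Keep the candidates that, under the model's own predicted outcome, would
# lead the policy learner to revise the current policy.
selected = []
for s in states:
    for a in actions:
        predicted_r = nn_predict(data, s, a)
        if greedy_policy(data + [(s, a, predicted_r)]) != policy:
            selected.append((round(float(s), 2), float(a)))

print(f"{len(selected)} of {states.size * actions.size} candidate experiments "
      f"are predicted to revise the current policy")
```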
Online adaptive learning of optimal control solutions using integral reinforcement learning
K. Vamvoudakis, D. Vrabie, F. Lewis
{"title":"Online adaptive learning of optimal control solutions using integral reinforcement learning","authors":"K. Vamvoudakis, D. Vrabie, F. Lewis","doi":"10.1109/ADPRL.2011.5967359","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967359","url":null,"abstract":"In this paper we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data based approach to the solution of the Hamilton-Jacobi-Bellman equation and it does not require explicit knowledge on the system's drift dynamics. The adaptive algorithm is based on policy iteration, and it is implemented on an actor/critic structure. Both actor and critic neural networks are adapted simultaneously a persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129789256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
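The core relation behind integral reinforcement learning is the integral Bellman equation V(x(t)) = ∫_t^{t+T} (xᵀQx + uᵀRu) dτ + V(x(t+T)), which lets the critic be fitted from data without the drift dynamics. The sketch below performs a single policy-evaluation step of this kind for a linear system with a fixed stabilizing gain and a quadratic value function; this is a much simplified stand-in for the paper's simultaneous actor-critic tuning with neural networks, and every matrix and gain is illustrative.

```python
# Sketch of the integral reinforcement relation used by the critic:
#   V(x(t)) = integral_t^{t+T} (x'Qx + u'Ru) dtau + V(x(t+T)).
# V is taken quadratic, V(x) = x'Px, with a fixed stabilizing feedback
# u = -Kx, and P is fitted by least squares from data segments without
# using the drift matrix A for learning (A is only used to simulate data).
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -1.0]])     # used only to simulate trajectories
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.array([[1.0]])
K = np.array([[1.0, 1.0]])                   # fixed stabilizing policy u = -Kx
dt, T = 1e-3, 0.05                           # integration step, reinforcement interval

def quad_features(x):
    # independent entries of x x' so that V(x) = x' P x is linear in them
    return np.array([x[0] ** 2, 2 * x[0] * x[1], x[1] ** 2])

rows, targets = [], []
rng = np.random.default_rng(4)
for _ in range(60):                          # data segments from random starts
    x = rng.uniform(-1, 1, size=2)
    x0, cost = x.copy(), 0.0
    for _ in range(int(T / dt)):             # simulate one reinforcement interval
        u = -K @ x
        cost += float(x @ Qc @ x + u @ Rc @ u) * dt
        x = x + dt * (A @ x + (B @ u).ravel())
    # V(x0) - V(xT) = integral cost  ->  linear equation in the entries of P
    rows.append(quad_features(x0) - quad_features(x))
    targets.append(cost)

p, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
P = np.array([[p[0], p[1]], [p[1], p[2]]])
print("learnt value matrix P:\n", P)
```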
Application of reinforcement learning-based algorithms in CO2 allowance and electricity markets
V. Nanduri
{"title":"Application of reinforcement learning-based algorithms in CO2 allowance and electricity markets","authors":"V. Nanduri","doi":"10.1109/ADPRL.2011.5967367","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967367","url":null,"abstract":"Climate change is one of the most important challenges faced by the world this century. In the U.S., the electric power industry is the largest emitter of CO2, contributing to the climate crisis. Federal emissions control bills in the form of cap-and-trade programs are currently idling in the U.S. Congress. In the mean time, ten states in the northeastern U.S. have adopted a regional cap-and-trade program to reduce CO2 levels and also to increase investments in cleaner technologies. Many of the states in which the cap-and-trade programs are active operate under a restructured market paradigm, where generators compete to supply power. This research presents a bi-level game-theoretic model to capture competition between generators in cap-and-trade markets and restructured electricity markets. The solution to the game-theoretic model is obtained using a reinforcement learning based algorithm.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128236428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
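As a toy illustration of using RL to search for equilibrium bidding behaviour in a restructured market, heavily simplified relative to the paper's bi-level cap-and-trade game, the sketch below has two generators learn output quantities with independent stateless Q-learning in a repeated Cournot-style market; the demand curve, cost, and quantity grid are invented values.

```python
# Sketch: two generators learning quantities with independent Q-learning in a
# repeated Cournot-style market (a stand-in for the bi-level market game).
import numpy as np

rng = np.random.default_rng(5)
quantities = np.linspace(0.0, 10.0, 11)        # discretised actions (MW blocks)
a_dem, b_dem, cost = 30.0, 1.0, 6.0            # inverse demand p = a - b*(q1+q2)

Q = [np.zeros(quantities.size) for _ in range(2)]   # one stateless Q-table each
alpha, epsilon = 0.05, 0.1

for t in range(50_000):
    acts = []
    for i in range(2):
        if rng.random() < epsilon:
            acts.append(rng.integers(quantities.size))
        else:
            acts.append(int(np.argmax(Q[i])))
    q1, q2 = quantities[acts[0]], quantities[acts[1]]
    price = max(a_dem - b_dem * (q1 + q2), 0.0)
    for i, (idx, q) in enumerate(zip(acts, (q1, q2))):
        profit = (price - cost) * q
        Q[i][idx] += alpha * (profit - Q[i][idx])  # running average of profit

print("learnt quantities:", quantities[int(np.argmax(Q[0]))],
      quantities[int(np.argmax(Q[1]))])
# For this demand and cost, the Cournot equilibrium is q_i = (a - c) / (3 b) = 8.0
```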
Model-building semi-Markov adaptive critics
A. Gosavi, S. Murray, Jiaqiao Hu
{"title":"Model-building semi-Markov adaptive critics","authors":"A. Gosavi, S. Murray, Jiaqiao Hu","doi":"10.1109/ADPRL.2011.5967374","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967374","url":null,"abstract":"Adaptive or actor critics are a class of reinforcement learning (RL) or approximate dynamic programming (ADP) algorithms in which one searches over stochastic policies in order to determine the optimal deterministic policy. Classically, these algorithms have been studied for Markov decision processes (MDPs) in the context of model-free updates in which transition probabilities are avoided altogether. A model-free version for the semi-MDP (SMDP) for discounted reward in which the transition time of each transition can be a random variable was proposed in Gosavi [1]. In this paper, we propose a variant in which the transition probability model is built simultaneously with the value function and action-probability functions. While our new algorithm does not require the transition probabilities apriori, it generates them along with the estimation of the value function and the action-probability functions required in adaptive critics. Model-building and model-based versions of algorithms have numerous advantages in contrast to their model-free counterparts. In particular, they are more stable and may require less training. However the additional steps of building the model may require increased storage in the computer's memory. In addition to enumerating potential application areas for our algorithm, we will analyze the advantages and disadvantages of model building.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134134197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
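A minimal sketch of the model-building idea follows: while interacting with a semi-Markov process, keep empirical estimates of the transition probabilities, expected rewards, and expected discount factors E[e^(-γτ)], then run the critic sweeps on the built model rather than on raw samples. The two-state SMDP, its parameters, and the plain value-iteration-style sweep below are invented for illustration and are not the paper's algorithm.

```python
# Sketch of model building for a discounted semi-Markov decision process:
# accumulate counts, rewards and discount factors, then sweep on the model.
import numpy as np

rng = np.random.default_rng(6)
nS, nA, gamma = 2, 2, 0.1

def sample_smdp(s, a):
    """Hidden environment: next state, lump-sum reward, random sojourn time."""
    p_stay = 0.7 if a == 0 else 0.3
    s2 = s if rng.random() < p_stay else 1 - s
    tau = rng.exponential(scale=1.0 + s + a)   # transition time, independent of s2
    r = (2.0 if s2 == 1 else 0.5) * tau
    return s2, r, tau

# Empirical model: counts, summed rewards, summed discount factors.
counts = np.zeros((nS, nA, nS))
sum_r = np.zeros((nS, nA))
sum_disc = np.zeros((nS, nA))

s = 0
for _ in range(20_000):
    a = rng.integers(nA)                       # exploratory behaviour policy
    s2, r, tau = sample_smdp(s, a)
    counts[s, a, s2] += 1
    sum_r[s, a] += r
    sum_disc[s, a] += np.exp(-gamma * tau)
    s = s2

n_sa = counts.sum(axis=2)
P_hat = counts / n_sa[:, :, None]              # estimated transition probabilities
r_hat = sum_r / n_sa                           # expected reward per (s, a)
d_hat = sum_disc / n_sa                        # expected discount per (s, a)

# Critic sweeps on the built model (tau is independent of s2 here, so the
# discount and transition estimates can be combined by a simple product).
V = np.zeros(nS)
for _ in range(500):
    Qm = r_hat + d_hat * (P_hat @ V)
    V = Qm.max(axis=1)
print("model-based value estimates:", V)
```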
Feedback controller parameterizations for Reinforcement Learning
John W. Roberts, I. Manchester, Russ Tedrake
{"title":"Feedback controller parameterizations for Reinforcement Learning","authors":"John W. Roberts, I. Manchester, Russ Tedrake","doi":"10.1109/ADPRL.2011.5967370","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967370","url":null,"abstract":"Reinforcement Learning offers a very general framework for learning controllers, but its effectiveness is closely tied to the controller parameterization used. Especially when learning feedback controllers for weakly stable systems, ineffective parameterizations can result in unstable controllers and poor performance both in terms of learning convergence and in the cost of the resulting policy. In this paper we explore four linear controller parameterizations in the context of REINFORCE, applying them to the control of a reaching task with a linearized flexible manipulator. We find that some natural but naive parameterizations perform very poorly, while the Youla Parameterization (a popular parameterization from the controls literature) offers a number of robustness and performance advantages.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125012485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
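The sketch below shows the kind of "naive" direct parameterization the paper compares against: REINFORCE adjusting the gains of a linear state-feedback law u = θᵀx + noise on a small discrete-time system. The Youla parameterization the paper advocates would instead reparameterize the controller so that the search stays inside the set of stabilizing controllers; the system matrices, horizon, and step sizes here are illustrative assumptions.

```python
# Sketch: REINFORCE on a directly parameterised linear state-feedback law.
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[1.0, 0.1], [0.0, 1.0]])       # discrete-time double integrator
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), 0.1
sigma, alpha, horizon = 0.3, 1e-3, 50

theta = np.zeros(2)                           # feedback gains, u = theta @ x + noise

def rollout(theta):
    x = np.array([1.0, 0.0])
    grads, rewards = [], []
    for _ in range(horizon):
        mean_u = theta @ x
        u = mean_u + sigma * rng.normal()
        grads.append((u - mean_u) / sigma ** 2 * x)   # grad of log N(u; theta@x, sigma^2)
        rewards.append(-(x @ Qc @ x + Rc * u ** 2))
        x = A @ x + (B * u).ravel()
    return np.array(grads), np.array(rewards)

baseline = 0.0
for episode in range(2000):
    grads, rewards = rollout(theta)
    G = rewards.sum()                          # episode return (undiscounted)
    baseline += 0.01 * (G - baseline)          # running baseline for variance reduction
    theta += alpha * (G - baseline) * grads.sum(axis=0)

print("learnt feedback gains:", theta)
```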
Information space receding horizon control
S. Chakravorty, R. Erwin
{"title":"Information space receding horizon control","authors":"S. Chakravorty, R. Erwin","doi":"10.1109/ADPRL.2011.5967362","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967362","url":null,"abstract":"In this paper, we present a receding horizon solution to the problem of optimal sensor scheduling problem. The optimal sensor scheduling problem can be posed as a Partially Observed Markov Decision Process (POMDP) whose solution is given by an Information Space (I-space) Dynamic Programming (DP) problem. We present a simulation based stochastic optimization technique that, combined with a receding horizon approach, obviates the need to solve the computationally intractable I-space DP problem. The technique is tested on a simple sensor scheduling problem where a sensor has to choose among the measurements of N dynamical systems such that the information regarding the aggregate system is maximized over an infinite horizon.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126365196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
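A crude sketch of receding-horizon scheduling in information space: at every step, roll the Kalman covariance recursion forward over a short horizon for each candidate measurement schedule, pick the plan with the smallest aggregate uncertainty, and apply only its first action. The greedy enumeration and scalar systems below stand in for the paper's simulation-based stochastic optimization; all numbers are invented.

```python
# Sketch: receding-horizon sensor scheduling via the covariance (information)
# recursion, choosing which of N scalar systems to measure at each step.
import numpy as np
from itertools import product

N, horizon, steps = 3, 2, 20
a = np.array([1.02, 0.98, 1.05])        # scalar dynamics x_{k+1} = a x_k + w
q = np.array([0.1, 0.2, 0.05])          # process noise variances
r = np.array([0.2, 0.1, 0.3])           # measurement noise variances

def propagate(P, measured):
    """One covariance step for all systems; only `measured` gets an update."""
    P_pred = a ** 2 * P + q                       # prediction for every system
    P_new = P_pred.copy()
    i = measured
    P_new[i] = P_pred[i] - P_pred[i] ** 2 / (P_pred[i] + r[i])   # Kalman update
    return P_new

P = np.ones(N)                                    # initial error variances
schedule = []
for k in range(steps):
    best_cost, best_first = np.inf, 0
    for plan in product(range(N), repeat=horizon):    # enumerate short plans
        P_sim = P.copy()
        for i in plan:
            P_sim = propagate(P_sim, i)
        cost = P_sim.sum()                        # aggregate uncertainty
        if cost < best_cost:
            best_cost, best_first = cost, plan[0]
    P = propagate(P, best_first)                  # apply only the first action
    schedule.append(best_first)

print("sensor schedule:", schedule)
print("final error variances:", np.round(P, 3))
```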