{"title":"Grounding subgoals in information transitions","authors":"S. V. Dijk, D. Polani","doi":"10.1109/ADPRL.2011.5967384","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967384","url":null,"abstract":"In reinforcement learning problems, the construction of subgoals has been identified as an important step to speed up learning and to enable skill transfer. For this purpose, one typically extracts states from various saliency properties of an MDP transition graph, most notably bottleneck states. Here we introduce an alternative approach to this problem: assuming a family of MDPs with multiple goals but with a fixed transition graph, we introduce the relevant goal information as the amount of Shannon information that the agent needs to maintain about the current goal at a given state to select the appropriate action. We show that there are distinct transition states in the MDP at which new relevant goal information has to be considered for selecting the next action. We argue that these transition states can be interpreted as subgoals for the current task class, and we use these states to automatically create a hierarchical policy, according to the well-established Options model for hierarchical reinforcement learning.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115127208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Protecting against evaluation overfitting in empirical reinforcement learning
Authors: Shimon Whiteson, B. Tanner, Matthew E. Taylor, P. Stone
Venue: 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Publication date: 2011-04-01
DOI: 10.1109/ADPRL.2011.5967363 (https://doi.org/10.1109/ADPRL.2011.5967363)
Abstract: Empirical evaluations play an important role in machine learning. However, the usefulness of any evaluation depends on the empirical methodology employed. Designing good empirical methodologies is difficult in part because agents can overfit test evaluations and thereby obtain misleadingly high scores. We argue that reinforcement learning is particularly vulnerable to environment overfitting and propose as a remedy generalized methodologies, in which evaluations are based on multiple environments sampled from a distribution. In addition, we consider how to summarize performance when scores from different environments may not have commensurate values. Finally, we present proof-of-concept results demonstrating how these methodologies can validate an intuitively useful range-adaptive tile coding method.
{"title":"Bayesian active learning with basis functions","authors":"I. Ryzhov, Warrren B Powell","doi":"10.1109/ADPRL.2011.5967365","DOIUrl":"https://doi.org/10.1109/ADPRL.2011.5967365","url":null,"abstract":"A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126624389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}