{"title":"基于模型的强化学习的重要性抽样","authors":"Orhan Sonmez, A. Cemgil","doi":"10.1109/SIU.2012.6204703","DOIUrl":null,"url":null,"abstract":"Most of the state-of-the-art reinforcement learning algorithms are based on Bellman equations and make use of fixed-point iteration methods to converge to suboptimal solutions. However, some of the recent approaches transform the reinforcement learning problem into an equivalent likelihood maximization problem with using appropriate graphical models. Hence, it allows the adoption of probabilistic inference methods. Here, we propose an expectation-maximization method that employs importance sampling in its E-step in order to estimate the likelihood and then to determine the optimal policy.","PeriodicalId":256154,"journal":{"name":"2012 20th Signal Processing and Communications Applications Conference (SIU)","volume":"30 6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Importance sampling for model-based reinforcement learning\",\"authors\":\"Orhan Sonmez, A. Cemgil\",\"doi\":\"10.1109/SIU.2012.6204703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Most of the state-of-the-art reinforcement learning algorithms are based on Bellman equations and make use of fixed-point iteration methods to converge to suboptimal solutions. However, some of the recent approaches transform the reinforcement learning problem into an equivalent likelihood maximization problem with using appropriate graphical models. Hence, it allows the adoption of probabilistic inference methods. Here, we propose an expectation-maximization method that employs importance sampling in its E-step in order to estimate the likelihood and then to determine the optimal policy.\",\"PeriodicalId\":256154,\"journal\":{\"name\":\"2012 20th Signal Processing and Communications Applications Conference (SIU)\",\"volume\":\"30 6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-04-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 20th Signal Processing and Communications Applications Conference (SIU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIU.2012.6204703\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 20th Signal Processing and Communications Applications Conference (SIU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIU.2012.6204703","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Importance sampling for model-based reinforcement learning
Most state-of-the-art reinforcement learning algorithms are based on Bellman equations and rely on fixed-point iteration to converge to suboptimal solutions. However, some recent approaches transform the reinforcement learning problem into an equivalent likelihood maximization problem by using appropriate graphical models. This reformulation allows the adoption of probabilistic inference methods. Here, we propose an expectation-maximization method that employs importance sampling in its E-step to estimate the likelihood and subsequently determine the optimal policy.
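The abstract only outlines the approach, so the following is a minimal sketch of the general EM-for-RL idea with importance sampling in the E-step, not the authors' algorithm: trajectories are drawn from a fixed exploration policy, the E-step computes importance-weighted expected discounted returns per state-action pair, and the M-step renormalizes them into a new stochastic policy. The toy MDP, horizon, sample sizes, and all function names (`sample_trajectory`, `em_policy_iteration`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon, n_traj = 5, 2, 10, 500
gamma = 0.95

# Toy MDP: random transition and reward tables (stand-ins for a learned model).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> dist over s'
R = rng.uniform(size=(n_states, n_actions))                       # R[s, a] in [0, 1]

def sample_trajectory(policy):
    """Roll out one trajectory under `policy` (n_states x n_actions probabilities)."""
    s = rng.integers(n_states)
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy[s])
        states.append(s); actions.append(a); rewards.append(R[s, a])
        s = rng.choice(n_states, p=P[s, a])
    return states, actions, rewards

def em_policy_iteration(n_iters=20):
    # Fixed exploration (behavior) policy q: uniform over actions.
    q = np.full((n_states, n_actions), 1.0 / n_actions)
    trajs = [sample_trajectory(q) for _ in range(n_traj)]

    # Target policy pi, initialized uniformly and refined by EM iterations.
    policy = q.copy()
    for _ in range(n_iters):
        # E-step: importance-weighted expected discounted return ("reward likelihood")
        # attributed to each visited state-action pair.
        weighted_return = np.zeros((n_states, n_actions))
        for states, actions, rewards in trajs:
            G = sum(gamma ** t * r for t, r in enumerate(rewards))
            # Trajectory importance weight: prod_t pi(a_t | s_t) / q(a_t | s_t).
            w = np.prod([policy[s, a] / q[s, a] for s, a in zip(states, actions)])
            for s, a in zip(states, actions):
                weighted_return[s, a] += w * G

        # M-step: renormalize the weighted returns into a new stochastic policy.
        totals = weighted_return.sum(axis=1, keepdims=True)
        policy = np.where(totals > 0,
                          weighted_return / np.maximum(totals, 1e-12),
                          1.0 / n_actions)
    return policy

if __name__ == "__main__":
    pi = em_policy_iteration()
    print("Greedy actions per state:", pi.argmax(axis=1))
```

Reusing one batch of exploration trajectories across EM iterations is what the importance weights buy here; the paper's actual E-step may instead resample or use a different proposal, which this sketch does not attempt to reproduce.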