Importance sampling for model-based reinforcement learning

2012 20th Signal Processing and Communications Applications Conference (SIU) Pub Date : 2012-04-18 DOI:10.1109/SIU.2012.6204703

Orhan Sonmez, A. Cemgil


Citations: 1

Abstract



Most state-of-the-art reinforcement learning algorithms are based on Bellman equations and use fixed-point iteration methods to converge to suboptimal solutions. However, some recent approaches transform the reinforcement learning problem into an equivalent likelihood maximization problem by using appropriate graphical models, which allows the adoption of probabilistic inference methods. Here, we propose an expectation-maximization method that employs importance sampling in its E-step to estimate the likelihood and then determine the optimal policy.
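
The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical two-state, two-action MDP with a known transition table `P`, a trajectory reward scaled to [0, 1] so that it can play the role of a likelihood, and a fixed uniform proposal policy. The E-step draws trajectories from the proposal and weights each one by the policy ratio times its reward; the average weight estimates the likelihood, and the M-step refits a tabular policy from the weighted state-action counts. All names and numbers are illustrative assumptions.

```python
import random

# Hypothetical known model: P[s][a] is the probability of moving to state 1
# (otherwise the next state is 0). All numbers are illustrative.
P = [[0.1, 0.8], [0.3, 0.9]]
HORIZON = 5
N_STATES, N_ACTIONS = 2, 2


def reward(traj):
    """Trajectory reward in [0, 1]: fraction of steps that land in state 1."""
    return sum(s_next for _, _, s_next in traj) / len(traj)


def sample_trajectory(policy, rng):
    """Roll out policy (policy[s][a] = probability of a in s) in the model."""
    s, traj = 0, []
    for _ in range(HORIZON):
        a = 0 if rng.random() < policy[s][0] else 1
        s_next = 1 if rng.random() < P[s][a] else 0
        traj.append((s, a, s_next))
        s = s_next
    return traj


def em_step(policy, proposal, n_samples, rng):
    """One EM iteration with an importance-sampled E-step."""
    counts = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    total_weight = 0.0
    for _ in range(n_samples):
        traj = sample_trajectory(proposal, rng)
        # Transition terms cancel because both policies share the same model,
        # so only the ratio of action probabilities remains.
        ratio = 1.0
        for s, a, _ in traj:
            ratio *= policy[s][a] / proposal[s][a]
        w = ratio * reward(traj)
        total_weight += w
        for s, a, _ in traj:
            counts[s][a] += w
    likelihood = total_weight / n_samples  # estimate of expected reward
    # M-step: weighted maximum-likelihood fit of the tabular policy.
    new_policy = []
    for s in range(N_STATES):
        z = sum(counts[s])
        new_policy.append([c / z for c in counts[s]] if z > 0 else list(policy[s]))
    return new_policy, likelihood


rng = random.Random(0)
uniform = [[0.5, 0.5], [0.5, 0.5]]
policy, history = uniform, []
for _ in range(10):
    policy, lik = em_step(policy, uniform, 5000, rng)
    history.append(lik)
print("likelihood estimates:", [round(x, 3) for x in history])
print("policy:", [[round(p, 3) for p in row] for row in policy])
```

A fixed uniform proposal keeps the sketch short, but the importance weights grow more variable as the policy moves away from it; a practical implementation would likely adapt the proposal or use self-normalized weights.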
