A learning algorithm for Markov decision processes with adaptive state aggregation

Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187). Pub Date: 2000-12-12. DOI: 10.1109/CDC.2000.912220

J. Baras, V. Borkar


Citations: 28

Abstract

We propose a simulation-based algorithm for learning good policies for a Markov decision process with unknown transition law, with aggregated states. The state aggregation itself can be adapted on a slower time scale by an auxiliary learning algorithm. Rigorous justifications are provided for both algorithms.
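
As a rough illustration of what learning over aggregated states can look like, the sketch below runs ordinary tabular Q-learning with the value table indexed by cluster rather than by raw state. It is not the authors' algorithm: Q-learning is swapped in as a familiar stand-in, and the chain-shaped simulator, the fixed aggregation map `cluster_of`, the step-size exponent, and every function and parameter name are assumptions made for this example. The slower-time-scale adaptation of the aggregation mentioned in the abstract is not implemented here.

```python
import random


def aggregated_q_learning(n_states=6, n_clusters=3, episodes=2000,
                          gamma=0.9, epsilon=0.2, seed=0):
    """Q-learning where Q is indexed by cluster, not by raw state (illustrative)."""
    rng = random.Random(seed)
    # Hypothetical fixed aggregation: consecutive states share a cluster.
    cluster_of = [s * n_clusters // n_states for s in range(n_states)]
    q = [[0.0, 0.0] for _ in range(n_clusters)]    # q[cluster][action]
    visits = [[0, 0] for _ in range(n_clusters)]   # update counts per entry

    def step(state, action):
        # Toy simulator; its transition law is never shown to the learner.
        # Action 1 moves right, action 0 moves left; with probability 0.1
        # the move is reversed. Reaching the last state pays 1 and ends
        # the episode.
        move = 1 if action == 1 else -1
        if rng.random() > 0.9:
            move = -move
        nxt = min(max(state + move, 0), n_states - 1)
        done = nxt == n_states - 1
        return nxt, (1.0 if done else 0.0), done

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            c = cluster_of[state]
            if rng.random() < epsilon:
                action = rng.randrange(2)              # explore
            else:
                action = 0 if q[c][0] > q[c][1] else 1  # greedy, ties go right
            nxt, reward, done = step(state, action)
            visits[c][action] += 1
            alpha = 1.0 / visits[c][action] ** 0.6     # diminishing step size
            target = reward if done else reward + gamma * max(q[cluster_of[nxt]])
            q[c][action] += alpha * (target - q[c][action])
            state = nxt
            if done:
                break
    return cluster_of, q


if __name__ == "__main__":
    cluster_of, q = aggregated_q_learning()
    print("aggregation map:", cluster_of)
    for c, (left, right) in enumerate(q):
        print(f"cluster {c}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

Because all states in a cluster share one table row, the learner stores `n_clusters` rows instead of `n_states`, at the cost of treating aggregated states as indistinguishable. An adaptive scheme in the spirit of the abstract would additionally adjust `cluster_of` using a step size that decays faster than `alpha`, so that the aggregation changes slowly relative to the value estimates.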

