Computationally efficient adaptive control algorithms for Markov chains
A. Jalali, M. Ferguson
Proceedings of the 28th IEEE Conference on Decision and Control, 1989-12-13
DOI: 10.1109/CDC.1989.70344 (https://doi.org/10.1109/CDC.1989.70344)
Citations: 32
Abstract
Algorithms for adaptive control of unknown finite Markov chains are proposed. The algorithms consist of two parts: part one estimates the unknown parameters; part two computes the optimal policy. The emphasis of this study is on efficient online computation of the optimal policy. No a priori knowledge of the optimal policy is assumed. The optimal policy is computed recursively online, and each step requires only a small amount of computation: at each transition of the chain, only the action corresponding to the present state of the chain is updated. The algorithms are easy to implement and converge to the optimal policy in finite time.
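
The two-part structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes a discounted-cost criterion, transition estimates formed from visit counts with a uniform prior, epsilon-greedy exploration, and a single Bellman backup at the present state per transition. The function name `adaptive_control`, its parameters, and the two-state example chain are all hypothetical.

```python
import random

def adaptive_control(P_true, cost, steps=20000, gamma=0.9, eps=0.1, seed=0):
    """Illustrative sketch only: estimate transitions, update only the present state."""
    rng = random.Random(seed)
    n_s, n_a = len(cost), len(cost[0])
    # Transition counts start at 1 (assumed uniform prior) so estimates are always defined.
    counts = [[[1] * n_s for _ in range(n_a)] for _ in range(n_s)]
    V = [0.0] * n_s      # running value estimate per state
    policy = [0] * n_s   # current action per state
    s = 0
    for _ in range(steps):
        # Part one: act (exploring with probability eps), observe, update the estimate.
        a = rng.randrange(n_a) if rng.random() < eps else policy[s]
        s_next = rng.choices(range(n_s), weights=P_true[s][a])[0]
        counts[s][a][s_next] += 1
        # Part two: one Bellman backup, at the present state only.
        q = []
        for b in range(n_a):
            total = sum(counts[s][b])
            expected = sum(c * v for c, v in zip(counts[s][b], V)) / total
            q.append(cost[s][b] + gamma * expected)
        policy[s] = min(range(n_a), key=q.__getitem__)
        V[s] = q[policy[s]]
        s = s_next
    return policy

# Hypothetical two-state, two-action chain: P_true[s][a] is the next-state distribution.
P_true = [[[0.9, 0.1], [0.1, 0.9]],
          [[0.9, 0.1], [0.1, 0.9]]]
# cost[s][a]: action 0 is cheaper and steers toward the cheaper state 0.
cost = [[0.0, 1.0],
        [2.0, 3.0]]

policy = adaptive_control(P_true, cost)
print(policy)
```

Each pass through the loop touches one row of counts and one entry of `policy`, so the work per transition grows with the number of actions and states but never requires solving the full control problem again. The finite-time convergence claimed in the abstract is a property of the paper's algorithms and is not established by this sketch.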