Certainty equivalence control with forcing: revisited
R. Agrawal, D. Teneketzis
Proceedings of the 28th IEEE Conference on Decision and Control, 1989
DOI: 10.1109/CDC.1989.70538
Citations: 16
Abstract
Summary form only given, as follows. Stochastic adaptive optimization problems are considered with the objective of minimizing the rate of increase of the learning loss, i.e., the additional cost one must pay due to the learning tasks built into such problems. Two problems are examined in particular: the multi-armed bandit problem and the adaptive control of Markov chains. Previous work has shown that the minimum rate of increase of the learning loss for these problems is typically O(log n); however, the schemes that achieve this minimum are quite complicated. The authors show that simple schemes of the certainty-equivalence-control-with-forcing type can come arbitrarily close to this optimal performance. Specifically, they construct a class of schemes such that, for any δ > 0, there is a scheme whose learning loss is O((log n)^(1+δ)).
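The idea behind certainty equivalence with forcing can be illustrated for the multi-armed bandit case: play the empirically best arm (the certainty-equivalent choice) most of the time, but "force" exploration of every arm on a sparse schedule so that estimates keep improving. The sketch below is a minimal illustration under assumed details — the forcing schedule `ceil((log t)^(1+delta))` and all function names are our own choices for exposition, not the authors' exact construction:

```python
import math
import random


def ce_with_forcing(arms, horizon, delta=0.5, seed=0):
    """Certainty-equivalence bandit play with sparse forced exploration.

    arms: list of zero-argument callables, each returning a random reward.
    At round t, an arm is "forced" if it has been sampled fewer than
    ceil((log(t+1))**(1+delta)) times; otherwise the arm with the highest
    empirical mean is played.  (Illustrative schedule only: the forced
    plays up to round n grow like (log n)^(1+delta), mirroring the
    learning-loss rate discussed in the abstract.)
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k       # number of times each arm has been played
    sums = [0.0] * k       # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        target = math.ceil(math.log(t + 1) ** (1 + delta))
        under = [i for i in range(k) if counts[i] < target]
        if under:
            i = rng.choice(under)  # forced exploration step
        else:
            # certainty-equivalence step: act as if estimates were exact
            i = max(range(k), key=lambda j: sums[j] / counts[j])
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total += r
    return total, counts
```

With two arms of constant reward 0.2 and 0.8, the scheme plays the suboptimal arm only on the sparse forcing schedule (roughly (log n)^(1+δ) times over the horizon), so almost all plays go to the better arm, which is exactly the sublogarithmic-times-log growth of learning loss the abstract describes.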