Certainty equivalence control with forcing: revisited
R. Agrawal, D. Teneketzis
Proceedings of the 28th IEEE Conference on Decision and Control, 1989
DOI: 10.1109/CDC.1989.70538
Citations: 16
Abstract
Summary form only given, as follows. Stochastic adaptive optimization problems are considered with the objective of minimizing the rate of increase of the learning loss, i.e., the additional cost one must pay due to the learning tasks built into such problems. Two problems are examined in particular: the multi-armed bandit problem and the adaptive control of Markov chains. Previous work has shown that the minimum rate of increase of the learning loss for these problems is typically O(log n); however, the schemes that achieve this minimum are quite complicated. The authors show that simple schemes of the certainty-equivalence-control-with-forcing type can come arbitrarily close to this optimal performance. Specifically, they construct a class of schemes such that, for any δ > 0, there is a scheme whose learning loss is O((log n)^(1+δ)).
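The idea behind certainty equivalence with forcing can be illustrated for the multi-armed bandit case: play the empirically best arm (the certainty-equivalent choice) most of the time, but "force" exploration of every arm on a sparse schedule so that estimates keep improving. The sketch below is a minimal illustration under assumed details — the forcing schedule `ceil((log t)^(1+delta))` and all function names are our own choices for exposition, not the authors' exact construction:

```python
import math
import random


def ce_with_forcing(arms, horizon, delta=0.5, seed=0):
    """Certainty-equivalence bandit play with sparse forced exploration.

    arms: list of zero-argument callables, each returning a random reward.
    At round t, an arm is "forced" if it has been sampled fewer than
    ceil((log(t+1))**(1+delta)) times; otherwise the arm with the highest
    empirical mean is played.  (Illustrative schedule only: the forced
    plays up to round n grow like (log n)^(1+delta), mirroring the
    learning-loss rate discussed in the abstract.)
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k       # number of times each arm has been played
    sums = [0.0] * k       # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        target = math.ceil(math.log(t + 1) ** (1 + delta))
        under = [i for i in range(k) if counts[i] < target]
        if under:
            i = rng.choice(under)  # forced exploration step
        else:
            # certainty-equivalence step: act as if estimates were exact
            i = max(range(k), key=lambda j: sums[j] / counts[j])
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total += r
    return total, counts
```

With two arms of constant reward 0.2 and 0.8, the scheme plays the suboptimal arm only on the sparse forcing schedule (roughly (log n)^(1+δ) times over the horizon), so almost all plays go to the better arm, which is exactly the sublogarithmic-times-log growth of learning loss the abstract describes.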