Thompson Sampling with Unrestricted Delays

Proceedings of the 23rd ACM Conference on Economics and Computation. Pub Date: 2022-02-24. DOI: 10.1145/3490486.3538376

Hang Wu, Stefan Wager

Citations: 4

Abstract

We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish, to our knowledge, the first regret bounds for Thompson Sampling with arbitrary delay distributions, including ones with unbounded expectation. Our bounds are qualitatively comparable to the best available bounds derived via ad hoc algorithms, and depend on delays only via selected quantiles of the delay distributions. Furthermore, in extensive simulation experiments, we find that Thompson Sampling outperforms a number of alternative proposals, including methods specifically designed for settings with delayed feedback.
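To make the setting concrete, the following is a minimal illustrative sketch of Beta-Bernoulli Thompson Sampling in which each reward is revealed only after an i.i.d. random delay. It is not the authors' implementation or experimental setup: the function name `thompson_sampling_delayed`, the uniform Beta(1, 1) prior, the Bernoulli rewards, the arm means, and the Pareto delay distribution (shape 0.5, so its expectation is infinite) are all assumptions chosen for illustration.

```python
import heapq
import random


def thompson_sampling_delayed(means, sample_delay, horizon, seed=0):
    """Hypothetical sketch: Beta-Bernoulli Thompson Sampling with delayed rewards."""
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k   # observed rewards equal to 1, per arm
    failures = [0] * k    # observed rewards equal to 0, per arm
    pulls = [0] * k
    pending = []          # min-heap of (arrival_round, arm, reward)
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # Reveal every reward whose delay has elapsed by round t.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            if reward:
                successes[arm] += 1
            else:
                failures[arm] += 1

        # Draw one sample per arm from its Beta posterior (assumed uniform
        # prior) and play the arm with the largest draw.
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                 for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])

        # The reward is generated now but only observed after the delay.
        reward = 1 if rng.random() < means[arm] else 0
        delay = sample_delay(rng)
        heapq.heappush(pending, (t + 1 + delay, arm, reward))

        pulls[arm] += 1
        regret += best - means[arm]   # expected (pseudo-)regret of this pull

    return pulls, regret


# Assumed example: three arms, heavy-tailed delays with infinite mean.
pulls, regret = thompson_sampling_delayed(
    means=[0.3, 0.5, 0.7],
    sample_delay=lambda rng: rng.paretovariate(0.5),
    horizon=2000,
)
```

The posterior is updated only from rewards that have already arrived, so the algorithm itself needs no knowledge of the delay distribution; swapping `sample_delay` for another sampler changes the feedback pattern without touching the decision rule.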



