Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

Synthesis Lectures on Communication Networks. Pub Date: 2019-11-20. DOI: 10.2200/s00941ed2v01y201907cnt022

Qing Zhao

Citations: 25

Abstract

Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments. Since the first bandit problem posed by Thompson in 1933 for the application...




Source journal

Synthesis Lectures on Communication Networks

Self-citation rate

0.00%

Publication count