Strategic Choices: Small Budgets and Simple Regret

Cheng-Wei Chou, Ping-Chiang Chou, Chang-Shing Lee, D. St-Pierre, O. Teytaud, Mei-Hui Wang, Li-Wen Wu, Shi-Jim Yen
2012 Conference on Technologies and Applications of Artificial Intelligence, published 2012-11-16. DOI: 10.1109/TAAI.2012.35 (https://doi.org/10.1109/TAAI.2012.35)
Citations: 5

Abstract

In many decision problems, there are two levels of choice: the first is strategic and the second is tactical. We formalize the difference between the two, discuss the relevance of the bandit literature for strategic decisions, and test the quality of different bandit algorithms on real-world examples such as board games and card games. For exploration-exploitation algorithms, we evaluate Upper Confidence Bounds and Exponential Weights, as well as algorithms designed for simple regret, such as Successive Reject. For the exploitation, we also evaluate Bernstein Races and Uniform Sampling. For the recommendation part, we test Empirically Best Arm, Most Played Arm, Lower Confidence Bounds and Empirical Distribution. In the one-player case, we recommend Upper Confidence Bound as an exploration algorithm (in particular its variant adaptUCBE for parameter-free simple regret) and Lower Confidence Bound or Most Played Arm as recommendation algorithms. In the two-player case, we point out the convenience and efficiency of the EXP3 algorithm, and the very clear improvement provided by the truncation algorithm TEXP3. Incidentally, our algorithm won some games against professional players in kill-all Go (to the best of our knowledge, for the first time in computer games).
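To make the one-player setting concrete, the following is a minimal sketch of the split the abstract describes: an exploration rule spends a small budget of pulls, and a separate recommendation rule then picks a single arm. It is not the authors' implementation. The Bernoulli reward model, the UCB1-style index with constant `c`, the confidence width used for the Lower Confidence Bound, and all function names (`ucb_explore`, `recommend`, `simple_regret`) are assumptions made for illustration.

```python
import math
import random


def ucb_explore(arm_means, budget, c=2.0, seed=0):
    """Spend `budget` pulls using a UCB1-style index (assumed form).

    `arm_means` are hypothetical Bernoulli success probabilities that
    stand in for the unknown value of each strategic option.
    Returns (pull counts, reward sums) per arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, budget + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(c * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def recommend(counts, sums, rule="MPA"):
    """Pick one arm after exploration.

    EBA: Empirically Best Arm, MPA: Most Played Arm,
    LCB: Lower Confidence Bound (the width used here is an assumption).
    """
    k = len(counts)
    total = sum(counts)
    means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
    if rule == "EBA":
        return max(range(k), key=lambda i: means[i])
    if rule == "MPA":
        return max(range(k), key=lambda i: counts[i])
    if rule == "LCB":
        def lcb(i):
            if counts[i] == 0:
                return float("-inf")  # never recommend an unplayed arm
            return means[i] - math.sqrt(math.log(total) / counts[i])
        return max(range(k), key=lcb)
    raise ValueError(f"unknown rule: {rule}")


def simple_regret(arm_means, recommended):
    """Gap between the best arm's mean and the recommended arm's mean."""
    return max(arm_means) - arm_means[recommended]


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.9]  # hypothetical values for illustration
    counts, sums = ucb_explore(arm_means, budget=300)
    for rule in ("EBA", "MPA", "LCB"):
        arm = recommend(counts, sums, rule)
        print(rule, arm, simple_regret(arm_means, arm))
```

Keeping exploration and recommendation as separate functions mirrors the abstract's framing: simple regret depends only on the arm returned at the end, so the same pull statistics can be scored under several recommendation rules.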