Strategic Choices: Small Budgets and Simple Regret

2012 Conference on Technologies and Applications of Artificial Intelligence. Pub Date: 2012-11-16. DOI: 10.1109/TAAI.2012.35

Cheng-Wei Chou, Ping-Chiang Chou, Chang-Shing Lee, D. St-Pierre, O. Teytaud, Mei-Hui Wang, Li-Wen Wu, Shi-Jim Yen

Citations: 5

Abstract

In many decision problems, there are two levels of choice: the first is strategic and the second is tactical. We formalize the difference between the two, discuss the relevance of the bandit literature for strategic decisions, and test the quality of different bandit algorithms on real-world examples such as board games and card games. For the exploration-exploitation part, we evaluate Upper Confidence Bounds and Exponential Weights, as well as algorithms designed for simple regret, such as Successive Reject. For the exploration part, we also evaluate Bernstein Races and Uniform Sampling. For the recommendation part, we test Empirically Best Arm, Most Played, Lower Confidence Bounds, and Empirical Distribution. In the one-player case, we recommend Upper Confidence Bound as an exploration algorithm (in particular its variant adaptUCBE for parameter-free simple regret) and Lower Confidence Bound or Most Played Arm as recommendation algorithms. In the two-player case, we point out the convenience and efficiency of the EXP3 algorithm, and the very clear improvement provided by the truncation algorithm TEXP3. Incidentally, our algorithm won some games against professional players in kill-all Go (to the best of our knowledge, for the first time in computer games).
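The one-player pipeline in the abstract has two stages: an exploration algorithm spends a small budget of simulations on the candidate choices (arms), then a recommendation rule picks one arm, and that choice is judged by its simple regret, i.e. the gap between the best arm's true mean and the chosen arm's true mean. The Python sketch below illustrates that split on simulated Bernoulli arms. It is a minimal illustration under assumptions, not the authors' implementation: the function names, the Bernoulli simulator, the UCB1-style bonus sqrt(c ln t / n_i) with c = 2, and the LCB penalty sqrt(c ln n / n_i) are all hypothetical choices; adaptUCBE, Successive Reject, Bernstein Races, Empirical Distribution, EXP3 and TEXP3 are not implemented here.

```python
import math
import random

# Minimal sketch (hypothetical names and formulas): explore with a small
# budget, then recommend a single arm and measure its simple regret.


def explore(means, budget, policy="ucb", c=2.0, seed=0):
    """Spend `budget` simulated Bernoulli pulls on the arms.

    `means` are the hidden success probabilities, used only to draw rewards.
    policy="ucb" uses a UCB1-style index (assumed form); policy="uniform"
    uses round-robin uniform sampling. Returns (counts, sums) per arm.
    """
    k = len(means)
    if budget < k:
        raise ValueError("budget must allow at least one pull per arm")
    if policy not in ("ucb", "uniform"):
        raise ValueError(f"unknown policy {policy!r}")
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        if t < k or policy == "uniform":
            arm = t % k  # first pass over all arms, or round-robin sampling
        else:
            # Assumed UCB index: empirical mean + sqrt(c * ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(c * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def empirical_best_arm(counts, sums):
    """Recommend the arm with the highest empirical mean."""
    return max(range(len(counts)), key=lambda i: sums[i] / counts[i])


def most_played_arm(counts, sums):
    """Recommend the arm that was pulled most often."""
    return max(range(len(counts)), key=lambda i: counts[i])


def lower_confidence_bound(counts, sums, c=2.0):
    """Recommend the arm with the highest pessimistic estimate (assumed form)."""
    n = sum(counts)
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] - math.sqrt(c * math.log(n) / counts[i]),
    )


def simple_regret(means, arm):
    """Gap between the best true mean and the recommended arm's true mean."""
    return max(means) - means[arm]


def average_simple_regret(means, budget, recommend, policy="ucb", trials=200):
    """Mean simple regret of an (exploration, recommendation) pair over seeds."""
    total = 0.0
    for seed in range(trials):
        counts, sums = explore(means, budget, policy=policy, seed=seed)
        total += simple_regret(means, recommend(counts, sums))
    return total / trials


if __name__ == "__main__":
    means = [0.40, 0.50, 0.45, 0.60]  # hypothetical arms; arm 3 is best
    for policy in ("uniform", "ucb"):
        for name, rule in [
            ("EBA", empirical_best_arm),
            ("MPA", most_played_arm),
            ("LCB", lower_confidence_bound),
        ]:
            r = average_simple_regret(means, budget=100, recommend=rule, policy=policy)
            print(f"{policy:8s} {name}: mean simple regret {r:.3f}")
```

Holding the exploration policy fixed and swapping the recommendation rule is one way to compare rules on the same budget. Most Played is only informative after adaptive exploration such as UCB: under round-robin uniform sampling every arm receives nearly the same number of pulls, so the rule reduces to tie-breaking.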
