缩放多武装强盗算法

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI:10.1145/3292500.3330862

Edouard Fouché, Junpei Komiyama, Klemens Böhm

{"title":"缩放多武装强盗算法","authors":"Edouard Fouché, Junpei Komiyama, Klemens Böhm","doi":"10.1145/3292500.3330862","DOIUrl":null,"url":null,"abstract":"The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): The goal of the decision maker is not only to maximize the cumulative rewards, i.e., choosing the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-static environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Scaling Multi-Armed Bandit Algorithms\",\"authors\":\"Edouard Fouché, Junpei Komiyama, Klemens Böhm\",\"doi\":\"10.1145/3292500.3330862\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): The goal of the decision maker is not only to maximize the cumulative rewards, i.e., choosing the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-static environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.\",\"PeriodicalId\":186134,\"journal\":{\"name\":\"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3292500.3330862\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3292500.3330862","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

多武装强盗模型(Multi-Armed Bandit, MAB)是一个基本模型，它反映了顺序决策中勘探与开发之间的两难困境。在每一个时间步骤中，决策者选择一组手臂，并从每一个选择的手臂中观察奖励。在本文中，我们提出了该问题的一个变体，我们称之为尺度MAB (S-MAB):决策者的目标不仅是最大化累积奖励，即选择期望奖励最高的武器，而且还要决定选择多少武器，以便在期望中，选择武器的成本不超过奖励。这个问题与许多现实世界的应用相关，例如，在线广告、金融投资或数据流监控。我们提出了一种扩展的汤普森抽样，它有很强的理论保证，并在实践中表现良好。我们的扩展动态控制手臂的数量绘制。此外，我们将提出的方法与最先进的变化检测器ADWIN相结合，以处理非静态环境。我们通过预测性维护的实际用例说明了我们的贡献的好处。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Scaling Multi-Armed Bandit Algorithms

The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): The goal of the decision maker is not only to maximize the cumulative rewards, i.e., choosing the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-static environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

自引率

0.00%

发文量