Phase Transitions in Bandits with Switching Constraints

ERN: Other Econometrics: Mathematical Methods & Programming (Topic)
Pub Date: 2019-04-30
DOI: 10.2139/ssrn.3380783

D. Simchi-Levi, Yunzong Xu


Citations: 15

Abstract

We consider the classical stochastic multi-armed bandit problem with a constraint on the total cost incurred by switching between actions. We prove matching upper and lower bounds on regret and provide near-optimal algorithms for this problem. Surprisingly, we discover phase transitions and cyclic phenomena of the optimal regret. That is, we show that associated with the multi-armed bandit problem, there are phases defined by the number of arms and switching costs, where the regret upper and lower bounds in each phase remain the same and drop significantly between phases. The results enable us to fully characterize the trade-off between regret and incurred switching cost in the stochastic multi-armed bandit problem, contributing new insights to this fundamental problem. Under the general switching cost structure, the results reveal a deep connection between bandit problems and graph traversal problems, such as the shortest Hamiltonian path problem.
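To make the setting concrete, the following is a minimal sketch of a bandit simulation in which every change of arm costs one unit and the total number of switches is capped. It is not the algorithm from the paper: the policy is a plain explore-then-commit rule chosen only for illustration, and the function name `simulate`, the Bernoulli reward model, the unit switching cost, and the exploration length are all assumptions made here.

```python
import random


def simulate(means, horizon, switch_budget, seed=0):
    """Explore-then-commit on Bernoulli arms under a cap on arm switches.

    Illustrative assumption only, not the paper's algorithm.
    Returns (pseudo_regret, switches_used).
    """
    if not means or horizon < 1:
        raise ValueError("need at least one arm and a positive horizon")
    rng = random.Random(seed)
    best_mean = max(means)
    # Exploring m arms in sequence costs m - 1 switches and committing may
    # cost one more, so explore at most `switch_budget` arms (at least one).
    m = max(1, min(len(means), switch_budget))
    explore_per_arm = max(1, horizon // (4 * m))  # hypothetical choice

    state = {"regret": 0.0, "switches": 0, "current": None, "t": 0}
    totals = [0.0] * m
    counts = [0] * m

    def pull(arm):
        # Count a switch whenever the played arm differs from the previous one.
        if state["current"] is not None and arm != state["current"]:
            state["switches"] += 1
        state["current"] = arm
        state["t"] += 1
        state["regret"] += best_mean - means[arm]
        return 1.0 if rng.random() < means[arm] else 0.0

    # Exploration phase: play each of the first m arms in one contiguous block.
    for arm in range(m):
        for _ in range(explore_per_arm):
            if state["t"] >= horizon:
                break
            totals[arm] += pull(arm)
            counts[arm] += 1

    # Commit phase: play the empirically best explored arm until the end.
    explored = [a for a in range(m) if counts[a] > 0]
    chosen = max(explored, key=lambda a: totals[a] / counts[a])
    while state["t"] < horizon:
        pull(chosen)
    return state["regret"], state["switches"]


if __name__ == "__main__":
    for budget in range(0, 5):
        regret, used = simulate([0.3, 0.5, 0.7], horizon=2000, switch_budget=budget)
        print(budget, used, round(regret, 1))
```

Running the loop at the bottom for increasing budgets gives a rough feel for the regret versus switching cost trade-off that the abstract refers to; the precise phase structure is established in the paper, not by this sketch.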
