One-Armed Bandit Problem and the Mirror Descent Algorithm

IF 0.6, Zone 4 Mathematics, JCR Q3 MATHEMATICS

Doklady Mathematics Pub Date : 2025-03-28 DOI:10.1134/S1064562424702429

D. N. Shiyan

Doklady Mathematics, Volume 110, Supplement 2, pp. S399-S408. Journal Article. https://link.springer.com/article/10.1134/S1064562424702429

Citations: 0

Abstract

This paper considers the application of the mirror descent algorithm (MDA) to the one-armed bandit problem in the minimax setting, as applied to data processing. The problem is also known as a game with nature, in which the player's payoff function is the mathematical expectation of the total income. During the control process, the player must determine which of two available methods is more effective and ensure that it is used preferentially. The a priori efficiency of one of the methods is known. A modification of the MDA is considered that improves control efficiency by using this additional information. The proposed strategy preserves the characteristic property of strategies for one-armed bandits: once the known action is applied, it is applied until the end of control. Modifications are considered both for the algorithm with single processing and for its batch version. Batch processing is of interest because the total processing time is determined by the number of packets rather than by the original amount of data, and the data within each packet can be processed in parallel. For the proposed algorithms, optimal values of the adjustable parameters are computed by Monte Carlo simulation, and minimax risk estimates are obtained.
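The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the logistic update rule, the Bernoulli income model, and every name and parameter (`run_one_armed_bandit`, `estimate_regret`, `minimax_risk`, `eta`, `bias`) are assumptions made for illustration. The sketch keeps only the two features the abstract states explicitly: choosing the known action is absorbing, and the risk is estimated by Monte Carlo simulation and maximized over the unknown parameter.

```python
import math
import random


def run_one_armed_bandit(p_unknown, m_known, horizon, eta, bias, rng):
    """Simulate one control run and return the total income.

    Hypothetical strategy: the probability of using the unknown arm is a
    logistic function of its accumulated advantage over the known arm
    (the closed form of an exponential-weights step on two actions).
    """
    score = bias  # assumed initial preference for trying the unknown arm
    total = 0.0
    for t in range(horizon):
        clipped = max(-50.0, min(50.0, score))  # avoid overflow in exp
        prob_unknown = 1.0 / (1.0 + math.exp(-clipped))
        if rng.random() >= prob_unknown:
            # Known arm chosen: it is applied until the end of control.
            # Its income is replaced by its known expectation m_known.
            total += m_known * (horizon - t)
            return total
        # Unknown arm: Bernoulli income with success probability p_unknown.
        reward = 1.0 if rng.random() < p_unknown else 0.0
        total += reward
        score += eta * (reward - m_known)
    return total


def estimate_regret(p_unknown, m_known, horizon, eta, bias, runs, seed=0):
    """Monte Carlo estimate of the expected loss against the best arm."""
    rng = random.Random(seed)
    best = horizon * max(p_unknown, m_known)
    mean_income = sum(
        run_one_armed_bandit(p_unknown, m_known, horizon, eta, bias, rng)
        for _ in range(runs)
    ) / runs
    return best - mean_income


def minimax_risk(m_known, horizon, eta, bias, runs, grid):
    """Largest estimated regret over a grid of unknown-arm parameters."""
    return max(
        estimate_regret(p, m_known, horizon, eta, bias, runs) for p in grid
    )
```

Under these assumptions, a call such as `minimax_risk(0.5, 200, 0.5, 3.0, 2000, [i / 10 for i in range(11)])` gives one risk estimate for a single parameter pair. Tuning the adjustable parameters would then mean repeating the call over candidate values of `eta` and `bias` and keeping the pair with the smallest result.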




Source journal: Doklady Mathematics (Mathematics)
CiteScore: 1.00
Self-citation rate: 16.70%
Review time: 3-6 weeks

Journal description: Doklady Mathematics is a journal of the Presidium of the Russian Academy of Sciences. It contains English translations of papers published in Doklady Akademii Nauk (Proceedings of the Russian Academy of Sciences), which was founded in 1933 and is published 36 times a year. Doklady Mathematics covers the following areas: mathematics, mathematical physics, computer science, control theory, and computers. It publishes brief scientific reports on previously unpublished, significant new research in mathematics and its applications. The main contributors to the journal are Members of the RAS, Corresponding Members of the RAS, and scientists from the former Soviet Union and other foreign countries. Contributors include outstanding Russian mathematicians.