Risk and Optimal Policies in Bandit Experiments

IF 7.1 | Zone 1 (Economics) | JCR Q1 ECONOMICS

Econometrica, Vol. 93, No. 3, pp. 1003-1029 | Pub Date: 2025-06-10 | DOI: 10.3982/ECTA21075

Karun Adusumilli

Citations: 0

Abstract

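We provide a decision-theoretic analysis of bandit experiments under local asymptotics. Working within the framework of diffusion processes, we define suitable notions of asymptotic Bayes and minimax risk for these experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and thereby suggests a practical strategy for dimension reduction. The PDEs characterizing minimal Bayes risk can be solved efficiently using sparse matrix routines or Monte Carlo methods. We derive the optimal Bayes and minimax policies from their numerical solutions. These optimal policies substantially dominate existing methods such as Thompson sampling; the risk of the latter is often twice as high.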
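The abstract benchmarks its optimal policies against Thompson sampling and mentions Monte Carlo methods. As a point of reference only, the sketch below uses Monte Carlo simulation to estimate the Bayes (pseudo-)regret of Gaussian Thompson sampling on a two-armed bandit. The function name `thompson_bayes_regret`, the N(0, 1) prior on each arm mean, the unit reward variance, and the finite horizon are all illustrative assumptions. This is the baseline policy, not the paper's optimal Bayes or minimax policy, and it uses neither the paper's local-asymptotic scaling, its diffusion-limit risk, nor its PDE solver.

```python
import random


def thompson_bayes_regret(horizon=200, n_sims=2000, prior_var=1.0,
                          noise_var=1.0, seed=0):
    """Monte Carlo estimate of the Bayes pseudo-regret of Gaussian Thompson
    sampling on a two-armed bandit (hypothetical helper; illustrative
    assumptions: independent N(0, prior_var) priors, known noise_var)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # Draw the true arm means from the prior, so the average is a Bayes risk.
        mu = [rng.gauss(0.0, prior_var ** 0.5) for _ in range(2)]
        best = max(mu)
        # Posterior precision and precision-weighted reward sum per arm.
        # The prior mean is 0, so the weighted sum starts at 0.
        prec = [1.0 / prior_var, 1.0 / prior_var]
        wsum = [0.0, 0.0]
        regret = 0.0
        for _ in range(horizon):
            # Thompson step: sample each mean from its posterior N(wsum/prec, 1/prec).
            draws = [rng.gauss(wsum[k] / prec[k], (1.0 / prec[k]) ** 0.5)
                     for k in range(2)]
            arm = 0 if draws[0] >= draws[1] else 1
            reward = rng.gauss(mu[arm], noise_var ** 0.5)
            # Conjugate normal update for the arm that was pulled.
            prec[arm] += 1.0 / noise_var
            wsum[arm] += reward / noise_var
            # Pseudo-regret: expected shortfall of the chosen arm versus the best arm.
            regret += best - mu[arm]
        total += regret
    return total / n_sims


if __name__ == "__main__":
    print(thompson_bayes_regret(horizon=100, n_sims=500))
```

Replacing the arm-selection line with another rule, while keeping the same prior and seed, gives a like-for-like comparison of policies under this simplified finite-horizon setup.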

Source Journal

Econometrica (Social Sciences, Mathematical Methods)

CiteScore: 11.00

Self-citation rate: 3.30%

Review time: 6-12 weeks

About the journal: Econometrica publishes original articles in all branches of economics (theoretical and empirical, abstract and applied), providing wide-ranging coverage across the subject area. It promotes studies that aim at the unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems and that are penetrated by constructive and rigorous thinking. It explores a unique range of topics each year, from the frontier of theoretical developments in many new and important areas, to research on current and applied economic problems, to methodologically innovative, theoretical and applied studies in econometrics. Econometrica maintains a long tradition that submitted articles are refereed carefully and that detailed and thoughtful referee reports are provided to the author as an aid to scientific research, thus ensuring the high calibre of papers found in Econometrica. An international board of editors, together with the referees it has selected, has succeeded in substantially reducing editorial turnaround time, thereby encouraging submissions of the highest quality. We strongly encourage recent Ph.D. graduates to submit their work to Econometrica. Our policy is to take into account the fact that recent graduates are less experienced in the process of writing and submitting papers.