Risk and Optimal Policies in Bandit Experiments

IF 6.6, Zone 1 (Economics), JCR Q1 ECONOMICS
Econometrica Pub Date : 2025-06-10 DOI:10.3982/ECTA21075
Karun Adusumilli
Econometrica, Volume 93, Issue 3, Pages 1003-1029. Full text: https://onlinelibrary.wiley.com/doi/10.3982/ECTA21075
Citations: 0

Abstract

We provide a decision-theoretic analysis of bandit experiments under local asymptotics. Working within the framework of diffusion processes, we define suitable notions of asymptotic Bayes and minimax risk for these experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and thereby suggests a practical strategy for dimension reduction. The PDEs characterizing minimal Bayes risk can be solved efficiently using sparse matrix routines or Monte Carlo methods. We derive the optimal Bayes and minimax policies from their numerical solutions. These optimal policies substantially dominate existing methods such as Thompson sampling; the risk of the latter is often twice as high.
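
The abstract names Thompson sampling as the baseline that the optimal policies are compared against. As a rough illustration of that baseline only, and not of the paper's PDE-based policies, the following is a minimal sketch of Gaussian Thompson sampling. The function name `thompson_sampling_regret`, the zero-mean normal prior, the known noise variance, and the arm means used in the example are all assumptions made for this sketch and do not come from the paper.

```python
import random


def thompson_sampling_regret(means, horizon, noise_sd=1.0, prior_sd=1.0, seed=0):
    """Simulate Gaussian Thompson sampling; return cumulative expected regret.

    Hypothetical setup: each arm has a N(0, prior_sd^2) prior on its mean
    and rewards are N(mean, noise_sd^2) with noise_sd known.
    """
    rng = random.Random(seed)
    k = len(means)
    precision = [1.0 / prior_sd ** 2] * k  # posterior precision per arm
    weighted_sum = [0.0] * k               # sum of rewards / noise variance
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one value from each arm's posterior and pull the largest draw.
        draws = [
            rng.gauss(weighted_sum[a] / precision[a], precision[a] ** -0.5)
            for a in range(k)
        ]
        arm = max(range(k), key=draws.__getitem__)
        reward = rng.gauss(means[arm], noise_sd)
        # Conjugate normal update for the pulled arm.
        precision[arm] += 1.0 / noise_sd ** 2
        weighted_sum[arm] += reward / noise_sd ** 2
        # Expected regret of this pull: gap to the best arm's mean.
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    # Two arms with a small gap between their means (illustrative values).
    print(thompson_sampling_regret([0.0, 0.1], horizon=1000, seed=1))
```

Averaging the returned regret over many seeds and over a prior on `means` would give a Monte Carlo estimate of the Bayes risk of this particular policy, which is the kind of quantity the abstract compares across policies.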

Source journal
Econometrica (Social Sciences: Mathematics, Interdisciplinary Applications)
CiteScore: 11.00
Self-citation rate: 3.30%
Articles published: 75
Review time: 6-12 weeks
About the journal: Econometrica publishes original articles in all branches of economics - theoretical and empirical, abstract and applied, providing wide-ranging coverage across the subject area. It promotes studies that aim at the unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems and that are penetrated by constructive and rigorous thinking. It explores a unique range of topics each year - from the frontier of theoretical developments in many new and important areas, to research on current and applied economic problems, to methodologically innovative, theoretical and applied studies in econometrics. Econometrica maintains a long tradition that submitted articles are refereed carefully and that detailed and thoughtful referee reports are provided to the author as an aid to scientific research, thus ensuring the high calibre of papers found in Econometrica. An international board of editors, together with the referees it has selected, has succeeded in substantially reducing editorial turnaround time, thereby encouraging submissions of the highest quality. We strongly encourage recent Ph.D. graduates to submit their work to Econometrica. Our policy is to take into account the fact that recent graduates are less experienced in the process of writing and submitting papers.