Self-fulfilling Bandits: Dynamic Selection in Algorithmic Decision-making

ERN: Other Econometrics: Econometric & Statistical Methods - Special Topics (Topic). Pub Date: 2021-10-19. DOI: 10.2139/ssrn.3912989

Jin Li, Ye Luo, Xiaowei Zhang

Citations: 1

Abstract

This paper identifies and addresses dynamic selection problems that arise in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, we show that a novel bias, which we call self-fulfilling bias, arises because the endogeneity of the data influences which decisions are chosen, and those decisions in turn affect the distribution of future data to be collected and analyzed. We propose a class of algorithms that corrects for this bias by incorporating instrumental variables into leading online learning algorithms. These algorithms converge to the true parameter values while attaining low (logarithmic-like) regret. We further prove a central limit theorem for statistical inference on the parameters of interest. To establish these theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
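As a rough illustration of the general idea of combining an instrument with a greedy bandit policy (a toy sketch, not the authors' algorithm), the simulation assumes a two-armed bandit with linear rewards y = theta_a * x + e, a scalar covariate x that shares a shock u with the noise e (so x is endogenous), and an instrument z that moves x but not e. The greedy policy chooses arms using only z and past estimates. The function `simulate`, its parameters, and the data-generating process are all hypothetical. Under these assumptions, per-arm OLS estimates should settle near theta_a + rho/2 (about 1.4 and 2.4 with the defaults), while the simple IV ratio estimator sum(z*y) / sum(z*x), computed on the same adaptively collected data, should settle near theta_a.

```python
import random


def simulate(T=20000, theta=(1.0, 2.0), rho=0.8, use_iv=True, warmup=10, seed=0):
    """Toy greedy two-armed contextual bandit with an endogenous covariate.

    Hypothetical model (an assumption for illustration, not the paper's setup):
      z ~ N(0, 1)            instrument, observed before the decision
      u, v ~ N(0, 1)         unobserved shocks
      x = z + u              endogenous covariate, revealed with the reward
      e = rho * u + v        reward noise, correlated with x through u
      y = theta[a] * x + e   reward of the chosen arm a
    Decisions use only z and past estimates: pick argmax_a theta_hat[a] * z,
    since E[x | z] = z in this model. `use_iv` selects which estimator
    drives the decisions. Returns (ols_estimates, iv_estimates) per arm.
    """
    rng = random.Random(seed)
    # Running sums per arm: sum of x*y, x*x (for OLS) and z*y, z*x (for IV).
    sxy, sxx = [0.0, 0.0], [0.0, 0.0]
    szy, szx = [0.0, 0.0], [0.0, 0.0]

    def estimates():
        ols = [sxy[a] / sxx[a] if sxx[a] else 0.0 for a in (0, 1)]
        iv = [szy[a] / szx[a] if szx[a] else 0.0 for a in (0, 1)]
        return ols, iv

    for t in range(T):
        z = rng.gauss(0.0, 1.0)
        if t < 2 * warmup:
            a = t % 2  # forced round-robin exploration for the first rounds
        else:
            ols, iv = estimates()
            theta_hat = iv if use_iv else ols
            # Greedy choice based on predicted reward theta_hat[a] * E[x | z].
            a = 0 if theta_hat[0] * z >= theta_hat[1] * z else 1
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = z + u
        y = theta[a] * x + rho * u + v
        # Only the chosen arm's data is observed and accumulated.
        sxy[a] += x * y
        sxx[a] += x * x
        szy[a] += z * y
        szx[a] += z * x
    return estimates()


if __name__ == "__main__":
    ols, iv = simulate(use_iv=True)
    print("OLS estimates:", ols)
    print("IV estimates: ", iv)
```

Keeping decisions a function of z and past data only is what keeps the instrument valid within each arm's adaptively selected sample in this toy model; a rule that selected on x directly would correlate z with e among the selected rounds and bias the IV ratio as well.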
