Online Decision-Making with High-Dimensional Covariates

Operations Research eJournal. Pub Date: 2015-06-05. DOI: 10.2139/ssrn.2661896

Hamsa Bastani, M. Bayati


Citations: 245

Abstract

Big data has enabled decision-makers to tailor decisions at the individual level in a variety of domains, such as personalized medicine and online advertising. This involves learning a model of decision rewards conditional on individual-specific covariates. In many practical settings, these covariates are high-dimensional; however, typically only a small subset of the observed features is predictive of a decision's success. We formulate this problem as a multi-armed bandit with high-dimensional covariates and present a new, efficient bandit algorithm based on the LASSO estimator. Our regret analysis establishes that our algorithm achieves near-optimal performance in comparison to an oracle that knows all the problem parameters. The key step in our analysis is proving a new oracle inequality that guarantees the convergence of the LASSO estimator despite the non-i.i.d. data induced by the bandit policy. Furthermore, we illustrate the practical relevance of our algorithm by evaluating it on a real-world clinical problem of warfarin dosing. A patient's optimal warfarin dosage depends on the patient's genetic profile and medical records; an incorrect initial dosage may result in adverse consequences such as stroke or bleeding. We show that our algorithm outperforms existing bandit methods as well as physicians, correctly dosing a majority of patients.
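The following is a minimal, illustrative sketch of the general idea of a LASSO-based contextual bandit; it is not the authors' algorithm. It assumes a simulated linear reward model with sparse per-arm parameters, a fixed periodic forced-exploration schedule, and one LASSO estimate per arm fitted by coordinate descent. All names and settings (`lasso_cd`, `run_lasso_bandit`, `lam`, `explore_every`, the noise level, the sparsity pattern) are hypothetical choices made for illustration.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def run_lasso_bandit(T=200, d=10, n_arms=2, lam=0.05, explore_every=10, seed=0):
    """Simulate a simplified LASSO bandit; returns (cumulative regret, estimates)."""
    rng = random.Random(seed)
    # Assumed ground truth: arm k's reward depends only on feature k (sparse).
    true_beta = [[1.0 if j == k else 0.0 for j in range(d)] for k in range(n_arms)]
    data_X = [[] for _ in range(n_arms)]
    data_y = [[] for _ in range(n_arms)]
    est = [[0.0] * d for _ in range(n_arms)]
    regret = 0.0
    for t in range(T):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        means = [dot(true_beta[k], x) for k in range(n_arms)]
        slot = t % explore_every
        if slot < n_arms:
            arm = slot  # forced exploration: play each arm once per period
        else:
            # Greedy step: play the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda k: dot(est[k], x))
        reward = means[arm] + rng.gauss(0.0, 0.1)
        data_X[arm].append(x)
        data_y[arm].append(reward)
        est[arm] = lasso_cd(data_X[arm], data_y[arm], lam, n_iter=10)
        regret += max(means) - means[arm]
    return regret, est


if __name__ == "__main__":
    total_regret, estimates = run_lasso_bandit()
    print("cumulative regret:", round(total_regret, 3))
    print("nonzero coefficients per arm:",
          [sum(1 for b in e if b != 0.0) for e in estimates])
```

In this sketch, the L1 penalty drives most estimated coefficients to exactly zero, which is how a LASSO-based policy can exploit the assumption that only a small subset of features is predictive. The forced-exploration slots are one simple way to keep collecting data for every arm even though the greedy choices make the observations non-i.i.d.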


