Partially Observable Contextual Bandits with Linear Payoffs
Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh
arXiv - STAT - Machine Learning, 2024-09-17
https://doi.org/arxiv-2409.11521
Citations: 0
Abstract
The standard contextual bandit framework assumes fully observable and
actionable contexts. In this work, we consider a new bandit setting with
partially observable, correlated contexts and linear payoffs, motivated by
applications in finance, where decision making is based on market information
that typically displays temporal correlation and is not fully observed. We make
the following contributions marrying ideas from statistical signal processing
with bandits: (i) We propose an algorithmic pipeline named EMKF-Bandit, which
integrates system identification, filtering, and classic contextual bandit
algorithms into an iterative method alternating between latent parameter
estimation and decision making. (ii) We analyze EMKF-Bandit when we select
Thompson sampling as the bandit algorithm and show that it incurs sub-linear
regret under conditions on filtering. (iii) We conduct numerical simulations
that demonstrate the benefits and practical applicability of the proposed
pipeline.
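
To make the filter-then-decide idea concrete, the following is a minimal sketch and not the authors' EMKF-Bandit implementation. It assumes a scalar latent context following an AR(1) model with known parameters (so the system identification step is omitted entirely), a noisy direct observation of that context, and a per-arm scalar payoff coefficient. All names (`kalman_step`, `simulate`, `thetas`, `noise_sd`) and all numeric values are hypothetical choices for illustration. A scalar Kalman filter estimates the hidden context, and Thompson sampling treats the filtered estimate as if it were the true context.

```python
import math
import random


def kalman_step(mean, var, y, a, q, r):
    """One scalar Kalman predict/update step for x_t = a*x_{t-1} + w, y_t = x_t + v."""
    pred_mean = a * mean
    pred_var = a * a * var + q              # predict
    gain = pred_var / (pred_var + r)        # Kalman gain
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var       # update
    return new_mean, new_var


def simulate(T=500, seed=0, a=0.9, q=0.1, r=0.5,
             thetas=(1.0, -0.5, 0.2), noise_sd=0.1):
    """Return cumulative regret of Thompson sampling run on filtered contexts.

    Assumption: the dynamics (a, q, r) are known; in the paper's setting they
    would have to be estimated, which this sketch does not attempt.
    """
    rng = random.Random(seed)
    n_arms = len(thetas)
    x = 0.0                      # true latent context (hidden from the learner)
    mean, var = 0.0, 1.0         # filter belief about x
    prec = [1.0] * n_arms        # per-arm posterior precision (prior N(0, noise_sd^2))
    xty = [0.0] * n_arms         # per-arm sum of context_estimate * reward
    cum, history = 0.0, []
    for _ in range(T):
        # Environment: latent context evolves, learner sees a noisy observation.
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        y = x + rng.gauss(0.0, math.sqrt(r))
        # Filtering: estimate the latent context from the observation.
        mean, var = kalman_step(mean, var, y, a, q, r)
        # Thompson sampling: draw one coefficient per arm from its posterior.
        draws = [rng.gauss(xty[k] / prec[k], noise_sd / math.sqrt(prec[k]))
                 for k in range(n_arms)]
        arm = max(range(n_arms), key=lambda k: draws[k] * mean)
        # Linear payoff depends on the true context, not the estimate.
        reward = thetas[arm] * x + rng.gauss(0.0, noise_sd)
        # Posterior update uses the filtered estimate as the regressor.
        prec[arm] += mean * mean
        xty[arm] += mean * reward
        cum += max(t * x for t in thetas) - thetas[arm] * x
        history.append(cum)
    return history


if __name__ == "__main__":
    regret = simulate(T=2000, seed=0)
    print("cumulative regret after 2000 rounds:", round(regret[-1], 3))
```

Because the learner regresses rewards on an estimate of the context rather than the context itself, the quality of the filter directly limits how well the arms can be learned, which is one way to read the abstract's qualifier that the regret bound holds under conditions on filtering.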