{"title":"Particle Thompson Sampling with Static Particles","authors":"Zeyu Zhou, B. Hajek","doi":"10.1109/CISS56502.2023.10089653","DOIUrl":null,"url":null,"abstract":"Particle Thompson sampling (PTS) is a simple and flexible approximation of Thompson sampling for solving stochastic bandit problems. PTS circumvents the intractability of maintaining a continuous posterior distribution in Thompson sampling by replacing the continuous distribution with a discrete distribution supported at a set of weighted static particles. We analyze the dynamics of particles' weights in PTS for general stochastic bandits without assuming that the set of particles contains the unknown system parameter. It is shown that fit particles survive and unfit particles decay, with the fitness measured in KL-divergence. For Bernoulli bandit problems, all but a few fit particles decay.","PeriodicalId":243775,"journal":{"name":"2023 57th Annual Conference on Information Sciences and Systems (CISS)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 57th Annual Conference on Information Sciences and Systems (CISS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CISS56502.2023.10089653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Particle Thompson sampling (PTS) is a simple and flexible approximation of Thompson sampling for solving stochastic bandit problems. PTS circumvents the intractability of maintaining a continuous posterior distribution in Thompson sampling by replacing the continuous distribution with a discrete distribution supported at a set of weighted static particles. We analyze the dynamics of particles' weights in PTS for general stochastic bandits without assuming that the set of particles contains the unknown system parameter. It is shown that fit particles survive and unfit particles decay, with the fitness measured in KL-divergence. For Bernoulli bandit problems, all but a few fit particles decay.