{"title":"An Exploratory Analysis of the Multi-Armed Bandit Problem","authors":"Stanton Hudja, Daniel Woods","doi":"10.2139/ssrn.3942930","DOIUrl":null,"url":null,"abstract":"This paper conducts a laboratory experiment to analyze individual behavior in multi-armed bandit problems. Our experiment consists of four types of multi-armed bandit problems: (i) a two-armed indefinite horizon problem, (ii) a two-armed finite horizon problem, (iii) a three-armed indefinite horizon problem, and (iv) a three-armed finite horizon problem. We find that differences in behavior (switching, experimentation, best arm percentage) between these types of multi-armed bandit problems are consistent with predictions. However, we find that subjects use strategies that are different than predicted. We find that commonly suggested deterministic strategies are poor descriptors of subject behavior and that probabilistic strategies better fit the data. In particular, we find that a simple probabilistic ‘win-stay lose-shift’ strategy best fits most subjects.","PeriodicalId":263662,"journal":{"name":"ERN: Behavioral Economics (Topic)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ERN: Behavioral Economics (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3942930","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
This paper conducts a laboratory experiment to analyze individual behavior in multi-armed bandit problems. Our experiment consists of four types of multi-armed bandit problems: (i) a two-armed indefinite horizon problem, (ii) a two-armed finite horizon problem, (iii) a three-armed indefinite horizon problem, and (iv) a three-armed finite horizon problem. We find that differences in behavior (switching, experimentation, best-arm percentage) across these types of multi-armed bandit problems are consistent with predictions. However, we find that subjects use strategies that are different from those predicted. We find that commonly suggested deterministic strategies are poor descriptors of subject behavior and that probabilistic strategies fit the data better. In particular, we find that a simple probabilistic 'win-stay, lose-shift' strategy best fits most subjects.
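For readers unfamiliar with the heuristic named in the abstract, the sketch below simulates a probabilistic 'win-stay, lose-shift' rule on a Bernoulli bandit. It is a minimal illustration, not the paper's estimated model: the reward structure, the parameter names p_stay_win and p_shift_lose, and their values are assumptions chosen for exposition.

```python
import random

def simulate_wsls(arm_probs, horizon, p_stay_win=0.9, p_shift_lose=0.7, seed=0):
    """Simulate a probabilistic win-stay, lose-shift heuristic on a Bernoulli bandit.

    arm_probs    : success probability of each arm (unknown to the agent)
    horizon      : number of rounds (a finite-horizon problem)
    p_stay_win   : probability of repeating the same arm after a success (illustrative)
    p_shift_lose : probability of switching arms after a failure (illustrative)
    """
    rng = random.Random(seed)
    arm = rng.randrange(len(arm_probs))  # start on a randomly chosen arm
    total_reward = 0
    for _ in range(horizon):
        win = rng.random() < arm_probs[arm]
        total_reward += int(win)
        other_arms = [a for a in range(len(arm_probs)) if a != arm]
        if win:
            # "win-stay": keep the arm with probability p_stay_win, otherwise switch
            if rng.random() > p_stay_win:
                arm = rng.choice(other_arms)
        else:
            # "lose-shift": switch arms with probability p_shift_lose, otherwise stay
            if rng.random() < p_shift_lose:
                arm = rng.choice(other_arms)
    return total_reward

# Example: a hypothetical two-armed, 30-round finite-horizon problem
print(simulate_wsls([0.3, 0.6], horizon=30))
```

Because staying and shifting are probabilistic rather than deterministic, the rule can occasionally re-sample an arm after a loss or abandon an arm after a win, which is the kind of noisy, state-dependent switching the abstract contrasts with deterministic index-style strategies.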