{"title":"平滑-自适应上下文强盗","authors":"Y. Gur, Ahmadreza Momeni, Stefan Wager","doi":"10.2139/ssrn.3893198","DOIUrl":null,"url":null,"abstract":"In nonparametric contextual bandit formulations, a key complexity driver is the smoothness of payoff functions with respect to covariates. In many practical settings, the smoothness of payoffs is unknown, and misspecification of smoothness may severely deteriorate the performance of existing methods. In the paper “Smoothness-Adaptive Contextual Bandits,” Yonatan Gur, Ahmadreza Momeni, and Stefan Wager consider a framework where the smoothness of payoff functions is unknown and study when and how algorithms may adapt to unknown smoothness. First, they establish that designing algorithms that adapt to unknown smoothness is, in general, impossible. However, under a natural self-similarity condition, they establish that adapting to unknown smoothness is possible and devise a general policy for achieving smoothness-adaptive performance. The policy infers the smoothness of payoffs throughout the decision-making process while leveraging the structure of off-the-shelf nonadaptive policies. It matches (up to a logarithmic scale) the performance that is achievable when the smoothness of payoffs is known in advance.","PeriodicalId":320844,"journal":{"name":"PSN: Econometrics","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Smoothness-Adaptive Contextual Bandits\",\"authors\":\"Y. Gur, Ahmadreza Momeni, Stefan Wager\",\"doi\":\"10.2139/ssrn.3893198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In nonparametric contextual bandit formulations, a key complexity driver is the smoothness of payoff functions with respect to covariates. In many practical settings, the smoothness of payoffs is unknown, and misspecification of smoothness may severely deteriorate the performance of existing methods. In the paper “Smoothness-Adaptive Contextual Bandits,” Yonatan Gur, Ahmadreza Momeni, and Stefan Wager consider a framework where the smoothness of payoff functions is unknown and study when and how algorithms may adapt to unknown smoothness. First, they establish that designing algorithms that adapt to unknown smoothness is, in general, impossible. However, under a natural self-similarity condition, they establish that adapting to unknown smoothness is possible and devise a general policy for achieving smoothness-adaptive performance. The policy infers the smoothness of payoffs throughout the decision-making process while leveraging the structure of off-the-shelf nonadaptive policies. It matches (up to a logarithmic scale) the performance that is achievable when the smoothness of payoffs is known in advance.\",\"PeriodicalId\":320844,\"journal\":{\"name\":\"PSN: Econometrics\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PSN: Econometrics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3893198\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PSN: Econometrics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3893198","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In nonparametric contextual bandit formulations, a key complexity driver is the smoothness of payoff functions with respect to covariates. In many practical settings, the smoothness of payoffs is unknown, and misspecification of smoothness may severely deteriorate the performance of existing methods. In the paper “Smoothness-Adaptive Contextual Bandits,” Yonatan Gur, Ahmadreza Momeni, and Stefan Wager consider a framework where the smoothness of payoff functions is unknown and study when and how algorithms may adapt to unknown smoothness. First, they establish that designing algorithms that adapt to unknown smoothness is, in general, impossible. However, under a natural self-similarity condition, they establish that adapting to unknown smoothness is possible and devise a general policy for achieving smoothness-adaptive performance. The policy infers the smoothness of payoffs throughout the decision-making process while leveraging the structure of off-the-shelf nonadaptive policies. It matches (up to a logarithmic scale) the performance that is achievable when the smoothness of payoffs is known in advance.