Online Bayesian Recommendation with No Regret

Yiding Feng, Wei Tang, Haifeng Xu
{"title":"Online Bayesian Recommendation with No Regret","authors":"Yiding Feng, Wei Tang, Haifeng Xu","doi":"10.1145/3490486.3538327","DOIUrl":null,"url":null,"abstract":"We introduce and study the online Bayesian recommendation problem for a platform, who can observe a utility-relevant state of a product, repeatedly interacting with a population of myopic users through an online recommendation mechanism. This paradigm is common in a wide range of scenarios in the current Internet economy. For each user with her own private preference and belief, the platform commits to a recommendation strategy to utilize his information advantage on the product state to persuade the self-interested user to follow the recommendation. The platform does not know user's preferences and beliefs, and has to use an adaptive recommendation strategy to persuade with gradually learning user's preferences and beliefs in the process. We aim to design online learning policies with no Stackelberg regret for the platform, i.e., against the optimum policy in hindsight under the assumption that users will correspondingly adapt their behaviors to the benchmark policy. Our first result is an online policy that achieves double logarithm regret dependence on the number of rounds. We then present a hardness result showing that no adaptive online policy can achieve regret with better dependency on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present our second online policy that achieves regret with polynomial dependence on the number of states but logarithm dependence on the number of rounds.","PeriodicalId":209859,"journal":{"name":"Proceedings of the 23rd ACM Conference on Economics and Computation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM Conference on Economics and Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3490486.3538327","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

We introduce and study the online Bayesian recommendation problem for a platform that can observe a utility-relevant state of a product and repeatedly interacts with a population of myopic users through an online recommendation mechanism. This paradigm is common in a wide range of scenarios in the current Internet economy. For each user, who has her own private preference and belief, the platform commits to a recommendation strategy that uses its information advantage about the product state to persuade the self-interested user to follow the recommendation. The platform does not know the users' preferences and beliefs, and has to persuade through an adaptive recommendation strategy while gradually learning those preferences and beliefs in the process. We aim to design online learning policies with no Stackelberg regret for the platform, i.e., measured against the optimal policy in hindsight under the assumption that users correspondingly adapt their behaviors to the benchmark policy. Our first result is an online policy that achieves regret with doubly logarithmic dependence on the number of rounds. We then present a hardness result showing that no adaptive online policy can achieve regret with better dependence on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present a second online policy whose regret has polynomial dependence on the number of states but logarithmic dependence on the number of rounds.
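To make the benchmark concrete, the following is a hedged sketch of the Stackelberg regret described above; the notation ($\Pi$, $\pi$, $\pi_t$, $u_t$, $b_t$, $T$) is illustrative and not taken from the paper itself:

$$
\mathrm{Reg}(T) \;=\; \max_{\pi \in \Pi} \sum_{t=1}^{T} u_t\bigl(\pi,\, b_t(\pi)\bigr) \;-\; \sum_{t=1}^{T} u_t\bigl(\pi_t,\, b_t(\pi_t)\bigr),
$$

where $\Pi$ is the set of fixed recommendation strategies the platform could commit to, $\pi_t$ is the strategy the online policy uses in round $t$, $b_t(\cdot)$ is the round-$t$ user's best response to the strategy she faces (so the benchmark policy is also evaluated against adapted user behavior), and $u_t$ is the platform's realized utility. Under such a measure, the abstract's first policy attains regret of order $\log\log T$, the hardness result rules out any better dependence on $T$, and the LP-based policy attains regret polynomial in the number of states and logarithmic in $T$.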