Online Bayesian Recommendation with No Regret

Yiding Feng, Wei Tang, Haifeng Xu
{"title":"Online Bayesian Recommendation with No Regret","authors":"Yiding Feng, Wei Tang, Haifeng Xu","doi":"10.1145/3490486.3538327","DOIUrl":null,"url":null,"abstract":"We introduce and study the online Bayesian recommendation problem for a platform, who can observe a utility-relevant state of a product, repeatedly interacting with a population of myopic users through an online recommendation mechanism. This paradigm is common in a wide range of scenarios in the current Internet economy. For each user with her own private preference and belief, the platform commits to a recommendation strategy to utilize his information advantage on the product state to persuade the self-interested user to follow the recommendation. The platform does not know user's preferences and beliefs, and has to use an adaptive recommendation strategy to persuade with gradually learning user's preferences and beliefs in the process. We aim to design online learning policies with no Stackelberg regret for the platform, i.e., against the optimum policy in hindsight under the assumption that users will correspondingly adapt their behaviors to the benchmark policy. Our first result is an online policy that achieves double logarithm regret dependence on the number of rounds. We then present a hardness result showing that no adaptive online policy can achieve regret with better dependency on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present our second online policy that achieves regret with polynomial dependence on the number of states but logarithm dependence on the number of rounds.","PeriodicalId":209859,"journal":{"name":"Proceedings of the 23rd ACM Conference on Economics and Computation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM Conference on Economics and Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3490486.3538327","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

We introduce and study the online Bayesian recommendation problem for a platform that can observe a utility-relevant state of a product and repeatedly interacts with a population of myopic users through an online recommendation mechanism. This paradigm is common in a wide range of scenarios in the current Internet economy. For each user, who has her own private preference and belief, the platform commits to a recommendation strategy that uses its information advantage about the product state to persuade the self-interested user to follow the recommendation. The platform does not know the users' preferences and beliefs, and has to persuade through an adaptive recommendation strategy while gradually learning those preferences and beliefs in the process. We aim to design online learning policies with no Stackelberg regret for the platform, i.e., measured against the optimal policy in hindsight under the assumption that users correspondingly adapt their behaviors to the benchmark policy. Our first result is an online policy that achieves regret with doubly logarithmic dependence on the number of rounds. We then present a hardness result showing that no adaptive online policy can achieve regret with better dependence on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present a second online policy whose regret has polynomial dependence on the number of states but logarithmic dependence on the number of rounds.
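To make the benchmark concrete, the following is a hedged sketch of the Stackelberg regret described above; the notation ($\Pi$, $\pi$, $\pi_t$, $u_t$, $b_t$, $T$) is illustrative and not taken from the paper itself:

$$
\mathrm{Reg}(T) \;=\; \max_{\pi \in \Pi} \sum_{t=1}^{T} u_t\bigl(\pi,\, b_t(\pi)\bigr) \;-\; \sum_{t=1}^{T} u_t\bigl(\pi_t,\, b_t(\pi_t)\bigr),
$$

where $\Pi$ is the set of fixed recommendation strategies the platform could commit to, $\pi_t$ is the strategy the online policy uses in round $t$, $b_t(\cdot)$ is the round-$t$ user's best response to the strategy she faces (so the benchmark policy is also evaluated against adapted user behavior), and $u_t$ is the platform's realized utility. Under such a measure, the abstract's first policy attains regret of order $\log\log T$, the hardness result rules out any better dependence on $T$, and the LP-based policy attains regret polynomial in the number of states and logarithmic in $T$.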