Online learning in sequential Bayesian persuasion: Handling unknown priors

Impact Factor 5.1 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò
{"title":"序列贝叶斯说服中的在线学习:处理未知先验","authors":"Martino Bernasconi,&nbsp;Matteo Castiglioni,&nbsp;Alberto Marchesi,&nbsp;Nicola Gatti,&nbsp;Francesco Trovò","doi":"10.1016/j.artint.2024.104245","DOIUrl":null,"url":null,"abstract":"<div><div>We study a repeated <em>information design</em> problem faced by an informed <em>sender</em> who tries to influence the behavior of a self-interested <em>receiver</em>, through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a <em>sequential decision making</em> (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver to <em>persuade</em> them to follow (desirable) action recommendations. We study the case in which the sender does <em>not</em> know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result which also applies to the non-sequential case: <em>no learning algorithm can be persuasive in high probability</em>. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's <em>regret</em> in following recommendations <em>grows sub-linearly</em>. In the <em>full-feedback</em> setting—where the sender observes the realizations of <em>all</em> the possible random events—, we provide an algorithm with <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msqrt><mrow><mi>T</mi></mrow></msqrt><mo>)</mo></math></span> regret for both the sender and the receiver. Instead, in the <em>bandit-feedback</em> setting—where the sender only observes the realizations of random events actually occurring in the SDM problem—, we design an algorithm that, given an <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>1</mn><mo>]</mo></math></span> as input, guarantees <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>)</mo></math></span> and <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>max</mi><mo>⁡</mo><mo>{</mo><mi>α</mi><mo>,</mo><mn>1</mn><mo>−</mo><mfrac><mrow><mi>α</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>}</mo></mrow></msup><mo>)</mo></math></span> regrets, for the sender and the receiver respectively. 
This result is complemented by a lower bound showing that such a regret trade-off is tight for <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo>/</mo><mn>3</mn><mo>]</mo></math></span>.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"338 ","pages":"Article 104245"},"PeriodicalIF":5.1000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Online learning in sequential Bayesian persuasion: Handling unknown priors\",\"authors\":\"Martino Bernasconi,&nbsp;Matteo Castiglioni,&nbsp;Alberto Marchesi,&nbsp;Nicola Gatti,&nbsp;Francesco Trovò\",\"doi\":\"10.1016/j.artint.2024.104245\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>We study a repeated <em>information design</em> problem faced by an informed <em>sender</em> who tries to influence the behavior of a self-interested <em>receiver</em>, through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a <em>sequential decision making</em> (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver to <em>persuade</em> them to follow (desirable) action recommendations. We study the case in which the sender does <em>not</em> know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result which also applies to the non-sequential case: <em>no learning algorithm can be persuasive in high probability</em>. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's <em>regret</em> in following recommendations <em>grows sub-linearly</em>. In the <em>full-feedback</em> setting—where the sender observes the realizations of <em>all</em> the possible random events—, we provide an algorithm with <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msqrt><mrow><mi>T</mi></mrow></msqrt><mo>)</mo></math></span> regret for both the sender and the receiver. Instead, in the <em>bandit-feedback</em> setting—where the sender only observes the realizations of random events actually occurring in the SDM problem—, we design an algorithm that, given an <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>1</mn><mo>]</mo></math></span> as input, guarantees <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>)</mo></math></span> and <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>max</mi><mo>⁡</mo><mo>{</mo><mi>α</mi><mo>,</mo><mn>1</mn><mo>−</mo><mfrac><mrow><mi>α</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>}</mo></mrow></msup><mo>)</mo></math></span> regrets, for the sender and the receiver respectively. 
This result is complemented by a lower bound showing that such a regret trade-off is tight for <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo>/</mo><mn>3</mn><mo>]</mo></math></span>.</div></div>\",\"PeriodicalId\":8434,\"journal\":{\"name\":\"Artificial Intelligence\",\"volume\":\"338 \",\"pages\":\"Article 104245\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0004370224001814\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370224001814","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This raises the challenge of how to incrementally disclose such information to the receiver so as to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of the random events and thus has to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures, which is crucial for designing efficient learning algorithms. Next, we prove a negative result that also applies to the non-sequential case: no learning algorithm can be persuasive with high probability. Thus, we relax the persuasiveness requirement and study algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting, where the sender observes the realizations of all the possible random events, we provide an algorithm with Õ(√T) regret for both the sender and the receiver. In the bandit-feedback setting, where the sender only observes the realizations of the random events actually occurring in the SDM problem, we design an algorithm that, given an α ∈ [1/2, 1] as input, guarantees Õ(T^α) and Õ(T^{max{α, 1−α/2}}) regrets for the sender and the receiver, respectively. This result is complemented by a lower bound showing that such a regret trade-off is tight for α ∈ [1/2, 2/3].
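To make the regret trade-off in the bandit-feedback bound concrete, the following minimal Python sketch (an illustration based only on the exponents stated in the abstract, not code from the paper; the helper name regret_exponents is hypothetical) tabulates the sender exponent α and the receiver exponent max{α, 1 − α/2} for a few values of α. It shows that raising α worsens the sender's guarantee while improving the receiver's, with the two exponents meeting at α = 2/3, the right end of the range for which the lower bound certifies tightness.

def regret_exponents(alpha: float) -> tuple[float, float]:
    # Sender regret is O~(T^alpha); receiver regret is O~(T^{max(alpha, 1 - alpha/2)}),
    # for any alpha chosen in [1/2, 1]. Illustrative helper only, not from the paper.
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [1/2, 1]")
    return alpha, max(alpha, 1.0 - alpha / 2.0)

for alpha in (0.5, 0.6, 2.0 / 3.0, 0.8, 1.0):
    sender, receiver = regret_exponents(alpha)
    print(f"alpha = {alpha:.3f}: sender regret ~ T^{sender:.3f}, receiver regret ~ T^{receiver:.3f}")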
Source journal: Artificial Intelligence (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Annual publications: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.