Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò
{"title":"序列贝叶斯说服中的在线学习:处理未知先验","authors":"Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò","doi":"10.1016/j.artint.2024.104245","DOIUrl":null,"url":null,"abstract":"<div><div>We study a repeated <em>information design</em> problem faced by an informed <em>sender</em> who tries to influence the behavior of a self-interested <em>receiver</em>, through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a <em>sequential decision making</em> (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver to <em>persuade</em> them to follow (desirable) action recommendations. We study the case in which the sender does <em>not</em> know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result which also applies to the non-sequential case: <em>no learning algorithm can be persuasive in high probability</em>. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's <em>regret</em> in following recommendations <em>grows sub-linearly</em>. In the <em>full-feedback</em> setting—where the sender observes the realizations of <em>all</em> the possible random events—, we provide an algorithm with <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msqrt><mrow><mi>T</mi></mrow></msqrt><mo>)</mo></math></span> regret for both the sender and the receiver. Instead, in the <em>bandit-feedback</em> setting—where the sender only observes the realizations of random events actually occurring in the SDM problem—, we design an algorithm that, given an <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>1</mn><mo>]</mo></math></span> as input, guarantees <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>)</mo></math></span> and <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>max</mi><mo></mo><mo>{</mo><mi>α</mi><mo>,</mo><mn>1</mn><mo>−</mo><mfrac><mrow><mi>α</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>}</mo></mrow></msup><mo>)</mo></math></span> regrets, for the sender and the receiver respectively. 
This result is complemented by a lower bound showing that such a regret trade-off is tight for <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo>/</mo><mn>3</mn><mo>]</mo></math></span>.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"338 ","pages":"Article 104245"},"PeriodicalIF":5.1000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Online learning in sequential Bayesian persuasion: Handling unknown priors\",\"authors\":\"Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò\",\"doi\":\"10.1016/j.artint.2024.104245\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>We study a repeated <em>information design</em> problem faced by an informed <em>sender</em> who tries to influence the behavior of a self-interested <em>receiver</em>, through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a <em>sequential decision making</em> (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver to <em>persuade</em> them to follow (desirable) action recommendations. We study the case in which the sender does <em>not</em> know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result which also applies to the non-sequential case: <em>no learning algorithm can be persuasive in high probability</em>. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's <em>regret</em> in following recommendations <em>grows sub-linearly</em>. In the <em>full-feedback</em> setting—where the sender observes the realizations of <em>all</em> the possible random events—, we provide an algorithm with <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msqrt><mrow><mi>T</mi></mrow></msqrt><mo>)</mo></math></span> regret for both the sender and the receiver. Instead, in the <em>bandit-feedback</em> setting—where the sender only observes the realizations of random events actually occurring in the SDM problem—, we design an algorithm that, given an <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>1</mn><mo>]</mo></math></span> as input, guarantees <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>)</mo></math></span> and <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>max</mi><mo></mo><mo>{</mo><mi>α</mi><mo>,</mo><mn>1</mn><mo>−</mo><mfrac><mrow><mi>α</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>}</mo></mrow></msup><mo>)</mo></math></span> regrets, for the sender and the receiver respectively. 
This result is complemented by a lower bound showing that such a regret trade-off is tight for <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo>/</mo><mn>3</mn><mo>]</mo></math></span>.</div></div>\",\"PeriodicalId\":8434,\"journal\":{\"name\":\"Artificial Intelligence\",\"volume\":\"338 \",\"pages\":\"Article 104245\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0004370224001814\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370224001814","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Online learning in sequential Bayesian persuasion: Handling unknown priors
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver so as to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of the random events and thus has to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures; this is crucial for designing efficient learning algorithms. Next, we prove a negative result that also applies to the non-sequential case: no learning algorithm can be persuasive with high probability. Thus, we relax the persuasiveness requirement and study algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting, where the sender observes the realizations of all the possible random events, we provide an algorithm with Õ(√T) regret for both the sender and the receiver. In the bandit-feedback setting, where the sender only observes the realizations of the random events actually occurring in the SDM problem, we design an algorithm that, given an α ∈ [1/2, 1] as input, guarantees Õ(T^α) and Õ(T^{max{α, 1 − α/2}}) regrets for the sender and the receiver, respectively. This result is complemented by a lower bound showing that such a regret trade-off is tight for α ∈ [1/2, 2/3].
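To make the trade-off governed by α concrete, here is a minimal Python sketch (not from the paper; the function name and the sampled values of α are illustrative only) that evaluates the two regret exponents of the bandit-feedback bound stated in the abstract.

# Minimal illustrative sketch (not part of the paper): evaluates the regret
# exponents of the bandit-feedback bound for an input parameter alpha in [1/2, 1].
def regret_exponents(alpha: float) -> tuple[float, float]:
    """Return exponents (s, r) such that the sender's regret is Õ(T^s)
    and the receiver's regret is Õ(T^r), as stated in the abstract."""
    assert 0.5 <= alpha <= 1.0, "the algorithm takes alpha in [1/2, 1]"
    sender = alpha
    receiver = max(alpha, 1.0 - alpha / 2.0)
    return sender, receiver

for alpha in (1/2, 0.6, 2/3):
    s, r = regret_exponents(alpha)
    print(f"alpha = {alpha:.3f}: sender Õ(T^{s:.3f}), receiver Õ(T^{r:.3f})")

# alpha = 1/2 gives the sender Õ(√T) but the receiver only Õ(T^(3/4));
# alpha = 2/3 balances the two, giving Õ(T^(2/3)) for both.

Note that for α > 2/3 both exponents equal α, so such choices are dominated by α = 2/3 as far as these upper bounds are concerned; the interesting range is α ∈ [1/2, 2/3], which is exactly where the lower bound shows the trade-off is tight.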
Journal description:
The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.