{"title":"社会学习和客栈老板的挑战","authors":"Gal Bahar, Rann Smorodinsky, Moshe Tennenholtz","doi":"10.1145/3328526.3329569","DOIUrl":null,"url":null,"abstract":"Technological evolution, so central to the progress of humanity in recent decades, is the process of constantly introducing new technologies to replace old ones. A new technology does not necessarily mean a better technology and so should not always be embraced. How can society learn which novelties present actual improvements over the existing technology? Whereas the quality of status-quo technology is well known, the new one is a pig in a poke. With sufficiently many individuals willing to explore the new technology society can learn whether it is indeed an improvement. However, self motivated agents, often, do not agree to explore. This is true, in particular, if agents observed some predecessors that were disappointed from the new technology. Inspired by the classical multi-armed bandit model we study a setting where agents arrive sequentially and must pull one of two arms in order to receive a reward - a risky arm (representing the new technology) and a safe arm (representing the existing one). A central planner must induce sufficiently many agents to experiment with the risky arm. The central planner observes the actions and rewards of all agents while the agents themselves have partial observation. For the setting where each agent observes his predecessor we provide the central planner with a recommendation algorithm that is (almost) incentive compatible and facilitates social learning.","PeriodicalId":416173,"journal":{"name":"Proceedings of the 2019 ACM Conference on Economics and Computation","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Social Learning and the Innkeeper's Challenge\",\"authors\":\"Gal Bahar, Rann Smorodinsky, Moshe Tennenholtz\",\"doi\":\"10.1145/3328526.3329569\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Technological evolution, so central to the progress of humanity in recent decades, is the process of constantly introducing new technologies to replace old ones. A new technology does not necessarily mean a better technology and so should not always be embraced. How can society learn which novelties present actual improvements over the existing technology? Whereas the quality of status-quo technology is well known, the new one is a pig in a poke. With sufficiently many individuals willing to explore the new technology society can learn whether it is indeed an improvement. However, self motivated agents, often, do not agree to explore. This is true, in particular, if agents observed some predecessors that were disappointed from the new technology. Inspired by the classical multi-armed bandit model we study a setting where agents arrive sequentially and must pull one of two arms in order to receive a reward - a risky arm (representing the new technology) and a safe arm (representing the existing one). A central planner must induce sufficiently many agents to experiment with the risky arm. The central planner observes the actions and rewards of all agents while the agents themselves have partial observation. For the setting where each agent observes his predecessor we provide the central planner with a recommendation algorithm that is (almost) incentive compatible and facilitates social learning.\",\"PeriodicalId\":416173,\"journal\":{\"name\":\"Proceedings of the 2019 ACM Conference on Economics and Computation\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2019 ACM Conference on Economics and Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3328526.3329569\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 ACM Conference on Economics and Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3328526.3329569","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Technological evolution, so central to the progress of humanity in recent decades, is the process of constantly introducing new technologies to replace old ones. A new technology does not necessarily mean a better technology and so should not always be embraced. How can society learn which novelties present actual improvements over the existing technology? Whereas the quality of status-quo technology is well known, the new one is a pig in a poke. With sufficiently many individuals willing to explore the new technology society can learn whether it is indeed an improvement. However, self motivated agents, often, do not agree to explore. This is true, in particular, if agents observed some predecessors that were disappointed from the new technology. Inspired by the classical multi-armed bandit model we study a setting where agents arrive sequentially and must pull one of two arms in order to receive a reward - a risky arm (representing the new technology) and a safe arm (representing the existing one). A central planner must induce sufficiently many agents to experiment with the risky arm. The central planner observes the actions and rewards of all agents while the agents themselves have partial observation. For the setting where each agent observes his predecessor we provide the central planner with a recommendation algorithm that is (almost) incentive compatible and facilitates social learning.