Devaluation of Unchosen Options: A Bayesian Account of the Provenance and Maintenance of Overly Optimistic Expectations.

CogSci ... Annual Conference of the Cognitive Science Society. Cognitive Science Society (U.S.). Conference Pub Date : 2020-07-01

Corey Yishan Zhou, Dalin Guo, Angela J Yu

Volume 42, pages 1682-1688. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336429/pdf/

Citations: 0

Abstract

Humans frequently overestimate the likelihood of desirable events while underestimating the likelihood of undesirable ones: a phenomenon known as unrealistic optimism. Previously, it was suggested that unrealistic optimism arises from asymmetric belief updating, with a relatively reduced coding of undesirable information. Prior studies have shown that a reinforcement learning (RL) model with asymmetric learning rates (greater for a positive prediction error than a negative prediction error) could account for unrealistic optimism in a bandit task, in particular the tendency of human subjects to persistently choose a single option when there are multiple equally good options. Here, we propose an alternative explanation of such persistent behavior, by modeling human behavior using a Bayesian hidden Markov model, the Dynamic Belief Model (DBM). We find that DBM captures human choice behavior better than the previously proposed asymmetric RL model. Whereas asymmetric RL attains a measure of optimism by giving better-than-expected outcomes higher learning weights compared to worse-than-expected outcomes, DBM does so by progressively devaluing the unchosen options, thus placing a greater emphasis on choice history independent of reward outcome (e.g., an oft-chosen option might continue to be preferred even if it has not been particularly rewarding), which has broadly been shown to underlie sequential effects in a variety of behavioral settings. Moreover, previous work showed that the devaluation of unchosen options in DBM helps to compensate for a default assumption of environmental non-stationarity, thus allowing the decision-maker to both be more adaptive in changing environments and still obtain near-optimal performance in stationary environments. Thus, the current work suggests both a novel rationale and mechanism for persistent behavior in bandit tasks.
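The two mechanisms contrasted in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes binary rewards, a discretized reward-rate grid, a Beta(2, 2) prior, and a DBM in which each option's hidden reward rate persists with probability `gamma` and is otherwise redrawn from the prior. All function names and parameter values are hypothetical. Under these assumptions, an unchosen option receives only the prediction step, so its estimate drifts toward the prior mean, which amounts to devaluation whenever the estimate sits above that mean.

```python
# Illustrative sketch only: names, grid size, prior, and gamma are assumptions.

def asymmetric_rl_update(q, reward, alpha_pos, alpha_neg):
    """Value update with separate learning rates by prediction-error sign."""
    delta = reward - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def beta_prior(grid, a, b):
    """Discretized Beta(a, b) prior over reward rates (assumes a, b >= 1)."""
    return normalize([t ** (a - 1) * (1 - t) ** (b - 1) for t in grid])


def dbm_predict(belief, prior, gamma):
    """Prediction step: keep the belief w.p. gamma, else reset to the prior."""
    return [gamma * p + (1 - gamma) * p0 for p, p0 in zip(belief, prior)]


def dbm_observe(belief, grid, reward):
    """Bayes update for a chosen option after a binary reward (1 or 0)."""
    likelihood = [t if reward == 1 else 1 - t for t in grid]
    return normalize([p * l for p, l in zip(belief, likelihood)])


def belief_mean(belief, grid):
    """Expected reward rate under the current belief."""
    return sum(p * t for p, t in zip(belief, grid))


GRID = [i / 100 for i in range(101)]
PRIOR = beta_prior(GRID, 2, 2)
GAMMA = 0.8  # assumed probability that the reward rate stays unchanged

# An option is chosen and rewarded three times, raising its estimate.
belief = PRIOR
for _ in range(3):
    belief = dbm_observe(dbm_predict(belief, PRIOR, GAMMA), GRID, 1)
mean_before = belief_mean(belief, GRID)

# The same option then goes unchosen for ten trials: only the prediction
# step applies, so its estimate decays toward the prior mean.
for _ in range(10):
    belief = dbm_predict(belief, PRIOR, GAMMA)
mean_after = belief_mean(belief, GRID)
```

In the asymmetric RL update, optimism comes from `alpha_pos > alpha_neg`. In the DBM sketch, no such asymmetry is needed: `mean_after` falls below `mean_before` purely because the option was not sampled.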
