Observational learning of exploration-exploitation strategies in bandit tasks

IF 2.8 | Zone 1 (Psychology) | Q1 PSYCHOLOGY, EXPERIMENTAL

Cognition Pub Date : 2025-03-20 DOI:10.1016/j.cognition.2025.106124

Ludwig Danwitz, Bettina von Helversen

Cognition, volume 259, Article 106124. Journal Article. https://www.sciencedirect.com/science/article/pii/S0010027725000642

Citations: 0

Abstract

In decision-making scenarios, individuals often face the challenge of balancing the exploration of new options against the exploitation of known ones, a dynamic known as the exploration-exploitation trade-off. In such situations, people frequently have the opportunity to observe others' actions. Yet little is known about when, how, and from whom individuals use observational learning in the exploration-exploitation dilemma. In two experiments, participants completed multiple nine-armed bandit tasks, either independently or while observing a fictitious agent using either an explorative or an equally successful exploitative strategy. To analyze participants' behavior, we used a reinforcement learning model (simplified Kalman filter) to extract parameters for both copying and exploration at the individual level. Results showed that participants copied the observed agents' choices by adding a bonus to the individually estimated value of the observed action. While most participants appear to use an unconditional copying approach, a subset of participants adopted a copy-when-uncertain approach, that is, copying more when uncertain about the optimal action based on their individually acquired knowledge. Further, participants adjusted their exploration strategies to align with those they observed. We discuss to what extent this can be understood as a form of emulation. Results on whether participants prefer to copy explorative or exploitative agents are ambiguous. Contrary to expectations, similarity or dissimilarity between participants' and agents' exploration tendencies had no impact on observational learning. These results shed light on how humans process social and non-social information in exploration scenarios and on the conditions of observational learning.
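The modeling idea in the abstract (a Kalman-filter learner whose value estimate for the observed action receives a copying bonus) can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no equations, so the softmax choice rule, the uncertainty-based exploration term, the prior values, and all parameter names (`explore_weight`, `copy_bonus`, `temperature`, `obs_noise_var`) are assumptions chosen for illustration only.

```python
import numpy as np


def kalman_update(mean, var, arm, reward, obs_noise_var=1.0):
    """Update the estimated mean and variance of the chosen arm only.

    Assumes stationary arm values and a known observation noise variance
    (both are illustrative assumptions, not taken from the paper).
    """
    mean, var = mean.copy(), var.copy()
    gain = var[arm] / (var[arm] + obs_noise_var)  # Kalman gain
    mean[arm] += gain * (reward - mean[arm])      # move estimate toward reward
    var[arm] *= 1.0 - gain                        # uncertainty shrinks
    return mean, var


def choice_probabilities(mean, var, observed_arm=None,
                         explore_weight=1.0, copy_bonus=0.5, temperature=1.0):
    """Softmax over utilities with an optional, unconditional copying bonus.

    utility = estimated value + explore_weight * uncertainty,
    plus copy_bonus on the arm the observed agent chose (hypothetical form).
    """
    utility = mean + explore_weight * np.sqrt(var)
    if observed_arm is not None:
        utility[observed_arm] += copy_bonus
    z = (utility - utility.max()) / temperature   # subtract max for stability
    p = np.exp(z)
    return p / p.sum()


# Nine-armed bandit with a hypothetical prior (mean 0, variance 1).
mean, var = np.zeros(9), np.ones(9)
p_alone = choice_probabilities(mean, var)
p_social = choice_probabilities(mean, var, observed_arm=3)
mean, var = kalman_update(mean, var, arm=3, reward=4.0)
```

In this sketch the bonus is applied unconditionally. A copy-when-uncertain variant would make `copy_bonus` depend on the learner's own uncertainty, but the abstract does not state how that dependence is specified, so it is left out here.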


Source journal

Cognition (PSYCHOLOGY, EXPERIMENTAL)

CiteScore: 6.40

Self-citation rate: 5.90%

Articles published: 283

About the journal: Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.