Observational learning of exploration-exploitation strategies in bandit tasks
Authors: Ludwig Danwitz, Bettina von Helversen
Journal: Cognition, volume 259, Article 106124 (published 2025-03-20)
DOI: 10.1016/j.cognition.2025.106124
URL: https://www.sciencedirect.com/science/article/pii/S0010027725000642
Citation count: 0
Abstract
In decision-making scenarios, individuals often face the challenge of balancing the exploration of new options against the exploitation of known ones, a dynamic known as the exploration-exploitation trade-off. In such situations, people frequently have the opportunity to observe others' actions. Yet little is known about when, how, and from whom individuals use observational learning in the exploration-exploitation dilemma. In two experiments, participants completed multiple nine-armed bandit tasks, either independently or while observing a fictitious agent using either an explorative or an equally successful exploitative strategy. To analyze participants' behavior, we used a reinforcement learning model (simplified Kalman Filter) to extract parameters for both copying and exploration at the individual level. Results showed that participants copied the observed agents' choices by adding a bonus to the individually estimated value of the observed action. While most participants appear to use an unconditional copying approach, a subset of participants adopted a copy-when-uncertain approach, that is, copying more when uncertain about the optimal action based on their individually acquired knowledge. Further, participants adjusted their exploration strategies in alignment with those observed. We discuss to what extent this can be understood as a form of emulation. Results on participants' preferences for copying from explorative versus exploitative agents are ambiguous. Contrary to expectations, similarity or dissimilarity between participants' and agents' exploration tendencies had no impact on observational learning. These results shed light on humans' processing of social and non-social information in exploration scenarios and conditions of observational learning.
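The abstract names a simplified Kalman filter with a copying bonus but does not give its equations, so the following is only a minimal sketch of how such a model could look, not the authors' implementation. The softmax choice rule, the uncertainty-weighted exploration term, and all names and defaults (`kalman_update`, `choice_probabilities`, `copy_bonus`, `explore_weight`, `temperature`, `noise_var`) are assumptions made for illustration.

```python
import math


def kalman_update(mean, var, reward, noise_var=1.0):
    """Update the belief about the chosen arm after observing one reward.

    Assumed simplified Kalman filter: stationary rewards, so the only
    source of change in the belief is the new observation.
    """
    gain = var / (var + noise_var)          # how much to trust the new reward
    new_mean = mean + gain * (reward - mean)
    new_var = (1.0 - gain) * var            # uncertainty shrinks after sampling
    return new_mean, new_var


def choice_probabilities(means, variances, observed_arm=None,
                         copy_bonus=0.0, explore_weight=0.0, temperature=1.0):
    """Softmax over arm utilities (hypothetical choice rule).

    Utility = estimated value
              + explore_weight * posterior standard deviation
              + copy_bonus if the arm is the one the observed agent chose.
    """
    utilities = []
    for arm, (m, v) in enumerate(zip(means, variances)):
        u = m + explore_weight * math.sqrt(v)
        if observed_arm is not None and arm == observed_arm:
            u += copy_bonus                 # social bonus on the observed action
        utilities.append(u / temperature)
    top = max(utilities)                    # subtract max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Nine arms with identical priors; the observed agent chose arm 3.
means = [0.0] * 9
variances = [1.0] * 9
probs = choice_probabilities(means, variances, observed_arm=3, copy_bonus=1.0)

# After choosing arm 3 and receiving a reward, update only that arm.
means[3], variances[3] = kalman_update(means[3], variances[3], reward=1.0)
```

With identical priors, the copy bonus is the only thing that separates the arms, so the observed arm receives the highest choice probability. A copy-when-uncertain variant could, for instance, scale `copy_bonus` by the learner's current uncertainty, but that extension is likewise an assumption and is not shown here.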
About the journal:
Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.