High variability in LLMs’ analogical reasoning
Andrea Gregor de Varda, Chiara Saponaro, Marco Marelli
Nature Human Behaviour (2025). DOI: 10.1038/s41562-025-02224-3. Published 4 June 2025.
Abstract
arising from T. Webb et al. Nature Human Behaviour https://doi.org/10.1038/s41562-023-01659-w (2023)
In a recent study, Webb, Holyoak and Lu [1] (henceforth WHL) demonstrated that a large language model (GPT-3, text-davinci-003) could match or even exceed human performance across several analogical reasoning tasks. This result led to the compelling conclusion that LLMs such as GPT-3 possess an emergent ability to reason by analogy. However, the findings were based on a single, proprietary model for which the releasing company provided limited public details and progressively restricted access to the internal probability distributions. Furthermore, text-davinci-003 was deprecated on 4 January 2024 and is no longer available through the OpenAI API. This poses a challenge to replicability in two ways. First, the lack of open access to the model and its recent deprecation make it difficult, if not impossible, for other researchers to verify or build upon the findings. Second, relying on a single model leaves open the question of whether the results extend to LLMs as a broader class of objects of scientific investigation. Without testing a diverse range of models, it is unclear whether the observed behaviours are specific to GPT-3 or represent a general property of comparable contemporary LLMs. Replicating experimental results obtained with proprietary models using public alternatives is thus crucial to ensure that the findings can be reproduced in the future [2] and generalized to new model instances, and, more generally, to adhere to transparency principles that are of paramount importance in scientific research [3].
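To illustrate the kind of replication with public alternatives discussed above, the sketch below scores the candidate answers of a toy four-term verbal analogy by summing token log-probabilities under an open-weight model through the Hugging Face transformers library. This is a minimal illustration under stated assumptions, not WHL's or the present authors' procedure: the model name (gpt2), the scoring function, and the example item are placeholders chosen for brevity.

```python
# Minimal sketch: scoring analogy candidates with an open-weight causal LM.
# Assumptions: gpt2 as a stand-in model; a toy analogy item; completions that
# start with a space so their tokenization appends cleanly to the prompt's.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any openly released causal LM would serve the same purpose
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # shape: (1, seq_len, vocab_size)
    # Position i predicts token i + 1, so shift logits and targets accordingly.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the log-probabilities of the completion tokens.
    n_prompt = prompt_ids.shape[1]
    return token_lp[0, n_prompt - 1:].sum().item()

# Toy verbal analogy (illustrative item, not drawn from WHL's materials).
prompt = "hot is to cold as tall is to"
candidates = [" short", " wide", " heavy", " cold"]
scores = {c.strip(): completion_logprob(prompt, c) for c in candidates}
print(max(scores, key=scores.get), scores)
```

Because open-weight models expose their full probability distributions and remain downloadable, scores of this kind stay verifiable even after a hosted model such as text-davinci-003 is deprecated.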
Journal introduction:
Nature Human Behaviour is a journal that publishes research of outstanding significance into any aspect of human behaviour. The research can cover the psychological, biological, and social bases of human behaviour, including its origins, development, and disorders. The primary aim of the journal is to increase the visibility of research in the field and enhance its societal reach and impact.