Artificial Intelligence and the Illusion of Understanding: A Systematic Review of Theory of Mind and Large Language Models

IF 4.2 · CAS Region 2 (Psychology) · JCR Q1 (PSYCHOLOGY, SOCIAL)
Antonella Marchetti, Federico Manzi, Giuseppe Riva, Andrea Gaggioli, Davide Massaro
{"title":"人工智能和理解的幻觉:心智理论和大型语言模型的系统回顾。","authors":"Antonella Marchetti,Federico Manzi,Giuseppe Riva,Andrea Gaggioli,Davide Massaro","doi":"10.1089/cyber.2024.0536","DOIUrl":null,"url":null,"abstract":"The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM)-the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain constrained to text-based tasks, overlooking embodied and multimodal dimensions crucial to human social cognition. This review discusses the \"illusion of understanding\" in LLMs for two primary reasons: First, their lack of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths, limiting direct comparisons with human performance. The findings highlight the need for more ecologically valid assessments and interdisciplinary research to better delineate the limitations and potential of AToM. This set of issues is highly relevant to psychology, as language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.","PeriodicalId":10872,"journal":{"name":"Cyberpsychology, behavior and social networking","volume":"19 1","pages":""},"PeriodicalIF":4.2000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence and the Illusion of Understanding: A Systematic Review of Theory of Mind and Large Language Models.\",\"authors\":\"Antonella Marchetti,Federico Manzi,Giuseppe Riva,Andrea Gaggioli,Davide Massaro\",\"doi\":\"10.1089/cyber.2024.0536\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM)-the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain constrained to text-based tasks, overlooking embodied and multimodal dimensions crucial to human social cognition. 
This review discusses the \\\"illusion of understanding\\\" in LLMs for two primary reasons: First, their lack of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths, limiting direct comparisons with human performance. The findings highlight the need for more ecologically valid assessments and interdisciplinary research to better delineate the limitations and potential of AToM. This set of issues is highly relevant to psychology, as language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.\",\"PeriodicalId\":10872,\"journal\":{\"name\":\"Cyberpsychology, behavior and social networking\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cyberpsychology, behavior and social networking\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://doi.org/10.1089/cyber.2024.0536\",\"RegionNum\":2,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PSYCHOLOGY, SOCIAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cyberpsychology, behavior and social networking","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1089/cyber.2024.0536","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, SOCIAL","Score":null,"Total":0}
Citations: 0

Abstract

The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM): the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain constrained to text-based tasks, overlooking the embodied and multimodal dimensions crucial to human social cognition. This review attributes the "illusion of understanding" in LLMs to two primary causes: first, their lack of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths, limiting direct comparisons with human performance. The findings highlight the need for more ecologically valid assessments and interdisciplinary research to better delineate the limitations and potential of AToM. This set of issues is highly relevant to psychology, as language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.
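To make concrete what "adapting a classical task for LLMs" means in the text-only setting the review critiques, the sketch below shows a minimal first-order false-belief probe in the Sally-Anne style, written in Python. It is a hypothetical illustration, not a protocol from any reviewed study: query_llm stands in for whatever model API an evaluation would actually call, and the keyword scoring is deliberately far cruder than the rubrics real evaluations use.

# Illustrative sketch only: query_llm is a hypothetical stand-in for a
# real model API, and the keyword scoring below is much cruder than the
# scoring used in the studies this review covers.

SALLY_ANNE_FIRST_ORDER = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble from the basket to the box. "
    "Sally comes back. Where will Sally look for her marble first?"
)


def query_llm(prompt: str) -> str:
    """Hypothetical model call; returns a canned reply so the sketch runs.

    Replace the body with a real client call to the model under test.
    """
    return "Sally will look in the basket, where she left her marble."


def passes_first_order(answer: str) -> bool:
    """Crude keyword check: a correct reply tracks Sally's false belief
    (the basket), not the marble's true location (the box)."""
    text = answer.lower()
    return "basket" in text and "box" not in text


if __name__ == "__main__":
    reply = query_llm(SALLY_ANNE_FIRST_ORDER)
    print("Model reply:", reply)
    print("First-order false belief passed:", passes_first_order(reply))

A probe like this also illustrates the review's methodological point: it tests verbal reasoning about beliefs and nothing else, leaving the embodied and multimodal dimensions of human ToM outside the evaluation entirely.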
Source Journal

CiteScore: 9.60 · Self-citation rate: 3.00% · Articles published: 123
About the journal: Cyberpsychology, Behavior, and Social Networking is a leading peer-reviewed journal that is recognized for its authoritative research on the social, behavioral, and psychological impacts of contemporary social networking practices. The journal covers a wide range of platforms, including Twitter, Facebook, internet gaming, and e-commerce, and examines how these digital environments shape human interaction and societal norms. For over two decades, this journal has been a pioneering voice in the exploration of social networking and virtual reality, establishing itself as an indispensable resource for professionals and academics in the field. It is particularly celebrated for its swift dissemination of findings through rapid communication articles, alongside comprehensive, in-depth studies that delve into the multifaceted effects of interactive technologies on both individual behavior and broader societal trends. The journal's scope encompasses the full spectrum of impacts, highlighting not only the potential benefits but also the challenges that arise as a result of these technologies. By providing a platform for rigorous research and critical discussions, it fosters a deeper understanding of the complex interplay between technology and human behavior.