{"title":"Artificial Intelligence and the Illusion of Understanding: A Systematic Review of Theory of Mind and Large Language Models.","authors":"Antonella Marchetti,Federico Manzi,Giuseppe Riva,Andrea Gaggioli,Davide Massaro","doi":"10.1089/cyber.2024.0536","DOIUrl":null,"url":null,"abstract":"The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM)-the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain constrained to text-based tasks, overlooking embodied and multimodal dimensions crucial to human social cognition. This review discusses the \"illusion of understanding\" in LLMs for two primary reasons: First, their lack of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths, limiting direct comparisons with human performance. The findings highlight the need for more ecologically valid assessments and interdisciplinary research to better delineate the limitations and potential of AToM. This set of issues is highly relevant to psychology, as language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. 
This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.","PeriodicalId":10872,"journal":{"name":"Cyberpsychology, behavior and social networking","volume":"19 1","pages":""},"PeriodicalIF":4.2000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cyberpsychology, behavior and social networking","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1089/cyber.2024.0536","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, SOCIAL","Score":null,"Total":0}
Citations: 0
Abstract
The development of Large Language Models (LLMs) has sparked significant debate regarding their capacity for Theory of Mind (ToM), the ability to attribute mental states to oneself and others. This systematic review examines the extent to which LLMs exhibit Artificial ToM (AToM) by evaluating their performance on ToM tasks and comparing it with human responses. While LLMs, particularly GPT-4, perform well on first-order false belief tasks, they struggle with more complex reasoning, such as second-order beliefs and recursive inferences, where humans consistently outperform them. Moreover, the review underscores the variability in ToM assessments, as many studies adapt classical tasks for LLMs, raising concerns about comparability with human ToM. Most evaluations remain constrained to text-based tasks, overlooking the embodied and multimodal dimensions crucial to human social cognition. This review attributes the "illusion of understanding" in LLMs to two primary factors: first, their lack of the developmental and cognitive mechanisms necessary for genuine ToM, and second, methodological biases in test designs that favor LLMs' strengths, limiting direct comparisons with human performance. The findings highlight the need for more ecologically valid assessments and for interdisciplinary research to better delineate the limitations and potential of AToM. This set of issues is highly relevant to psychology, as language is generally considered just one component in the broader development of human ToM, a perspective that contrasts with the dominant approach in AToM studies. This discrepancy raises critical questions about the extent to which human ToM and AToM are comparable.
About the journal:
Cyberpsychology, Behavior, and Social Networking is a leading peer-reviewed journal that is recognized for its authoritative research on the social, behavioral, and psychological impacts of contemporary social networking practices. The journal covers a wide range of platforms, including Twitter, Facebook, internet gaming, and e-commerce, and examines how these digital environments shape human interaction and societal norms.
For over two decades, this journal has been a pioneering voice in the exploration of social networking and virtual reality, establishing itself as an indispensable resource for professionals and academics in the field. It is particularly celebrated for its swift dissemination of findings through rapid communication articles, alongside comprehensive, in-depth studies that delve into the multifaceted effects of interactive technologies on both individual behavior and broader societal trends.
The journal's scope encompasses the full spectrum of impacts, highlighting not only the potential benefits of these technologies but also the challenges they raise. By providing a platform for rigorous research and critical discussion, it fosters a deeper understanding of the complex interplay between technology and human behavior.