Is ChatGPT like a nine-year-old child in theory of mind? Evidence from Chinese writing

IF 5.4 | Zone 2 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH

Education and Information Technologies Pub Date : 2024-09-19 DOI:10.1007/s10639-024-13046-7

Siyi Cao, Yizhong Xu, Tongquan Zhou, Siruo Zhou


Citations: 0

Abstract

ChatGPT has been shown to generate intricate, human-like text, and recent studies have established that its performance on theory of mind (ToM) tasks is strikingly comparable to that of a nine-year-old child. However, it remains unknown whether ChatGPT outperforms children of this age group in Chinese writing, a task credibly related to ToM. To test this, the study compared ChatGPT with nine-year-old children in writing Chinese compositions (i.e., science-themed and nature-themed narratives), aiming to reveal the relative advantages and disadvantages of human writers and ChatGPT in Chinese writing. Building on an evaluative framework comprising four indices of writing quality (i.e., fluency, accuracy, complexity, and cohesion), the study added an often-overlooked index, “emotion”, to extend the framework. We then collected 120 writing samples produced by ChatGPT and children and used confirmatory factor analysis (CFA) and structural equation modelling (SEM) for data analysis and comparison. The results revealed that children of this age group surpassed ChatGPT in fluency and cohesion, while ChatGPT surpassed the children in accuracy. With respect to complexity, the children performed better in science-themed writing, whereas ChatGPT performed better in nature-themed writing. Most importantly, the study found that children display stronger emotional expression than ChatGPT in Chinese writing, providing evidence that ChatGPT is, to some extent, even weaker than a nine-year-old child in ToM.


Source journal

Education and Information Technologies (EDUCATION & EDUCATIONAL RESEARCH)

CiteScore: 10.00

Self-citation rate: 12.70%

Articles published: 610

About the journal: The Journal of Education and Information Technologies (EAIT) is a platform for the range of debates and issues in the field of Computing Education as well as the many uses of information and communication technology (ICT) across many educational subjects and sectors. It probes the use of computing to improve education and learning in a variety of settings, platforms and environments. The journal aims to provide perspectives at all levels, from the micro level of specific pedagogical approaches in Computing Education and applications or instances of use in classrooms, to macro concerns of national policies and major projects; from pre-school classes to adults in tertiary institutions; from teachers and administrators to researchers and designers; from institutions to online and lifelong learning. The journal is embedded in the research and practice of professionals within the contemporary global context, and its breadth and scope encourage debate on fundamental issues at all levels and from different research paradigms and learning theories. The journal does not proselytize on behalf of the technologies (whether they be mobile, desktop, interactive, virtual, games-based or learning management systems) but rather provokes debate on all the complex relationships within and between computing and education, whether they are in informal or formal settings. It probes state-of-the-art technologies in Computing Education and also considers the design and evaluation of digital educational artefacts. The journal aims to maintain and expand its international standing by careful selection on merit of the papers submitted, thus providing a credible ongoing forum for debate and scholarly discourse. Special Issues are occasionally published to cover particular issues in depth. EAIT invites readers to submit papers that draw inferences, probe theory and create new knowledge that informs practice, policy and scholarship. Readers are also invited to comment and reflect upon the arguments and opinions published. EAIT is the official journal of the Technical Committee on Education of the International Federation for Information Processing (IFIP) in partnership with UNESCO.