Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?

Impact factor 3.1, Zone 1 (Education), JCR Q1 (Education & Educational Research)
Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
Science & Education. Published 2024-01-29. DOI: 10.1007/s11191-024-00496-1 (https://doi.org/10.1007/s11191-024-00496-1)
Citations: 0

Abstract

This study examined the assumption that generative artificial intelligence (GAI) tools can overcome the cognitive intensity that humans experience when solving problems. We examined the performance of ChatGPT and GPT-4 on NAEP science assessments and compared it with that of students according to the cognitive demands of the items. Fifty-four 2019 NAEP science assessment tasks were coded by content experts using a two-dimensional cognitive load framework comprising task cognitive complexity and dimensionality. ChatGPT and GPT-4 answered the questions individually and were scored using the scoring keys provided by NAEP. The analysis of the available data was based on the average student ability scores for students who answered each item correctly and the percentage of students who responded to individual items. The results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered each individual item in the NAEP science assessments. As the cognitive demand of the NAEP science assessment items increased, statistically higher average student ability scores were required to answer the questions correctly. This pattern was observed for students in Grades 4, 8, and 12. However, ChatGPT and GPT-4 were not statistically sensitive to increases in the cognitive demands of the tasks, except for Grade 4. The findings of this first study comparing cutting-edge GAI with K-12 students in science problem-solving imply a need to change educational objectives so that students are prepared to work competently with GAI tools such as ChatGPT and GPT-4. Education ought to emphasize the cultivation of advanced cognitive skills rather than depending solely on tasks that demand cognitive intensity. This approach would foster critical thinking, analytical skills, and the application of knowledge in novel contexts among students. Furthermore, the findings suggest that researchers should innovate assessment practices by moving away from cognitive-intensity tasks toward creativity and analytical skills, to more effectively avoid the negative effects of GAI on testing.

Source journal: Science & Education (Education & Educational Research)
CiteScore: 6.60
Self-citation rate: 14.00%
Articles published: 0
Journal description: Science Education publishes original articles on the latest issues and trends occurring internationally in science curriculum, instruction, learning, policy, and preparation of science teachers, with the aim of advancing our knowledge of science education theory and practice. In addition to original articles, the journal features the following special sections:
- Learning: theoretical and empirical research studies on the learning of science. We invite manuscripts that investigate learning and its change and growth through various lenses, including psychological, social, cognitive, sociohistorical, and affective. Studies examining the relationship of learning to teaching, science knowledge and practices, the learners themselves, and the contexts (social, political, physical, ideological, institutional, epistemological, and cultural) are similarly welcome.
- Issues and Trends: primarily analytical, interpretive, or persuasive essays on current educational, social, or philosophical issues and trends relevant to the teaching of science. This section particularly seeks to promote informed dialogue about current issues in science education, and carefully reasoned papers representing disparate viewpoints are welcomed. Manuscripts submitted for this section may take the form of a position paper, a polemical piece, or a creative commentary.
- Science Learning in Everyday Life: analytical, interpretive, or philosophical papers on learning science outside the formal classroom. Papers should investigate experiences in settings such as the community, the home, the Internet, after-school settings, museums, and other opportunities that develop science interest, knowledge, or practices across the life span. Attention to issues and factors relating to equity in science learning is especially encouraged.
- Science Teacher Education [...]