Who's the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers

IF 4.0 · CAS Tier 2 (Education) · JCR Q1, EDUCATION & EDUCATIONAL RESEARCH
Felipe Urrutia, Roberto Araya
DOI: 10.1177/07356331231191174 (https://doi.org/10.1177/07356331231191174)
Journal: Journal of Educational Computing Research
Published: 2023-11-10 (Journal Article)
Source: Semantic Scholar
Citations: 0

Abstract

Written answers to open-ended questions can have a greater long-term effect on learning than multiple-choice questions. However, it is critical that teachers review the answers immediately and ask students to redo those that are incoherent. This can be a difficult and time-consuming task for teachers. A possible solution is to automate the detection of incoherent answers, for instance by reviewing them with Large Language Models (LLMs), whose strong discursive ability can also be used to explain decisions. In this paper, we analyze the mathematics responses of fourth graders using three LLMs: GPT-3, BLOOM, and YOU. We prompted them with zero, one, two, three, and four shots, and compared their performance with that of various classifiers trained with traditional Machine Learning (ML). We found that the LLMs perform worse than the ML classifiers at detecting incoherent answers. The difficulty seems to reside in recursive items that contain both a question and an answer, and in responses from students with typical fourth-grade misspellings. Upon closer examination, we found that the ChatGPT model faces the same challenges.
Source journal: Journal of Educational Computing Research (EDUCATION & EDUCATIONAL RESEARCH)
CiteScore: 11.90
Self-citation rate: 6.20%
Articles per year: 69
Journal description: The goal of this Journal is to provide an international scholarly publication forum for peer-reviewed interdisciplinary research into the applications, effects, and implications of computer-based education. The Journal features articles useful for practitioners and theorists alike. The terms "education" and "computing" are viewed broadly. "Education" refers to the use of computer-based technologies at all levels of the formal education system, business and industry, home-schooling, lifelong learning, and unintentional learning environments. "Computing" refers to all forms of computer applications and innovations - both hardware and software. For example, this could range from mobile and ubiquitous computing to immersive 3D simulations and games to computing-enhanced virtual learning environments.