Who's the Best Detective? Large Language Models vs. Traditional Machine Learning in Detecting Incoherent Fourth Grade Math Answers

IF 4.9 Zone 2 (Education) Q1 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational Computing Research Pub Date : 2023-11-10 DOI:10.1177/07356331231191174

Felipe Urrutia, Roberto Araya

Citations: 0

Abstract

Written answers to open-ended questions can have a higher long-term effect on learning than multiple-choice questions. However, it is critical that teachers immediately review the answers, and ask to redo those that are incoherent. This can be a difficult task and can be time-consuming for teachers. A possible solution is to automate the detection of incoherent answers. One option is to automate the review with Large Language Models (LLM). They have a powerful discursive ability that can be used to explain decisions. In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We used them with zero, one, two, three and four shots. We compared their performance with the results of various classifiers trained with Machine Learning (ML). We found that LLMs perform worse than MLs in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses from students with typical fourth-grader misspellings. Upon closer examination, we have found that the ChatGPT model faces the same challenges.
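
The zero-to-four-shot setup mentioned in the abstract can be illustrated with a minimal sketch of how a k-shot prompt might be assembled. This is not the authors' prompt or code: the instruction wording, the `DEMOS` examples, and the function name `build_k_shot_prompt` are all hypothetical, chosen only to show how the number of labelled demonstrations (shots) varies from 0 to 4.

```python
from typing import List, Tuple

# (question, student answer, label). Invented examples, NOT taken from the paper.
Example = Tuple[str, str, str]

DEMOS: List[Example] = [
    ("What is 12 + 7?", "19 because 12 plus 7 is 19", "coherent"),
    ("Which is bigger, 1/2 or 1/4?", "asdfgh", "incoherent"),
    ("How many sides does a triangle have?", "3 sides", "coherent"),
    ("Round 48 to the nearest ten.", "I like pizza", "incoherent"),
]


def build_k_shot_prompt(examples: List[Example], question: str, answer: str, k: int) -> str:
    """Return a prompt with k labelled demonstrations followed by the item to classify."""
    if not 0 <= k <= len(examples):
        raise ValueError("k must be between 0 and the number of available examples")
    # Hypothetical instruction text; the paper's actual wording is not given in the abstract.
    lines = [
        "Decide whether the student's answer is coherent or incoherent with the question.",
        "",
    ]
    # Each demonstration (shot) shows the model one labelled question/answer pair.
    for q, a, label in examples[:k]:
        lines += [f"Question: {q}", f"Answer: {a}", f"Label: {label}", ""]
    # The unlabelled item comes last; the model is expected to complete the label.
    lines += [f"Question: {question}", f"Answer: {answer}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    for shots in range(5):  # zero, one, two, three and four shots
        prompt = build_k_shot_prompt(DEMOS, "What is 5 x 3?", "15 becuase 5+5+5", shots)
        print(f"--- {shots}-shot prompt ---")
        print(prompt)
```

With `k=0` the prompt contains only the instruction and the item to classify; each additional shot prepends one labelled pair. The resulting string would then be sent to whichever LLM is being evaluated, a step this sketch leaves out.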

Source journal

Journal of Educational Computing Research (EDUCATION & EDUCATIONAL RESEARCH)

CiteScore

11.90

Self-citation rate

6.20%

Journal description: The goal of this Journal is to provide an international scholarly publication forum for peer-reviewed interdisciplinary research into the applications, effects, and implications of computer-based education. The Journal features articles useful for practitioners and theorists alike. The terms "education" and "computing" are viewed broadly. "Education" refers to the use of computer-based technologies at all levels of the formal education system, business and industry, home-schooling, lifelong learning, and unintentional learning environments. "Computing" refers to all forms of computer applications and innovations - both hardware and software. For example, this could range from mobile and ubiquitous computing to immersive 3D simulations and games to computing-enhanced virtual learning environments.