Investigating the Efficacy of ChatGPT-3.5 for Tutoring in Chinese Elementary Education Settings

IF 4.9 | Zone 3, Education | Q2, Computer Science, Interdisciplinary Applications

IEEE Transactions on Learning Technologies | Pub Date: 2024-09-19 | DOI: 10.1109/TLT.2024.3464560

Yu Bai; Jun Li; Jun Shen; Liang Zhao

IEEE Transactions on Learning Technologies, vol. 17, pp. 2156-2171. Full text: https://ieeexplore.ieee.org/document/10684453/


Abstract

The potential of artificial intelligence (AI) to transform education has received considerable attention. This study explores the potential of large language models (LLMs) to help students study for and pass standardized exams, a prospect that many people regard as hype. Using primary education as an example, this research investigates whether ChatGPT-3.5 can achieve satisfactory performance on the Chinese Primary School Exams and whether it can be used as a teaching aid or tutor. We designed an experimental framework and constructed a benchmark comprising 4800 questions collected from 48 tasks in Chinese elementary education settings. Through automatic and manual evaluations, we observed that ChatGPT-3.5’s pass rate was below the required level of accuracy for most tasks, and that the correctness of its answer interpretations was unsatisfactory. These results diverged from our initial expectations. However, comparative experiments between ChatGPT-3.5 and ChatGPT-4 indicated significant improvements in model performance, demonstrating the potential of using LLMs as a teaching aid. This article also investigates the use of a trans-prompting strategy to reduce the impact of language bias and enhance question understanding. We present a comparison of the models' performance and the improvement under the trans-lingual problem decomposition prompting mechanism. Finally, we discuss the challenges associated with the appropriate application of AI-driven language models, along with future directions and limitations in the field of AI for education.
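To make the evaluation idea in the abstract concrete, the following is a minimal sketch of how a per-task pass rate could be computed from graded answers and compared against a pass mark. It is not the authors' evaluation code. The function names, the task names, the toy data, and the 0.6 threshold are all assumptions for illustration; the abstract does not state the required accuracy level.

```python
from collections import defaultdict

# Hypothetical pass mark; the abstract does not state the required accuracy.
PASS_THRESHOLD = 0.6


def per_task_accuracy(results):
    """Return {task: fraction correct} from (task_name, is_correct) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, is_correct in results:
        total[task] += 1
        correct[task] += int(is_correct)
    return {task: correct[task] / total[task] for task in total}


def passed_tasks(accuracy, threshold=PASS_THRESHOLD):
    """Return the sorted names of tasks whose accuracy meets the threshold."""
    return sorted(task for task, acc in accuracy.items() if acc >= threshold)


# Toy graded answers, invented for illustration (not results from the paper).
graded = [
    ("math_grade3", True), ("math_grade3", False),
    ("math_grade3", True), ("math_grade3", True),
    ("chinese_grade3", True), ("chinese_grade3", False),
    ("chinese_grade3", False), ("chinese_grade3", False),
]

accuracy = per_task_accuracy(graded)
print(accuracy)
print(passed_tasks(accuracy))
```

In a benchmark organized by task, reporting accuracy per task rather than one pooled figure keeps a strong subject from masking a weak one, which matches the abstract's framing of results "for most tasks".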




Source journal

IEEE Transactions on Learning Technologies (Computer Science, Interdisciplinary Applications)

CiteScore: 7.50

Self-citation rate: 5.40%

Articles published: (value missing in source)

Review time: >12 weeks

Journal description: The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.