Evaluating large language models in analysing classroom dialogue

Impact factor 3.6; Zone 1 (Psychology); JCR Q1, EDUCATION & EDUCATIONAL RESEARCH
Yun Long, Haifeng Luo, Yu Zhang
Journal: npj Science of Learning, volume 9(1), page 60
Published: 2024-10-03
Publication type: Journal Article
DOI: 10.1038/s41539-024-00273-3 (https://doi.org/10.1038/s41539-024-00273-3)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447259/pdf/
Citation count: 0

Abstract

This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue, a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using datasets from middle school mathematics and Chinese classes, classroom dialogues were manually coded by experts and then analysed with a customised GPT-4 model. The study compares manual annotations with GPT-4 outputs to evaluate efficacy. Metrics include time efficiency, inter-coder agreement, and reliability between human coders and GPT-4. Results show significant time savings and high coding consistency between the model and human coders, with minor discrepancies. These findings highlight the strong potential of LLMs in teaching evaluation and facilitation.
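
The abstract names inter-coder agreement between human coders and GPT-4 as a metric but does not say which statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure, for two coders who labelled the same utterances. The function name `cohen_kappa`, the code labels, and the six sample utterances are hypothetical and do not come from the study.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same items.

    Assumption: this is an illustrative statistic, not necessarily
    the one used in the study.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(codes_a)

    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six utterances (not data from the paper).
human = ["question", "explain", "question", "agree", "explain", "question"]
model = ["question", "explain", "agree", "agree", "explain", "question"]

kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on five of six utterances (observed agreement about 0.83) while chance agreement is about 0.33, so kappa is 0.75. Raw percentage agreement would overstate consistency whenever a few codes dominate the transcript, which is why a chance-corrected measure is often reported alongside it.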

Source journal: CiteScore 5.40; self-citation rate 7.10%; articles published 29