Transforming online learning research: Leveraging GPT large language models for automated content analysis of cognitive presence

IF 6.8 | Zone 1, Education | JCR Q1, EDUCATION & EDUCATIONAL RESEARCH
Daniela Castellanos-Reyes, Larisa Olesova, Ayesha Sadaf
Internet and Higher Education, volume 65, Article 101001. Published 2025-02-12. Journal article. DOI: 10.1016/j.iheduc.2025.101001
https://www.sciencedirect.com/science/article/pii/S1096751625000107
Citations: 0

Abstract

Transforming online learning research: Leveraging GPT large language models for automated content analysis of cognitive presence
Over the last two decades, online learning research has flourished by examining discussion board text data through content analysis based on constructs like cognitive presence (CP) with the Practical Inquiry Model (PIM). The PIM sets a footprint for how cognitive development unfolds in collaborative inquiry in online learning experiences. Ironically, content analysis is a resource-intensive endeavor in terms of time and expertise, which has led researchers to look for ways to automate text classification through ensemble machine-learning algorithms. We leveraged large language models (LLMs) through OpenAI's Generative Pre-Trained Transformer (GPT) models in the public API to automate the content analysis of students' text data based on PIM indicators and to assess the reliability and efficiency of automated content analysis compared to human analysis. Using the seven steps of the Large Language Model Content Analysis (LACA) approach, we proposed an AI-adapted CP codebook leveraging prompt engineering techniques (i.e., role, chain-of-thought, one-shot, few-shot) for the automated content analysis of CP. We found that a fine-tuned model with a one-shot prompt achieved moderate interrater reliability (IRR) with researchers. The models were more reliable when classifying students' discussion board text in the Integration phase of the PIM. A cost comparison showed a clear cost advantage of LACA approaches in online learning research in terms of efficiency. Nevertheless, practitioners still need considerable data literacy skills to deploy LACA at scale. We offer theoretical suggestions for simplifying the CP codebook and improving the IRR with LLMs. Implications for practice are discussed, and future research that includes instructional advice is recommended.
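To make "interrater reliability with researchers" concrete, the sketch below computes Cohen's kappa between a human coder and a model over PIM phase labels. It is an illustration only: the abstract does not say which agreement statistic the authors used, so Cohen's kappa is an assumption, the function name `cohens_kappa` is hypothetical, and the eight coded posts are invented for the example rather than taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for eight discussion posts (NOT data from the study).
# Labels are the four PIM phases: Triggering event, Exploration,
# Integration, Resolution.
human = ["Exploration", "Integration", "Integration", "Triggering event",
         "Exploration", "Resolution", "Integration", "Exploration"]
model = ["Exploration", "Integration", "Exploration", "Triggering event",
         "Exploration", "Integration", "Integration", "Integration"]

kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.2f}")  # prints kappa = 0.43
```

Under the commonly cited Landis and Koch bands, a kappa between 0.41 and 0.60 is read as moderate agreement, so this toy result would fall in that band. Whether the study used those thresholds is not stated in the abstract.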
Source journal
Internet and Higher Education (EDUCATION & EDUCATIONAL RESEARCH)
CiteScore: 19.30
Self-citation rate: 4.70%
Publication volume: 30
Review time: 40 days
About the journal: The Internet and Higher Education is a quarterly peer-reviewed journal focused on contemporary issues and future trends in online learning, teaching, and administration within post-secondary education. It welcomes contributions from diverse academic disciplines worldwide and provides a platform for theory papers, research studies, critical essays, editorials, reviews, case studies, and social commentary.