Transforming online learning research: Leveraging GPT large language models for automated content analysis of cognitive presence

Impact Factor: 6.4 | CAS Tier 1 (Education) | JCR Q1, Education & Educational Research
Daniela Castellanos-Reyes, Larisa Olesova, Ayesha Sadaf
Internet and Higher Education, Volume 65, Article 101001
DOI: 10.1016/j.iheduc.2025.101001
Published: 2025-02-12
URL: https://www.sciencedirect.com/science/article/pii/S1096751625000107
Citations: 0

Abstract

Over the last two decades, online learning research has flourished through content analysis of discussion board text data based on constructs such as cognitive presence (CP) within the Practical Inquiry Model (PIM). The PIM describes how cognitive development unfolds during collaborative inquiry in online learning experiences. Content analysis, however, is resource-intensive in both time and expertise, prompting researchers to seek ways to automate text classification, historically through ensemble machine-learning algorithms. We leveraged large language models (LLMs) through OpenAI's Generative Pre-Trained Transformer (GPT) models in the public API to automate the content analysis of students' text data based on PIM indicators, and assessed the reliability and efficiency of automated content analysis relative to human analysis. Following the seven steps of the Large Language Model Content Analysis (LACA) approach, we proposed an AI-adapted CP codebook that leverages prompt engineering techniques (i.e., role, chain-of-thought, one-shot, few-shot) for the automated content analysis of CP. We found that a fine-tuned model with a one-shot prompt achieved moderate interrater reliability with researchers. The models were most reliable when classifying students' discussion board text in the Integration phase of the PIM. A cost comparison showed a clear efficiency advantage for LACA approaches in online learning research. Nevertheless, practitioners still need considerable data literacy skills to deploy LACA at scale. We offer theoretical suggestions for simplifying the CP codebook and improving interrater reliability (IRR) with LLMs. Implications for practice are discussed, and future research that includes instructional advice is recommended.
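The abstract describes the LACA workflow only at a high level, so the sketch below is illustrative rather than the authors' actual pipeline. It shows two generic pieces of such a workflow: assembling a one-shot chat prompt (a system role instruction plus a single worked example, matching the "role" and "one-shot" techniques named above) for classifying a post into a PIM phase, and computing Cohen's kappa, a standard interrater-reliability statistic of the kind the IRR comparison would rely on. The function names, phase labels, and example posts are assumptions for demonstration; the prompt builder returns the message list one would pass to a chat-completion API rather than making a network call.

```python
from collections import Counter

# The four phases of the Practical Inquiry Model (PIM) named in the abstract.
PIM_PHASES = ["Triggering", "Exploration", "Integration", "Resolution"]

def one_shot_messages(post, example_post, example_phase):
    """Build a one-shot chat prompt: a coder role instruction, one worked
    example (user post -> assistant label), then the post to classify."""
    system = ("You are a content-analysis coder. Classify each discussion "
              "post into exactly one Practical Inquiry Model phase: "
              + ", ".join(PIM_PHASES) + ". Reply with the phase name only.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": example_post},
        {"role": "assistant", "content": example_phase},
        {"role": "user", "content": post},
    ]

def cohens_kappa(human, model):
    """Cohen's kappa between two label sequences of equal length:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(human) == len(model) and human
    n = len(human)
    # Observed agreement: fraction of posts where the two coders match.
    po = sum(h == m for h, m in zip(human, model)) / n
    # Chance agreement from each coder's marginal label frequencies.
    h_counts, m_counts = Counter(human), Counter(model)
    pe = sum(h_counts[c] * m_counts[c] for c in set(human) | set(model)) / n**2
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)
```

For example, if a human coder and the model agree on three of four posts, disagreeing only where the human coded "Integration" and the model "Resolution", `cohens_kappa` returns about 0.67 — "substantial" on the common Landis–Koch scale, above the "moderate" range the study reports.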
Source journal: Internet and Higher Education
CiteScore: 19.30
Self-citation rate: 4.70%
Articles per year: 30
Review time: 40 days
About the journal: The Internet and Higher Education is a quarterly peer-reviewed journal focused on contemporary issues and future trends in online learning, teaching, and administration within post-secondary education. It welcomes contributions from diverse academic disciplines worldwide and provides a platform for theory papers, research studies, critical essays, editorials, reviews, case studies, and social commentary.