Transforming online learning research: Leveraging GPT large language models for automated content analysis of cognitive presence

IF 6.8 · Region 1 (Education) · Q1 EDUCATION & EDUCATIONAL RESEARCH

Internet and Higher Education · Pub Date: 2025-02-12 · DOI: 10.1016/j.iheduc.2025.101001

Daniela Castellanos-Reyes, Larisa Olesova, Ayesha Sadaf

Volume 65, Article 101001. Full text: https://www.sciencedirect.com/science/article/pii/S1096751625000107

Citations: 0

Abstract



The last two decades of online learning research vastly flourished by examining discussion board text data through content analysis based on constructs like cognitive presence (CP) with the Practical Inquiry Model (PIM). The PIM sets a footprint for how cognitive development unfolds in collaborative inquiry in online learning experiences. Ironically, content analysis is a resource-intensive endeavor in terms of time and expertise, making researchers look for ways to automate text classification through ensemble machine-learning algorithms. We leveraged large language models (LLMs) through OpenAI's Generative Pre-Trained Transformer (GPT) models in the public API to automate the content analysis of students' text data based on PIM indicators and assess the reliability and efficiency of automated content analysis compared to human analysis. Using the seven steps of the Large Language Model Content Analysis (LACA) approach, we proposed an AI-adapted CP codebook leveraging prompt engineering techniques (i.e., role, chain-of-thought, one-shot, few-shot) for the automated content analysis of CP. We found that a fine-tuned model with a one-shot prompt achieved moderate interrater reliability with researchers. The models were more reliable when classifying students' discussion board text in the Integration phase of the PIM. A cost comparison showed an obvious cost advantage of LACA approaches in online learning research in terms of efficiency. Nevertheless, practitioners still need considerable data literacy skills to deploy LACA at a scale. We offer theoretical suggestions for simplifying the CP codebook and improving the IRR with LLM. Implications for practice are discussed, and future research that includes instructional advice is recommended.
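The abstract reports interrater reliability (IRR) between the model and the researchers but does not say which statistic was used. As a minimal sketch, assuming Cohen's kappa as the agreement measure, the comparison between human and model codes could be computed as below. The function name `cohens_kappa` and the eight labels per coder are hypothetical and illustrative only; they are not data or code from the study. The labels use the four Practical Inquiry Model phases (Triggering event, Exploration, Integration, Resolution).

```python
from collections import Counter


def cohens_kappa(human_codes, model_codes):
    """Return Cohen's kappa for two equal-length sequences of category labels."""
    if len(human_codes) != len(model_codes) or not human_codes:
        raise ValueError("Both coders must label the same non-empty set of posts.")
    n = len(human_codes)
    # Observed agreement: share of posts given the same label by both coders.
    observed = sum(h == m for h, m in zip(human_codes, model_codes)) / n
    # Chance agreement: sum over labels of the product of each coder's
    # marginal proportion for that label.
    human_counts = Counter(human_codes)
    model_counts = Counter(model_codes)
    expected = sum(human_counts[c] * model_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label for every post
    return (observed - expected) / (1 - expected)


# Hypothetical PIM phase labels for eight posts (assumption, not study data).
human = ["Exploration", "Integration", "Integration", "Triggering event",
         "Exploration", "Resolution", "Integration", "Exploration"]
model = ["Exploration", "Integration", "Integration", "Exploration",
         "Exploration", "Integration", "Integration", "Triggering event"]

kappa = cohens_kappa(human, model)
print(f"Cohen's kappa = {kappa:.2f}")
```

Kappa corrects raw percent agreement for the agreement two coders would reach by chance, which matters when one phase dominates the data set. Other statistics, such as Krippendorff's alpha, would be equally plausible choices here.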


Source journal

Internet and Higher Education (EDUCATION & EDUCATIONAL RESEARCH)

CiteScore: 19.30

Self-citation rate: 4.70%

Articles published:

Review time: 40 days

About the journal: The Internet and Higher Education is a quarterly peer-reviewed journal focused on contemporary issues and future trends in online learning, teaching, and administration within post-secondary education. It welcomes contributions from diverse academic disciplines worldwide and provides a platform for theory papers, research studies, critical essays, editorials, reviews, case studies, and social commentary.