The life cycle of large language models in education: A framework for understanding sources of bias

IF 6.7 | CAS Tier 1 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH
Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher Brooks, René F. Kizilcec
{"title":"The life cycle of large language models in education: A framework for understanding sources of bias","authors":"Jinsook Lee,&nbsp;Yann Hicke,&nbsp;Renzhe Yu,&nbsp;Christopher Brooks,&nbsp;René F. Kizilcec","doi":"10.1111/bjet.13505","DOIUrl":null,"url":null,"abstract":"<div>\n \n <section>\n \n \n <p>Large language models (LLMs) are increasingly adopted in educational contexts to provide personalized support to students and teachers. The unprecedented capacity of LLM-based applications to understand and generate natural language can potentially improve instructional effectiveness and learning outcomes, but the integration of LLMs in education technology has renewed concerns over algorithmic bias, which may exacerbate educational inequalities. Building on prior work that mapped the traditional machine learning life cycle, we provide a framework of the LLM life cycle from the initial development of LLMs to customizing pre-trained models for various applications in educational settings. We explain each step in the LLM life cycle and identify potential sources of bias that may arise in the context of education. We discuss why current measures of bias from traditional machine learning fail to transfer to LLM-generated text (eg, tutoring conversations) because text encodings are high-dimensional, there can be multiple correct responses, and tailoring responses may be pedagogically desirable rather than unfair. The proposed framework clarifies the complex nature of bias in LLM applications and provides practical guidance for their evaluation to promote educational equity.</p>\n </section>\n \n <section>\n \n <div>\n \n <div>\n \n <h3>Practitioner notes</h3>\n <p>What is already known about this topic\n\n </p><ul>\n \n <li>The life cycle of traditional machine learning (ML) applications which focus on predicting labels is well understood.</li>\n \n <li>Biases are known to enter in traditional ML applications at various points in the life cycle, and methods to measure and mitigate these biases have been developed and tested.</li>\n \n <li>Large language models (LLMs) and other forms of generative artificial intelligence (GenAI) are increasingly adopted in education technologies (EdTech), but current evaluation approaches are not specific to the domain of education.</li>\n </ul>\n <p>What this paper adds\n\n </p><ul>\n \n <li>A holistic perspective of the LLM life cycle with domain-specific examples in education to highlight opportunities and challenges for incorporating natural language understanding (NLU) and natural language generation (NLG) into EdTech.</li>\n \n <li>Potential sources of bias are identified in each step of the LLM life cycle and discussed in the context of education.</li>\n \n <li>A framework for understanding where to expect potential harms of LLMs for students, teachers, and other users of GenAI technology in education, which can guide approaches to bias measurement and mitigation.</li>\n </ul>\n <p>Implications for practice and/or policy\n\n </p><ul>\n \n <li>Education practitioners and policymakers should be aware that biases can originate from a multitude of steps in the LLM life cycle, and the life cycle perspective offers them a heuristic for asking technology developers to explain each step to assess the risk of bias.</li>\n \n <li>Measuring the biases of systems that use LLMs in education is more complex than with traditional ML, in large part because the evaluation of natural language generation is highly context-dependent (eg, what counts as good feedback on 
an assignment varies).</li>\n \n <li>EdTech developers can play an important role in collecting and curating datasets for the evaluation and benchmarking of LLM applications moving forward.</li>\n </ul>\n </div>\n </div>\n </section>\n </div>","PeriodicalId":48315,"journal":{"name":"British Journal of Educational Technology","volume":null,"pages":null},"PeriodicalIF":6.7000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"British Journal of Educational Technology","FirstCategoryId":"95","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/bjet.13505","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

Large language models (LLMs) are increasingly adopted in educational contexts to provide personalized support to students and teachers. The unprecedented capacity of LLM-based applications to understand and generate natural language can potentially improve instructional effectiveness and learning outcomes, but the integration of LLMs in education technology has renewed concerns over algorithmic bias, which may exacerbate educational inequalities. Building on prior work that mapped the traditional machine learning life cycle, we provide a framework of the LLM life cycle from the initial development of LLMs to customizing pre-trained models for various applications in educational settings. We explain each step in the LLM life cycle and identify potential sources of bias that may arise in the context of education. We discuss why current measures of bias from traditional machine learning fail to transfer to LLM-generated text (eg, tutoring conversations) because text encodings are high-dimensional, there can be multiple correct responses, and tailoring responses may be pedagogically desirable rather than unfair. The proposed framework clarifies the complex nature of bias in LLM applications and provides practical guidance for their evaluation to promote educational equity.
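To make the contrast concrete, here is a minimal sketch (not from the paper; the data and names are hypothetical) of why a traditional group-fairness metric such as demographic parity reduces to a single well-defined number over predicted labels, while LLM-generated tutoring text offers no such label to compare.

```python
# A minimal sketch (hypothetical data) contrasting bias measurement for
# label-based ML with LLM-generated text.

# Traditional ML: a model predicts a binary label (eg, "needs support" = 1).
# Demographic parity compares positive-prediction rates across groups.
predictions = [
    {"group": "A", "label": 1},
    {"group": "A", "label": 0},
    {"group": "B", "label": 1},
    {"group": "B", "label": 1},
]

def positive_rate(rows, group):
    labels = [r["label"] for r in rows if r["group"] == group]
    return sum(labels) / len(labels)

parity_gap = abs(positive_rate(predictions, "A") - positive_rate(predictions, "B"))
print(f"Demographic parity gap: {parity_gap:.2f}")  # a single, well-defined number

# LLM-based tutoring: the output is free text, not a label. There is no single
# "positive rate" to compare, many different responses can be equally correct,
# and tailoring feedback to the learner may be pedagogically desirable rather
# than unfair.
responses = {
    "A": "Nice work factoring the quadratic; now check the sign of the second root.",
    "B": "You're close! Try substituting x = 2 back into the equation to verify it.",
}
# Any scalar "bias score" over such text requires extra modelling choices
# (eg, embedding distances or rubric scoring), each introducing assumptions
# of its own.
```

The sketch is only meant to illustrate the abstract's point: the label-based metric is immediate to compute, whereas any comparable measure over generated text depends on additional, contestable design decisions.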

Practitioner notes

What is already known about this topic

  • The life cycle of traditional machine learning (ML) applications, which focus on predicting labels, is well understood.
  • Biases are known to enter traditional ML applications at various points in the life cycle, and methods to measure and mitigate these biases have been developed and tested.
  • Large language models (LLMs) and other forms of generative artificial intelligence (GenAI) are increasingly adopted in education technologies (EdTech), but current evaluation approaches are not specific to the domain of education.

What this paper adds

  • A holistic perspective of the LLM life cycle with domain-specific examples in education to highlight opportunities and challenges for incorporating natural language understanding (NLU) and natural language generation (NLG) into EdTech.
  • Potential sources of bias are identified in each step of the LLM life cycle and discussed in the context of education.
  • A framework for understanding where to expect potential harms of LLMs for students, teachers, and other users of GenAI technology in education, which can guide approaches to bias measurement and mitigation.

Implications for practice and/or policy

  • Education practitioners and policymakers should be aware that biases can originate from a multitude of steps in the LLM life cycle, and the life cycle perspective offers them a heuristic for asking technology developers to explain each step to assess the risk of bias.
  • Measuring the biases of systems that use LLMs in education is more complex than with traditional ML, in large part because the evaluation of natural language generation is highly context-dependent (eg, what counts as good feedback on an assignment varies).
  • EdTech developers can play an important role in collecting and curating datasets for the evaluation and benchmarking of LLM applications moving forward (see the sketch after this list for one illustrative record format).
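As a purely illustrative sketch, an evaluation dataset for LLM-generated feedback might pair each prompt with learner context, a rubric, and several acceptable reference responses, reflecting the abstract's observation that there can be multiple correct answers and that what counts as good feedback is context-dependent. The record format below is an assumption for illustration, not a schema proposed by the paper or any existing benchmark.

```python
# Illustrative sketch of a benchmark record for evaluating LLM-generated
# feedback in an educational setting. Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class FeedbackEvalRecord:
    prompt: str                 # the student submission or question
    learner_context: str        # eg, course level, prior attempts
    rubric: list[str]           # criteria a good response should satisfy
    reference_responses: list[str] = field(default_factory=list)  # several acceptable answers

record = FeedbackEvalRecord(
    prompt="Explain why my loop prints 11 numbers instead of 10.",
    learner_context="Intro programming course, second week, first submission.",
    rubric=[
        "Identifies the off-by-one error in the loop bound",
        "Explains the fix without giving away the full solution",
        "Uses encouraging, level-appropriate language",
    ],
    reference_responses=[
        "Check the loop condition: a bound of 11 means the body runs 11 times...",
        "Count how many values the loop variable takes before the condition fails...",
    ],
)
# Human or model-based raters would score each LLM response against the rubric,
# and a bias audit would compare score distributions across learner groups.
print(len(record.rubric), "rubric criteria;", len(record.reference_responses), "reference responses")
```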
Source journal
British Journal of Educational Technology (EDUCATION & EDUCATIONAL RESEARCH)
CiteScore: 15.60
Self-citation rate: 4.50%
Articles published: 111
Journal description: BJET is a primary source for academics and professionals in the fields of digital educational and training technology throughout the world. The Journal is published by Wiley on behalf of The British Educational Research Association (BERA). It publishes theoretical perspectives, methodological developments and high-quality empirical research that demonstrate whether and how applications of instructional/educational technology systems, networks, tools and resources lead to improvements in formal and non-formal education at all levels, from early years through to higher, technical and vocational education, professional development and corporate training.