Using Large Language Models for Qualitative Analysis can Introduce Serious Bias

IF 6.5 | Zone 2, Sociology | JCR Q1, SOCIAL SCIENCES, MATHEMATICAL METHODS
Julian Ashwin, Aditya Chhabra, Vijayendra Rao
Sociological Methods & Research, journal article, published 2025-05-27. DOI: 10.1177/00491241251338246 (https://doi.org/10.1177/00491241251338246)

Abstract

Large language models (LLMs) are quickly becoming ubiquitous, but their implications for social science research are not yet well understood. We ask whether LLMs can help code and analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees and their Bengali hosts in Bangladesh. We find that using LLMs to annotate and code text can introduce bias that can lead to misleading inferences. By bias we mean that the errors LLMs make in coding interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human codes leads to less measurement error and bias than LLM annotations. Given that high-quality codes are necessary to assess whether an LLM introduces bias, we argue that it may be preferable to train a bespoke model on a subset of transcripts coded by trained sociologists rather than use an LLM.
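The abstract's definition of bias (coding errors that are not random with respect to subject characteristics) can be illustrated with a small check. The sketch below is not the authors' procedure: the data are invented toy values, and the function names `error_rate_by_group` and `two_proportion_z` are hypothetical. It compares how often an annotator disagrees with human codes in each group of interview subjects and computes a standard two-proportion z statistic for the difference.

```python
import math

def error_rate_by_group(human, predicted, group):
    """Return {group_value: share of items where predicted != human}."""
    totals, errors = {}, {}
    for h, p, g in zip(human, predicted, group):
        totals[g] = totals.get(g, 0) + 1
        errors[g] = errors.get(g, 0) + (h != p)
    return {g: errors[g] / totals[g] for g in totals}

def two_proportion_z(err_a, n_a, err_b, n_b):
    """z statistic for the null that both groups share one error rate."""
    pooled = (err_a + err_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (err_a / n_a - err_b / n_b) / se if se > 0 else 0.0

# Invented toy data (assumption): 1 = the code applies to a transcript, 0 = it does not.
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]   # codes from trained human coders
llm   = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]   # codes from an LLM annotator
group = ["refugee"] * 6 + ["host"] * 6          # subject characteristic

rates = error_rate_by_group(human, llm, group)  # refugee: 0.5, host: about 0.167
z = two_proportion_z(3, 6, 1, 6)                # error counts read off the toy data
print(rates, round(z, 3))
```

With real data, a gap between group error rates that is too large to attribute to chance would indicate bias in the abstract's sense. Note that the check itself requires human codes for the same transcripts, which is the abstract's point about needing high-quality codes either way.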
Source journal
CiteScore: 16.30
Self-citation rate: 3.20%
Articles published: 40
Journal description: Sociological Methods & Research is a quarterly journal devoted to sociology as a cumulative empirical science. The objectives of SMR are multiple, but emphasis is placed on articles that advance the understanding of the field through systematic presentations that clarify methodological problems and assist in ordering the known facts in an area. Review articles will be published, particularly those that emphasize a critical analysis of the status of the arts, but original presentations that are broadly based and provide new research will also be published. Intrinsically, SMR is viewed as a substantive journal, but one that is highly focused on the assessment of the scientific status of sociology. The scope is broad and flexible, and authors are invited to correspond with the editors about the appropriateness of their articles.