The use of large language models for qualitative research: The Deep Computational Text Analyser (DECOTA).

Impact factor 7.6; Zone 1 (Psychology); JCR Q1 (PSYCHOLOGY, MULTIDISCIPLINARY)
Lois Player, Ryan Hughes, Kaloyan Mitev, Lorraine Whitmarsh, Christina Demski, Nicholas Nash, Trisevgeni Papakonstantinou, Mark Wilson
Psychological Methods. Journal article, published 2025-04-07. DOI: 10.1037/met0000753 (https://doi.org/10.1037/met0000753)
Citation count: 0

Abstract

Machine-assisted approaches for free-text analysis are rising in popularity, owing to a growing need to rapidly analyze large volumes of qualitative data. In both research and policy settings, these approaches show promise in providing timely insights into public perceptions and enabling policymakers to understand their community's needs. However, current approaches still require expert human interpretation, posing a financial and practical barrier for those outside of academia. For the first time, we propose and validate the Deep Computational Text Analyser (DECOTA), a novel machine learning methodology that automatically analyzes large free-text data sets and outputs concise themes. Building on structural topic modeling approaches, we used two fine-tuned large language models and sentence transformers to automatically derive "codes" and their corresponding "themes", as in inductive thematic analysis. To fully automate the process, we designed and validated a novel algorithm to choose the optimal number of "topics" for the structural topic modeling. DECOTA outputs key codes and themes, their prevalence, and how prevalence varies across covariates such as age and gender. Each code is accompanied by three representative quotes. Four data sets previously analyzed using thematic analysis were triangulated with DECOTA's codes and themes. We found that DECOTA is approximately 378 times faster and 1,920 times cheaper than human coding and consistently yields codes in agreement with or complementary to human coding (averaging 91.6% for codes and 90% for themes). The implications for evidence-based policy development, public engagement with policymaking, and psychometric measure development are discussed. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
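The abstract states that each code is accompanied by three representative quotes but does not say how they are chosen. As an illustration only, one common way to pick representative texts is to embed every response assigned to a code, average the embeddings into a centroid, and keep the responses closest to that centroid by cosine similarity. The sketch below assumes this centroid approach, which is not confirmed as DECOTA's method; the function names `cosine` and `representative_quotes` are hypothetical, and the toy two-dimensional vectors stand in for real sentence-transformer embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def representative_quotes(
    quotes: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Return the k quotes whose embeddings lie closest to the centroid.

    Assumption: centroid-based selection is an illustrative choice,
    not the procedure described in the paper.
    """
    if len(quotes) != len(embeddings):
        raise ValueError("quotes and embeddings must have the same length")
    if not quotes:
        return []
    dim = len(embeddings[0])
    # Mean vector of all embeddings assigned to this code.
    centroid = [
        sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)
    ]
    # Rank quote indices from most to least similar to the centroid.
    ranked = sorted(
        range(len(quotes)),
        key=lambda i: cosine(embeddings[i], centroid),
        reverse=True,
    )
    return [quotes[i] for i in ranked[:k]]


if __name__ == "__main__":
    # Toy data: three similar responses and one outlier (hypothetical).
    demo_quotes = ["buses are late", "buses rarely run", "need more buses", "I like cats"]
    demo_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [-1.0, 0.5]]
    print(representative_quotes(demo_quotes, demo_vectors, k=3))
```

In practice the vectors would come from a sentence-embedding model, and ties or near-duplicate quotes might need extra handling that this sketch leaves out.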

Source journal: Psychological Methods (PSYCHOLOGY, MULTIDISCIPLINARY)
CiteScore: 13.10
Self-citation rate: 7.10%
Articles published: 159
About the journal: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.