The use of large language models for qualitative research: The Deep Computational Text Analyser (DECOTA)
Lois Player, Ryan Hughes, Kaloyan Mitev, Lorraine Whitmarsh, Christina Demski, Nicholas Nash, Trisevgeni Papakonstantinou, Mark Wilson
Psychological Methods, published 2025-04-07. DOI: 10.1037/met0000753 (https://doi.org/10.1037/met0000753)
Citations: 0
Abstract
Machine-assisted approaches for free-text analysis are rising in popularity, owing to a growing need to rapidly analyze large volumes of qualitative data. In both research and policy settings, these approaches have promise in providing timely insights into public perceptions and enabling policymakers to understand their community's needs. However, current approaches still require expert human interpretation, posing a financial and practical barrier for those outside of academia. For the first time, we propose and validate the Deep Computational Text Analyser (DECOTA), a novel machine learning methodology that automatically analyzes large free-text data sets and outputs concise themes. Building on structural topic modeling approaches, we used two fine-tuned large language models and sentence transformers to automatically derive "codes" and their corresponding "themes", as in inductive thematic analysis. To fully automate the process, we designed and validated a novel algorithm to choose the optimal number of "topics" for the structural topic modeling. DECOTA outputs key codes and themes, their prevalence, and how prevalence varies across covariates such as age and gender. Each code is accompanied by three representative quotes. Four data sets previously analyzed using thematic analysis were triangulated with DECOTA's codes and themes. We found that DECOTA is approximately 378 times faster and 1,920 times cheaper than human coding and consistently yields codes in agreement with or complementary to human coding (averaging 91.6% for codes and 90% for themes). The implications for evidence-based policy development, public engagement with policymaking, and psychometric measure development are discussed. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
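The abstract states that each code is accompanied by three representative quotes but does not say how they are chosen. As a purely illustrative sketch, and not DECOTA's actual procedure, one common approach is to rank the responses assigned to a code by the cosine similarity of their sentence embeddings to the centroid of those embeddings and keep the top three. The function name `representative_quotes`, the example quotes, and the two-dimensional toy vectors below are all hypothetical; in practice the vectors would come from a sentence-transformer model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def representative_quotes(quotes, embeddings, k=3):
    """Return the k quotes whose embeddings lie closest to the centroid.

    Assumption: `embeddings[i]` is the sentence embedding of `quotes[i]`,
    and all quotes passed in belong to the same code.
    """
    if not embeddings:
        return []
    dim = len(embeddings[0])
    # Centroid = element-wise mean of all embeddings for this code.
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    ranked = sorted(
        zip(quotes, embeddings),
        key=lambda pair: cosine(pair[1], centroid),
        reverse=True,
    )
    return [quote for quote, _ in ranked[:k]]


# Hypothetical responses and toy 2-D "embeddings" (made up for illustration).
quotes = [
    "Buses are too expensive.",
    "Public transport costs too much.",
    "Fares keep going up every year.",
    "I enjoy gardening at weekends.",
    "Train tickets are unaffordable for my family.",
]
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],
    [0.7, 0.3],
]

top_quotes = representative_quotes(quotes, embeddings, k=3)
for quote in top_quotes:
    print(quote)
```

Ranking against the centroid favours quotes that are typical of the code rather than extreme; the off-topic gardening response, whose vector points away from the others, is ranked last and excluded.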
About the journal:
Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.