Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods?

IF 4.9; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine. Pub Date: 2024-07-24. DOI: 10.1016/j.cmpb.2024.108356

Walter S Mathis, Sophia Zhao, Nicholas Pratt, Jeremy Weleff, Stefano De Paoli

Volume 255, Article 108356. Journal Article. https://www.sciencedirect.com/science/article/pii/S0169260724003493


Abstract

Background

Large language models (LLMs) are a form of generative artificial intelligence that has ignited much interest and discussion about its utility in clinical and research settings. Despite this interest, few analyses have examined their use in qualitative thematic analysis or compared their current ability with that of human coding and analysis. In addition, no published analysis has examined their use on real-world protected health information.

Objective

Here we address that gap in the literature by comparing an LLM with standard human thematic analysis of real-world, semi-structured interviews with both patients and clinicians in a psychiatric setting.

Methods

Using a 70-billion-parameter open-source LLM running on local hardware, together with advanced prompt engineering techniques, we produced themes that summarized a full corpus of interviews in minutes. We then used three different evaluation methods to quantify the similarity between the themes produced by the LLM and those produced by humans.

Results

These evaluations revealed similarities ranging from moderate to substantial (Jaccard similarity coefficients 0.44–0.69), which are promising preliminary results.
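For readers unfamiliar with the metric, the Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below is illustrative only: the abstract does not describe how LLM themes were matched to human themes, so it assumes themes have already been reduced to comparable labels, and the theme names and resulting score are hypothetical, not taken from the study.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (1.0)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical theme labels (assumption: themes were already matched and
# normalised to shared labels; the paper's actual procedure is not shown here).
human_themes = {"access to care", "stigma", "medication burden", "trust in clinicians"}
llm_themes = {"access to care", "stigma", "trust in clinicians", "family support", "cost"}

score = jaccard(human_themes, llm_themes)
print(f"Jaccard similarity: {score:.2f}")  # 3 shared themes / 6 distinct themes = 0.50
```

A coefficient of 1.0 would mean the two theme sets are identical and 0.0 would mean they share nothing, so the reported range of 0.44–0.69 indicates that the shared themes made up roughly half to two-thirds of all distinct themes.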

Conclusion

Our study demonstrates that open-source LLMs can effectively generate robust themes from qualitative data, achieving substantial similarity to human-generated themes. The validation of LLMs in thematic analysis, coupled with evaluation methodologies, highlights their potential to enhance and democratize qualitative research across diverse fields.


Source journal

Computer Methods and Programs in Biomedicine (Engineering and Technology: Biomedical Engineering)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to encourage the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.

Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.