Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.

IF: 3.5 | Q1: Health Care Sciences & Services
JMIR Infodemiology | Pub Date: 2024-08-29 | DOI: 10.2196/59641
Michael S Deiner, Vlad Honcharov, Jiawei Li, Tim K Mackey, Travis C Porco, Urmimala Sarkar
{"title":"Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.","authors":"Michael S Deiner, Vlad Honcharov, Jiawei Li, Tim K Mackey, Travis C Porco, Urmimala Sarkar","doi":"10.2196/59641","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort needed from well-trained human subject matter experts makes extensive manual social media listening unfeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings in large sets of social media posts and reasonably report health-related themes.</p><p><strong>Objective: </strong>We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large contents of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts?</p><p><strong>Methods: </strong>We asked the same research question and used the same set of social media content for both the LLM selection of relevant topics and the LLM analysis of themes as was conducted manually in a published study about vaccine rhetoric. We used the results from that study as background for this LLM experiment by comparing the results from the prior manual human analyses with the analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed if multiple LLMs had equivalent ability and assessed the consistency of repeated analysis from each LLM.</p><p><strong>Results: </strong>The LLMs generally gave high rankings to the topics chosen previously by humans as most relevant. We reject a null hypothesis (P<.001, overall comparison) and conclude that these LLMs are more likely to include the human-rated top 5 content areas in their top rankings than would occur by chance. Regarding theme identification, LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Despite not consistently matching the human-generated themes, subject matter experts found themes generated by the LLMs were still reasonable and relevant.</p><p><strong>Conclusions: </strong>LLMs can effectively and efficiently process large social media-based health-related data sets. LLMs can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested can replicate the depth of analysis from human subject matter experts by consistently extracting the same themes from the same data. 
There is vast potential, once better validated, for automated LLM-based real-time social listening for common and rare health conditions, informing public health understanding of the public's interests and concerns and determining the public's ideas to address them.</p>","PeriodicalId":73554,"journal":{"name":"JMIR infodemiology","volume":null,"pages":null},"PeriodicalIF":3.5000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393503/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR infodemiology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/59641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Background: Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort required from well-trained human subject matter experts make extensive manual social media listening infeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings from large sets of social media posts and reasonably report health-related themes.

Objective: We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large sets of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts?

Methods: We posed the same research question and used the same set of social media content for the LLM selection of relevant topics and the LLM analysis of themes as had been analyzed manually in a published study of vaccine rhetoric. We used the results from that study as the benchmark for this LLM experiment, comparing the prior manual human analyses with analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed whether multiple LLMs had equivalent ability and assessed the consistency of repeated analyses from each LLM.
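The study's actual prompts and pipeline are not reproduced in this abstract, so the following is only a minimal, illustrative sketch of what single-prompt inductive thematic analysis of a corpus could look like. It assumes the OpenAI Python SDK, a hypothetical posts.txt corpus file, and an invented prompt; the now-deprecated gpt-4-32k model name stands in for the GPT4-32K model named in the study.

```python
# Illustrative sketch only: the study's actual prompts, preprocessing,
# and model parameters are not published in this abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical corpus file: one pre-filtered social media post per line.
with open("posts.txt", encoding="utf-8") as f:
    corpus = f.read()

# Invented prompt; the study's single prompt may have differed.
prompt = (
    "You are a qualitative health researcher. Perform an inductive "
    "thematic analysis of the social media posts below. List the major "
    "themes; for each, give a one-sentence description and 2-3 verbatim "
    "example posts.\n\nPOSTS:\n" + corpus
)

response = client.chat.completions.create(
    model="gpt-4-32k",  # long-context model; substitute any current one
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # lower run-to-run variability (the study observed some)
)
print(response.choices[0].message.content)
```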

Results: The LLMs generally ranked highly the topics that humans had previously chosen as most relevant. We reject the null hypothesis (P<.001, overall comparison) and conclude that these LLMs are more likely to include the human-rated top 5 content areas in their top rankings than would occur by chance. Regarding theme identification, the LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Although the LLM-generated themes did not consistently match the human-generated themes, subject matter experts still found them reasonable and relevant.
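The abstract does not state which test produced the overall P<.001, so the following is purely an illustration of how a chance baseline for top-5 list overlap can be computed under a hypergeometric model. The pool size of 20 candidate content areas is invented; the study's actual test and topic counts may differ.

```python
# Chance baseline for top-5 list overlap under a hypergeometric model.
# N (pool size) is invented for illustration; the study's test may differ.
from scipy.stats import hypergeom

N = 20  # hypothetical number of candidate content areas
K = 5   # human-rated top 5 content areas
n = 5   # size of an LLM's top ranking

# P(a random top-5 list contains at least 4 of the human top 5):
# sf(3, ...) gives P(X > 3), i.e., P(X >= 4).
p_chance = hypergeom.sf(3, N, K, n)
print(f"P(overlap >= 4 by chance) = {p_chance:.2e}")  # ~4.9e-03 here
```

Under these illustrative numbers, even a 4-of-5 overlap between a random top-5 list and the human top 5 has a probability below 0.005, which conveys why consistent high overlap across LLMs and runs supports rejecting the chance-only null.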

Conclusions: LLMs can effectively and efficiently process large social media-based health-related data sets and can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested can replicate the depth of analysis of human subject matter experts by consistently extracting the same themes from the same data. Once better validated, automated LLM-based real-time social listening for common and rare health conditions has vast potential to inform public health understanding of the public's interests and concerns and to surface the public's own ideas for addressing them.
