Public Perceptions and Barriers to Tuberculosis Treatment in Korea: A Large Language Model-Based Analysis of Naver Knowledge-iN Data from 2002 to 2024.

Impact factor: 2.1 | JCR: Q3 | Category: Medical Informatics

Healthcare Informatics Research. Pub Date: 2025-07-01. Epub Date: 2025-07-31. DOI: 10.4258/hir.2025.31.3.263

Hyewon Park, Siho Kim, Gaeun Kim, Seunghyeok Chang, Jae-Gook Shin, Sangzin Ahn

Healthcare Informatics Research, volume 31, issue 3, pages 263-273. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370417/pdf/

Citations: 0

Abstract

Objectives: This study was conducted to investigate public perceptions and concerns surrounding tuberculosis (TB) treatment in Korea through an analysis of online queries about antitubercular medications. Additionally, it evaluated the effectiveness of large language models (LLMs) as analytical tools for processing unstructured healthcare data.

Methods: Using LLMs, this study analyzed 44,174 questions that mentioned TB from Naver Knowledge-iN (2002-2024). Questions referencing antitubercular medications were extracted and thematically categorized. Side effects were analyzed through parallel approaches examining general and medication-specific effects. Questions about infectivity and social implications were further analyzed using text embedding, dimensionality reduction, and clustering. The performance of LLMs was evaluated against human researchers and traditional methods.

Results: Among questions mentioning specific medications (n = 919), rifampin (31.8%) and isoniazid (31.6%) were most frequently referenced. Of the 10,044 questions regarding antitubercular medication, management challenges represented the largest category (44.8%). Analysis of infectivity and social implications (n = 583) revealed previously unidentified concerns about blood donation and immigration eligibility. Employment-related concerns constituted the largest distinct subgroup (20.6%). Hepatotoxicity, dermatosis, and vomiting were the most frequently reported side effects. LLMs outperformed keyword matching in data processing and offered cost advantages over human analysis, with finetuning further reducing processing costs.

Conclusions: This study produced novel insights into public concerns regarding TB treatment and demonstrated the effectiveness of combining social media platform data with LLM-based analysis, providing a systematic framework for future healthcare research using unstructured public data and LLMs.
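The abstract compares LLMs against keyword matching but does not describe the baseline itself. As a rough illustration only, a keyword-matching baseline scored against human labels might look like the sketch below. The keyword list, the sample questions, the reference labels, and the choice of micro-averaged precision and recall are all assumptions made for this example, not details taken from the study.

```python
# Hypothetical keyword list (English names plus common Korean spellings).
# The study's actual keywords are not given in the abstract.
MEDICATION_KEYWORDS = {
    "rifampin": ["rifampin", "rifampicin", "리팜핀"],
    "isoniazid": ["isoniazid", "이소니아지드"],
}


def match_medications(question: str) -> set[str]:
    """Return the canonical drug names whose keywords appear in the question."""
    text = question.lower()
    return {
        drug
        for drug, keywords in MEDICATION_KEYWORDS.items()
        if any(keyword.lower() in text for keyword in keywords)
    }


def precision_recall(
    predicted: list[set[str]], reference: list[set[str]]
) -> tuple[float, float]:
    """Micro-averaged precision and recall of predicted labels vs. reference labels."""
    true_positives = sum(len(p & r) for p, r in zip(predicted, reference))
    predicted_total = sum(len(p) for p in predicted)
    reference_total = sum(len(r) for r in reference)
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / reference_total if reference_total else 0.0
    return precision, recall


# Invented sample questions with invented "human" labels, for illustration only.
questions = [
    "Is it normal for urine to turn orange after taking rifampin?",
    "이소니아지드 먹고 손발이 저려요",  # "My hands and feet tingle after taking isoniazid"
    "The red TB capsule makes me nauseous every morning",
]
reference = [{"rifampin"}, {"isoniazid"}, {"rifampin"}]

predicted = [match_medications(q) for q in questions]
precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the third question refers to a drug only by description, so exact keyword matching misses it and recall falls below precision. Paraphrased mentions of this kind are one plausible reason a language model could outperform keyword matching, though the abstract does not report where the gap came from.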


Source journal

Healthcare Informatics Research (Medical Informatics)

CiteScore

4.90

Self-citation rate

6.90%

Number of publications