Characterizing Public Sentiments and Drug Interactions during COVID-19: A Pretrained Language Model and Network Analysis of Social Media Discourse

medRxiv. Pub Date: 2024-06-06. DOI: 10.1101/2024.06.06.24308537

Wanxin Li, Yining Hua, Peilin Zhou, Li Zhou, Xin Xu, Jie Yang



Abstract

Objective: Drug-related data posted on social media in real time can offer insights into how the pandemic affects drug use and can help monitor misinformation. This study developed a natural language processing (NLP) pipeline tailored to the analysis of social media discourse on COVID-19-related drugs.

Methods: This study constructed a full pipeline for analyzing COVID-19-related drug tweets, with NLP techniques based on pre-trained language models as the backbone. The pipeline has four core modules: named entity recognition (NER) and normalization, to identify medical entities in relevant tweets and standardize them to uniform medication names; target sentiment analysis (TSA), to reveal the sentiment polarities associated with those entities; topic modeling, to understand the underlying themes discussed by the population; and drug network analysis, to identify potential adverse drug reactions (ADR) and drug-drug interactions (DDI). The pipeline was deployed to analyze tweets related to COVID-19 and drug therapies posted between February 1, 2020, and April 30, 2022.

Results: From a dataset of 2,124,757 relevant tweets from 1,800,372 unique users, the NER model identified the five most-discussed drugs: Ivermectin, Hydroxychloroquine, Remdesivir, Zinc, and Vitamin D. Sentiment and topic analysis revealed that public perception was shaped predominantly by celebrity endorsements, media hotspots, and governmental directives rather than by empirical evidence of drug efficacy. Co-occurrence matrices and complex network analysis further identified emerging patterns of DDI and ADR that could be critical for public health surveillance, for example in better safeguarding the safe use of medicines.

Conclusion: This study shows that an NLP-based pipeline can be a robust tool for large-scale public health monitoring and can offer valuable supplementary data for traditional epidemiological studies of DDI and ADR. The framework presented here is intended to serve as a cornerstone for future social media-based public health analytics.
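The normalization and co-occurrence steps mentioned in the Methods and Results can be illustrated with a minimal Python sketch. This is not the authors' implementation: the alias table, the function names `normalize` and `cooccurrence`, and the toy input are all assumptions made for illustration, and the abstract does not say how the actual pipeline normalizes names or builds its matrices. The sketch only shows the general idea of mapping raw mentions to canonical drug names and counting how often two drugs appear in the same tweet.

```python
from collections import Counter
from itertools import combinations

# Hypothetical alias table (assumption): the study's real normalization
# lexicon is not described in the abstract.
ALIASES = {
    "hcq": "hydroxychloroquine",
    "hydroxychloroquine": "hydroxychloroquine",
    "ivermectin": "ivermectin",
    "remdesivir": "remdesivir",
    "zinc": "zinc",
    "vitamin d": "vitamin d",
}


def normalize(mentions):
    """Map raw mentions to canonical names, dropping unknowns and duplicates."""
    return sorted({ALIASES[m.lower()] for m in mentions if m.lower() in ALIASES})


def cooccurrence(tweets_mentions):
    """Count unordered pairs of canonical drug names mentioned in the same tweet."""
    counts = Counter()
    for mentions in tweets_mentions:
        drugs = normalize(mentions)
        # Names are sorted, so each pair has one canonical key order.
        counts.update(combinations(drugs, 2))
    return counts


# Toy input (assumption): each inner list holds the drug mentions that an
# upstream NER step might have extracted from one tweet.
toy = [["HCQ", "Zinc"], ["zinc", "Hydroxychloroquine", "Vitamin D"], ["Ivermectin"]]
pairs = cooccurrence(toy)
```

Pair counts of this kind could serve as edge weights in a drug network, with drugs as nodes. Co-mention in a tweet is only a signal for follow-up and does not by itself establish a drug-drug interaction.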


