Differential Analysis of Age, Gender, Race, Sentiment, and Emotion in Substance Use Discourse on Twitter During the COVID-19 Pandemic: A Natural Language Processing Approach.

Impact Factor: 2.3 (JCR Q1, Health Care Sciences & Services)
JMIR Infodemiology. Published 2025-07-28. DOI: 10.2196/67333
Julina Maharjan, Ruoming Jin, Jennifer King, Jianfeng Zhu, Deric Kenne
Volume 5, e67333. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12340460/pdf/

Abstract

Background: User demographics are often hidden in social media data due to privacy concerns. However, demographic information on substance use (SU) can provide valuable insights, allowing public health policy makers to focus on specific cohorts and develop efficient prevention strategies, especially during global crises such as the COVID-19 pandemic.

Objective: This study aimed to analyze SU trends at the user level across different demographic dimensions, such as age, gender, race, and ethnicity, with a focus on the COVID-19 pandemic. The study also establishes a baseline for SU trends using social media data.

Methods: The study was conducted using large-scale English-language data from Twitter (now known as X) over a 3-year period (2019, 2020, and 2021), comprising 1.13 billion posts. After preprocessing, SU posts were identified using our custom-trained deep learning model (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach [RoBERTa]), yielding 9 million SU posts. Then, demographic attributes (user type, age, gender, race, and ethnicity), as well as the sentiments and emotions associated with each post, were extracted with a collection of natural language processing modules. Finally, various qualitative analyses were performed to gain insight into user behaviors across demographic groups.
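The user-level aggregation step can be illustrated with a minimal sketch. This is not the authors' implementation: the `Post` record, the helper names, and `keyword_stand_in` are hypothetical. The keyword function only takes the place of the custom-trained RoBERTa classifier so the example runs without a model, and demographic labels are assumed to be inferred upstream and passed in as a plain mapping from user ID to label.

```python
import string
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Mapping, Set, Tuple


@dataclass(frozen=True)
class Post:
    user_id: str
    year: int
    text: str


# Hypothetical toy vocabulary; the study used a trained RoBERTa classifier instead.
_SU_TERMS = {"alcohol", "beer", "wine", "weed"}


def keyword_stand_in(text: str) -> bool:
    """Toy stand-in classifier: flags a post if any token matches a listed term."""
    tokens = {tok.strip(string.punctuation) for tok in text.lower().split()}
    return not tokens.isdisjoint(_SU_TERMS)


def unique_su_users_by_year(
    posts: Iterable[Post], is_su_post: Callable[[str], bool]
) -> Dict[int, int]:
    """Number of distinct users per year with at least one post flagged as SU."""
    users: Dict[int, Set[str]] = defaultdict(set)
    for post in posts:
        if is_su_post(post.text):
            users[post.year].add(post.user_id)
    return {year: len(ids) for year, ids in sorted(users.items())}


def su_users_by_attribute(
    posts: Iterable[Post],
    is_su_post: Callable[[str], bool],
    user_attrs: Mapping[str, str],
) -> Dict[Tuple[int, str], int]:
    """Distinct SU users per (year, demographic label); unlabeled users count as 'unknown'."""
    groups: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for post in posts:
        if is_su_post(post.text):
            label = user_attrs.get(post.user_id, "unknown")
            groups[(post.year, label)].add(post.user_id)
    return {key: len(ids) for key, ids in sorted(groups.items())}
```

Counting distinct users rather than posts keeps a single prolific account from inflating a year's or a demographic group's participation figure. For example, a user who posts about beer twice in 2019 still counts once for that year.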

Results: The highest level of user participation in SU discussions was observed in 2020, with a 22.18% increase compared to 2019 and a 25.24% increase compared to 2021. Throughout the study period, male users and teenagers increasingly dominated the SU discussions across all substance types. During the COVID-19 pandemic, female users' participation was notably higher in prescription medication discussions than in discussions of other substance types. In addition, alcohol use increased by 80% within 2 weeks after the global pandemic declaration in 2020.
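The percentage figures in the Results are relative changes. The sketch below assumes each figure is computed as (new minus baseline) divided by baseline, times 100, with the comparison year (or a pre-declaration window) as the baseline. The abstract does not define its baselines, so the helper names and the choice of an equal-length window immediately before the declaration are hypothetical. The anchor date is the World Health Organization's pandemic declaration of 11 March 2020.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

# WHO characterized COVID-19 as a pandemic on 11 March 2020.
WHO_PANDEMIC_DECLARATION = date(2020, 3, 11)


def percent_change(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


def window_counts(post_dates: Iterable[date], anchor: date, days: int) -> Tuple[int, int]:
    """Count posts in equal-length windows around `anchor`.

    Hypothetical baseline choice: the before window is [anchor - days, anchor)
    and the after window is [anchor, anchor + days).
    """
    start = anchor - timedelta(days=days)
    end = anchor + timedelta(days=days)
    before = after = 0
    for d in post_dates:
        if start <= d < anchor:
            before += 1
        elif anchor <= d < end:
            after += 1
    return before, after


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not data from the study.
    print(round(percent_change(new=1222, baseline=1000), 2))  # 22.2
```

Under this reading, a 2-week change would be `percent_change(after, before)` applied to the output of `window_counts(dates, WHO_PANDEMIC_DECLARATION, 14)`.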

Conclusions: This study presents a large-scale, fine-grained analysis of SU in social media data, examining trends by age, gender, race, and ethnicity before, during, and after the COVID-19 pandemic. Our findings, contextualized with sociocultural and pandemic-specific factors, provide actionable insights for targeted public health interventions. This study establishes social media data (powered by artificial intelligence and natural language processing tools) as a valuable platform for real-time SU surveillance and prevention during crises.
