Julina Maharjan, Ruoming Jin, Jennifer King, Jianfeng Zhu, Deric Kenne
JMIR Infodemiology. 2025;5:e67333. Published 2025-07-28. doi: 10.2196/67333 (https://doi.org/10.2196/67333). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12340460/pdf/
Differential Analysis of Age, Gender, Race, Sentiment, and Emotion in Substance Use Discourse on Twitter During the COVID-19 Pandemic: A Natural Language Processing Approach.
Background: User demographics are often hidden in social media data due to privacy concerns. However, demographic information on substance use (SU) can provide valuable insights, allowing public health policy makers to focus on specific cohorts and develop efficient prevention strategies, especially during global crises such as the COVID-19 pandemic.
Objective: This study aimed to analyze SU trends at the user level across different demographic dimensions, such as age, gender, race, and ethnicity, with a focus on the COVID-19 pandemic. The study also establishes a baseline for SU trends using social media data.
Methods: The study was conducted using large-scale English-language data from Twitter (now known as X) over a 3-year period (2019, 2020, and 2021), comprising 1.13 billion posts. Following preprocessing, the SU posts were identified using our custom-trained deep learning model (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach [RoBERTa]), which resulted in the identification of 9 million SU posts. Then, demographic attributes, such as user type, age, gender, race, and ethnicity, as well as sentiments and emotions associated with each post, were extracted via a collection of natural language processing modules. Finally, various qualitative analyses were performed to obtain insight into user behaviors based on demographics.
Results: User participation in SU discussions peaked in 2020, 22.18% higher than in 2019 and 25.24% higher than in 2021. Throughout the study period, male users and teenagers increasingly dominated the SU discussions across all substance types. During the COVID-19 pandemic, female users participated notably more in prescription medication discussions than in discussions of other substance types. In addition, alcohol use increased by 80% within 2 weeks after the global pandemic declaration in 2020.
Conclusions: This study presents a large-scale, fine-grained analysis of SU on social media data, examining trends by age, gender, race, and ethnicity before, during, and after the COVID-19 pandemic. Our findings, contextualized with sociocultural and pandemic-specific factors, provide actionable insights for targeted public health interventions. This study establishes social media data (powered by artificial intelligence and natural language processing tools) as a valuable platform for real-time SU surveillance and prevention during crises.
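The Methods and Results describe a filter-then-aggregate workflow: flag SU posts, count participating users per year, and compare years as percentage differences. The following is a minimal sketch of that workflow, not the authors' implementation. The keyword matcher is a hypothetical stand-in for the study's custom-trained RoBERTa classifier, and the function names (`is_su_post`, `users_per_year`, `pct_higher`), the keyword list, and the sample posts are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical keyword list standing in for the study's fine-tuned RoBERTa
# classifier; this is an assumption, not the authors' model.
SU_KEYWORDS = {"beer", "vodka", "weed", "oxycodone"}


def is_su_post(text: str) -> bool:
    """Toy stand-in classifier: flag a post if it contains any keyword."""
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    return bool(tokens & SU_KEYWORDS)


def users_per_year(posts):
    """Count distinct users with at least one SU-flagged post in each year."""
    users = defaultdict(set)
    for post in posts:
        if is_su_post(post["text"]):
            users[post["year"]].add(post["user_id"])
    return {year: len(ids) for year, ids in users.items()}


def pct_higher(value: float, baseline: float) -> float:
    """Percentage by which `value` exceeds `baseline`."""
    return (value - baseline) / baseline * 100.0


# Made-up posts for illustration only; not data from the study.
posts = [
    {"user_id": "a", "year": 2019, "text": "Cold beer after work"},
    {"user_id": "b", "year": 2019, "text": "Nice weather today"},
    {"user_id": "a", "year": 2020, "text": "Vodka again tonight."},
    {"user_id": "b", "year": 2020, "text": "Out of weed!"},
    {"user_id": "c", "year": 2020, "text": "beer run"},
    {"user_id": "c", "year": 2021, "text": "Switched to tea"},
    {"user_id": "a", "year": 2021, "text": "One beer, that's all"},
    {"user_id": "b", "year": 2021, "text": "weed is legal here now"},
]

counts = users_per_year(posts)
print(counts)
print(pct_higher(counts[2020], counts[2019]))  # 2020 relative to 2019
print(pct_higher(counts[2020], counts[2021]))  # 2020 relative to 2021
```

Counting distinct users rather than posts mirrors the abstract's user-level framing. In this sketch the peak year is compared against each of the other years as its own baseline, which is one plausible reading of the two percentages reported in the Results.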