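Evaluating sentiment analysis models: A comparative analysis of vaccination tweets during the COVID-19 phase leveraging DistilBERT for enhanced insights.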

Impact factor 1.9 | JCR Q2 | Multidisciplinary Sciences

MethodsX, vol. 14, 103407 | Pub Date: 2025-05-30 | eCollection Date: 2025-06-01 | DOI: 10.1016/j.mex.2025.103407

Renuka Agrawal, Mehuli Majumder, Ishita Yadav, Nandini Taneja, Safa Hamdare, Preeti Hemnani


Citations: 0

Abstract



This study investigates public sentiment toward COVID-19 vaccinations by analyzing Twitter data using advanced machine learning (ML) and natural language processing (NLP) techniques. Recognizing social media as a valuable source for gauging public opinion during health crises, the research aims to inform policies on content moderation and misinformation control.

• Comparative analysis of embedding techniques and ML models: The study evaluates two embedding techniques, TF-IDF and Word2Vec, across five ML models: LinearSVC, Random Forest, Gradient Boosting Machine (GBM), XGBoost, and AdaBoost.
• The models were tested using two train-test splits (70-30 and 80-20) to assess their performance on noisy, unlabeled, and imbalanced sentiment data.
• Utilization of DistilBERT for pseudo-labeling: To enhance labeling accuracy, DistilBERT was employed for pseudo-labeling, capturing semantic nuances often missed by traditional ML techniques. This approach enabled more effective sentiment classification of tweets.

The findings underscore the effectiveness of automated annotation, hybrid modeling, and embedding strategies in analyzing unstructured social media data. Such approaches provide valuable insights for public health applications, particularly in understanding vaccine hesitancy and shaping communication strategies. The study highlights the potential of integrating advanced NLP techniques to better comprehend and respond to public sentiments during pandemics or similar emergencies.
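The pipeline summarized in the abstract (pseudo-label unlabeled tweets with a teacher model, vectorize with TF-IDF, train a classical classifier, compare 70-30 and 80-20 splits) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code. Several pieces are assumptions: a small keyword lexicon (`POS_CUES`, `NEG_CUES`) stands in for DistilBERT so the sketch runs offline; a Pegasos-style SGD linear SVM stands in for scikit-learn's `LinearSVC`; labels are binary; and `TOY_TWEETS` are made-up examples, not the study's data. With Hugging Face `transformers`, a real teacher could be `pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")`, though the abstract does not say which checkpoint the authors used. Word2Vec features and the tree ensembles (Random Forest, GBM, XGBoost, AdaBoost) would slot in at the vectorize and train steps and are not shown.

```python
import math
import random
import re
from collections import Counter

# HYPOTHETICAL lexicon "teacher": a stand-in for DistilBERT so the sketch runs
# offline. It is not the study's labeler; the cue words are illustrative only.
POS_CUES = {"safe", "grateful", "relieved", "protected", "thankful", "hope"}
NEG_CUES = {"scared", "unsafe", "forced", "worried", "rushed", "distrust"}

# Made-up toy tweets for illustration; not the study's dataset.
TOY_TWEETS = [
    "got my second dose and feel safe", "grateful to the nurses, relieved and protected",
    "thankful the rollout reached our town", "booster done, hope this ends soon",
    "feel protected after the jab, so relieved", "grateful for science and safe vaccines",
    "my parents are vaccinated, huge hope", "relieved that clinics open on weekends",
    "thankful for free shots at the pharmacy", "safe and protected, book your slot",
    "scared of the side effects after my shot", "feels rushed, worried about long term effects",
    "forced mandates make me distrust the rollout", "worried the trials were rushed",
    "scared to take it, feels unsafe", "i distrust the approval process",
    "being forced to get jabbed is unsafe", "worried and scared about my kids",
    "rushed approval, total distrust", "feels unsafe to me, i am worried",
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def pseudo_label(tweet):
    """Return +1 (positive) or -1 (negative); ties default to positive."""
    tokens = tokenize(tweet)
    score = sum(t in POS_CUES for t in tokens) - sum(t in NEG_CUES for t in tokens)
    return 1 if score >= 0 else -1


def fit_idf(docs):
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, fitted on training docs only."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(tokenize(doc))
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Pegasos SGD on the L2-regularised hinge loss (no bias); a LinearSVC stand-in."""
    rng = random.Random(seed)
    w, step = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violated = ys[i] * dot(w, xs[i]) < 1  # margin uses w before the update
            w = {t: (1 - eta * lam) * v for t, v in w.items()}  # regularisation shrink
            if violated:
                for t, v in xs[i].items():
                    w[t] = w.get(t, 0.0) + eta * ys[i] * v
    return w


def evaluate(tweets, train_frac, seed=42):
    """Pseudo-label, split, train on the train part; accuracy = agreement with teacher."""
    labels = [pseudo_label(t) for t in tweets]
    idx = list(range(len(tweets)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(tweets))
    if not 0 < cut < len(tweets):
        raise ValueError("train_frac leaves an empty train or test split")
    train, test = idx[:cut], idx[cut:]
    idf = fit_idf([tweets[i] for i in train])
    w = train_linear_svm([tfidf(tweets[i], idf) for i in train], [labels[i] for i in train])
    preds = [1 if dot(w, tfidf(tweets[i], idf)) >= 0 else -1 for i in test]
    acc = sum(p == labels[i] for p, i in zip(preds, test)) / len(test)
    return len(train), len(test), acc


if __name__ == "__main__":
    for frac, name in ((0.7, "70-30"), (0.8, "80-20")):
        n_train, n_test, acc = evaluate(TOY_TWEETS, frac)
        print(f"{name}: train={n_train} test={n_test} accuracy={acc:.2f}")
```

The IDF table is fitted on the training split only, so test-set vocabulary statistics do not leak into training. Because the student is scored against the teacher's pseudo-labels, the reported accuracy measures how faithfully it reproduces those labels, not agreement with human annotation; a labeled hold-out set would be needed for the latter.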
