用ElecBERT语言模型改进Twitter基于选举对话的情感分析

Asif Khan, Huaping Zhang, Nada Boudjellal, Arshad Ahmad, Maqbool Khan
{"title":"用ElecBERT语言模型改进Twitter基于选举对话的情感分析","authors":"Asif Khan, Huaping Zhang, Nada Boudjellal, Arshad Ahmad, Maqbool Khan","doi":"10.32604/cmc.2023.041520","DOIUrl":null,"url":null,"abstract":"Sentiment analysis plays a vital role in understanding public opinions and sentiments toward various topics. In recent years, the rise of social media platforms (SMPs) has provided a rich source of data for analyzing public opinions, particularly in the context of election-related conversations. Nevertheless, sentiment analysis of election-related tweets presents unique challenges due to the complex language used, including figurative expressions, sarcasm, and the spread of misinformation. To address these challenges, this paper proposes Election-focused Bidirectional Encoder Representations from Transformers (ElecBERT), a new model for sentiment analysis in the context of election-related tweets. Election-related tweets pose unique challenges for sentiment analysis due to their complex language, sarcasm, and misinformation. ElecBERT is based on the Bidirectional Encoder Representations from Transformers (BERT) language model and is fine-tuned on two datasets: Election-Related Sentiment-Annotated Tweets (ElecSent)-Multi-Languages, containing 5.31 million labeled tweets in multiple languages, and ElecSent-English, containing 4.75 million labeled tweets in English. The model outperforms other machine learning models such as Support Vector Machines (SVM), Naïve Bayes (NB), and eXtreme Gradient Boosting (XGBoost), with an accuracy of 0.9905 and F1-score of 0.9816 on ElecSent-Multi-Languages, and an accuracy of 0.9930 and F1-score of 0.9899 on ElecSent-English. The performance of different models was compared using the 2020 United States (US) Presidential Election as a case study. The ElecBERT-English and ElecBERT-Multi-Languages models outperformed BERTweet, with the ElecBERT-English model achieving a Mean Absolute Error (MAE) of 6.13. This paper presents a valuable contribution to sentiment analysis in the context of election-related tweets, with potential applications in political analysis, social media management, and policymaking.","PeriodicalId":93535,"journal":{"name":"Computers, materials & continua","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving Sentiment Analysis in Election-Based Conversations on Twitter with ElecBERT Language Model\",\"authors\":\"Asif Khan, Huaping Zhang, Nada Boudjellal, Arshad Ahmad, Maqbool Khan\",\"doi\":\"10.32604/cmc.2023.041520\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sentiment analysis plays a vital role in understanding public opinions and sentiments toward various topics. In recent years, the rise of social media platforms (SMPs) has provided a rich source of data for analyzing public opinions, particularly in the context of election-related conversations. Nevertheless, sentiment analysis of election-related tweets presents unique challenges due to the complex language used, including figurative expressions, sarcasm, and the spread of misinformation. To address these challenges, this paper proposes Election-focused Bidirectional Encoder Representations from Transformers (ElecBERT), a new model for sentiment analysis in the context of election-related tweets. Election-related tweets pose unique challenges for sentiment analysis due to their complex language, sarcasm, and misinformation. ElecBERT is based on the Bidirectional Encoder Representations from Transformers (BERT) language model and is fine-tuned on two datasets: Election-Related Sentiment-Annotated Tweets (ElecSent)-Multi-Languages, containing 5.31 million labeled tweets in multiple languages, and ElecSent-English, containing 4.75 million labeled tweets in English. The model outperforms other machine learning models such as Support Vector Machines (SVM), Naïve Bayes (NB), and eXtreme Gradient Boosting (XGBoost), with an accuracy of 0.9905 and F1-score of 0.9816 on ElecSent-Multi-Languages, and an accuracy of 0.9930 and F1-score of 0.9899 on ElecSent-English. The performance of different models was compared using the 2020 United States (US) Presidential Election as a case study. The ElecBERT-English and ElecBERT-Multi-Languages models outperformed BERTweet, with the ElecBERT-English model achieving a Mean Absolute Error (MAE) of 6.13. This paper presents a valuable contribution to sentiment analysis in the context of election-related tweets, with potential applications in political analysis, social media management, and policymaking.\",\"PeriodicalId\":93535,\"journal\":{\"name\":\"Computers, materials & continua\",\"volume\":\"2017 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers, materials & continua\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.32604/cmc.2023.041520\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers, materials & continua","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32604/cmc.2023.041520","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

情感分析在了解公众对各种话题的意见和情绪方面起着至关重要的作用。近年来,社交媒体平台(SMPs)的兴起为分析民意提供了丰富的数据来源,特别是在与选举相关的对话背景下。然而,由于使用的语言复杂,包括比喻表达、讽刺和错误信息的传播,对选举相关推文的情感分析面临着独特的挑战。为了解决这些挑战,本文提出了以选举为中心的双向编码器表示(ElecBERT),这是一种新的模型,用于在选举相关推文的背景下进行情感分析。与选举相关的推文由于其复杂的语言、讽刺和错误信息,给情绪分析带来了独特的挑战。ElecBERT基于双向编码器表示(BERT)语言模型,并对两个数据集进行了微调:与选举相关的情感注释推文(ElecSent)-多语言,包含531万条多种语言的标记推文,以及ElecSent-English,包含475万条英语标记推文。该模型优于其他机器学习模型,如支持向量机(SVM)、Naïve贝叶斯(NB)和极限梯度提升(XGBoost),在ElecSent-Multi-Languages上的准确率为0.9905,F1-score为0.9816,在ElecSent-English上的准确率为0.9930,F1-score为0.9899。以2020年美国总统大选为例,比较了不同模型的性能。ElecBERT-English和ElecBERT-Multi-Languages模型的表现优于BERTweet,其中ElecBERT-English模型的平均绝对误差(MAE)为6.13。本文对选举相关推文背景下的情绪分析做出了有价值的贡献,在政治分析、社交媒体管理和政策制定方面具有潜在的应用前景。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Improving Sentiment Analysis in Election-Based Conversations on Twitter with ElecBERT Language Model
Sentiment analysis plays a vital role in understanding public opinions and sentiments toward various topics. In recent years, the rise of social media platforms (SMPs) has provided a rich source of data for analyzing public opinions, particularly in the context of election-related conversations. Nevertheless, sentiment analysis of election-related tweets presents unique challenges due to the complex language used, including figurative expressions, sarcasm, and the spread of misinformation. To address these challenges, this paper proposes Election-focused Bidirectional Encoder Representations from Transformers (ElecBERT), a new model for sentiment analysis in the context of election-related tweets. Election-related tweets pose unique challenges for sentiment analysis due to their complex language, sarcasm, and misinformation. ElecBERT is based on the Bidirectional Encoder Representations from Transformers (BERT) language model and is fine-tuned on two datasets: Election-Related Sentiment-Annotated Tweets (ElecSent)-Multi-Languages, containing 5.31 million labeled tweets in multiple languages, and ElecSent-English, containing 4.75 million labeled tweets in English. The model outperforms other machine learning models such as Support Vector Machines (SVM), Naïve Bayes (NB), and eXtreme Gradient Boosting (XGBoost), with an accuracy of 0.9905 and F1-score of 0.9816 on ElecSent-Multi-Languages, and an accuracy of 0.9930 and F1-score of 0.9899 on ElecSent-English. The performance of different models was compared using the 2020 United States (US) Presidential Election as a case study. The ElecBERT-English and ElecBERT-Multi-Languages models outperformed BERTweet, with the ElecBERT-English model achieving a Mean Absolute Error (MAE) of 6.13. This paper presents a valuable contribution to sentiment analysis in the context of election-related tweets, with potential applications in political analysis, social media management, and policymaking.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信