使用机器学习技术分析Covid-19发病的情绪

IF 1.7 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Vishakha Arya, A. Mishra, Alfonso González-Briones
{"title":"使用机器学习技术分析Covid-19发病的情绪","authors":"Vishakha Arya, A. Mishra, Alfonso González-Briones","doi":"10.14201/adcaij.27348","DOIUrl":null,"url":null,"abstract":"The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. Sentiment outbreak on social media enduring various thoughts, opinions, and emotions about the Covid-19 disease, expressing views they are feeling presently. Analyzing sentiments helps to yield better results. Gathering data from different blogging sites like Facebook, Twitter, Weibo, YouTube, Instagram, etc., and Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of its users and categorizes them accordingly as positive, negative, and neutral. Analytical and machine learning classification is implemented to 3586 tweets collected in different time frames.  In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic, Coronavirus disease. Tweets are collected from the Twitter database using Hydrator a web-based application. Data-preprocessing removes all the noise, outliers from the raw data. With Natural Language Toolkit (NLTK), text classification for sentiment analysis and calculate the score subjective polarity, counts, and sentiment distribution. N-gram is used in textual mining -and Natural Language Processing for a continuous sequence of words in a text or document applying uni-gram, bi-gram, and tri-gram for statistical computation. Term frequency and Inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. Vectorize data feed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifier with Tfidf classification model applied to our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.\n ","PeriodicalId":42597,"journal":{"name":"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal","volume":"190 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Analysis of sentiments on the onset of Covid-19 using Machine Learning Techniques\",\"authors\":\"Vishakha Arya, A. Mishra, Alfonso González-Briones\",\"doi\":\"10.14201/adcaij.27348\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. Sentiment outbreak on social media enduring various thoughts, opinions, and emotions about the Covid-19 disease, expressing views they are feeling presently. Analyzing sentiments helps to yield better results. Gathering data from different blogging sites like Facebook, Twitter, Weibo, YouTube, Instagram, etc., and Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of its users and categorizes them accordingly as positive, negative, and neutral. Analytical and machine learning classification is implemented to 3586 tweets collected in different time frames.  In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic, Coronavirus disease. Tweets are collected from the Twitter database using Hydrator a web-based application. Data-preprocessing removes all the noise, outliers from the raw data. With Natural Language Toolkit (NLTK), text classification for sentiment analysis and calculate the score subjective polarity, counts, and sentiment distribution. N-gram is used in textual mining -and Natural Language Processing for a continuous sequence of words in a text or document applying uni-gram, bi-gram, and tri-gram for statistical computation. Term frequency and Inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. Vectorize data feed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifier with Tfidf classification model applied to our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.\\n \",\"PeriodicalId\":42597,\"journal\":{\"name\":\"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal\",\"volume\":\"190 1\",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2022-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14201/adcaij.27348\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14201/adcaij.27348","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 12

摘要

新型冠状病毒(Covid-19)大流行席卷全球,成为社交媒体平台上最引人注目的话题之一。围绕新冠肺炎的各种想法、意见、情绪,在社交媒体(sns)上表达自己目前的感受的情绪爆发。分析情绪有助于产生更好的结果。从不同的博客网站收集数据,如Facebook、Twitter、微博、YouTube、Instagram等,Twitter是最大的存储库。视频、文本和音频也从存储库中收集。情感分析使用意见挖掘来获取用户的情感,并相应地将其分类为积极、消极和中立。对在不同时间框架内收集的3586条推文进行分析和机器学习分类。本文对Covid-19大流行期间积累的推文进行了情绪分析。使用基于web的应用程序Hydrator从Twitter数据库收集Tweets。数据预处理消除了原始数据中的所有噪声和异常值。利用自然语言工具包(NLTK),对文本分类进行情感分析,并计算得分主观极性、计数和情感分布。N-gram用于文本挖掘和自然语言处理,用于文本或文档中的连续单词序列,应用单gram、双gram和三gram进行统计计算。术语频率和逆文档频率(TF-IDF)是一种将文本数据转换为数字形式的特征提取技术。向量化数据馈送到我们的模型中,以从语言数据中获得洞察力。线性SVC、多项式nb、GBM和随机森林分类器与Tfidf分类模型应用于我们提出的模型。线性支持向量分类比其他两种分类器性能更好。结果表明,射频性能较好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Analysis of sentiments on the onset of Covid-19 using Machine Learning Techniques
The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. Sentiment outbreak on social media enduring various thoughts, opinions, and emotions about the Covid-19 disease, expressing views they are feeling presently. Analyzing sentiments helps to yield better results. Gathering data from different blogging sites like Facebook, Twitter, Weibo, YouTube, Instagram, etc., and Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of its users and categorizes them accordingly as positive, negative, and neutral. Analytical and machine learning classification is implemented to 3586 tweets collected in different time frames.  In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic, Coronavirus disease. Tweets are collected from the Twitter database using Hydrator a web-based application. Data-preprocessing removes all the noise, outliers from the raw data. With Natural Language Toolkit (NLTK), text classification for sentiment analysis and calculate the score subjective polarity, counts, and sentiment distribution. N-gram is used in textual mining -and Natural Language Processing for a continuous sequence of words in a text or document applying uni-gram, bi-gram, and tri-gram for statistical computation. Term frequency and Inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. Vectorize data feed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifier with Tfidf classification model applied to our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.  
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.40
自引率
0.00%
发文量
22
审稿时长
4 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信