ANALISIS SENTIMEN MASYARAKAT TERHADAP VAKSINASI COVID-19 PADA MEDIA SOSIAL TWITTER MENGGUNAKAN ALGORITMA SUPPORT VECTOR MACHINE (SVM)

Herwin Syah, Arita Witanti
Journal: Jurnal Sistem Informasi dan Informatika (Simika)
Published: 2022-02-19
DOI: 10.47080/simika.v5i1.1411
Citations: 4

Abstract

This study examines how Indonesians tend to view Covid-19 vaccination. Data were collected from Twitter using the API key provided by Twitter, with a Python application built on several libraries, including tweepy, pandas, numpy, and nltk. After crawling, the data were cleaned in several steps: removing usernames, removing URLs, converting to lower case, removing stopwords, and lemmatizing. The results were then labeled with the textblob and sklearn libraries and analyzed with the Support Vector Machine (SVM) algorithm. The best split was 20% testing data and 80% training data (942 testing and 3,766 training records), for which the predictions on the testing data gave an F1 score of 0.93, an accuracy score of 0.88, a precision score of 0.88, and a recall score of 0.99. Of the 4,708 tweets, 2,025 carried positive sentiment (43.0%), 771 negative sentiment (16.4%), and 1,912 neutral sentiment (40.6%). With 80% (3,766) training data and 20% (942) test data, the accuracy score was 73.6%. The study concludes that, at the time the data were sampled, Indonesians tended to accept (respond positively to) government policy on the Covid-19 vaccination program. In the future, it is hoped that a library will support text processing for regional languages, because many words were eliminated during data cleaning: Indonesians use many regional languages when writing on social media.
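The cleaning steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the regular expressions, the tiny stopword set, and the function name `clean_tweet` are assumptions made for illustration, and lemmatization is left out because the abstract does not say which lemmatizer or language resources were used. Stripping punctuation and digits is an extra step that the abstract does not mention.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# the abstract does not say which stopword list was used.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for"}

USERNAME_RE = re.compile(r"@\w+")                  # Twitter-style @mentions
URL_RE = re.compile(r"https?://\S+|www\.\S+")      # http(s) and bare www links
NON_LETTER_RE = re.compile(r"[^a-z\s]")            # anything but a-z and whitespace


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps from the abstract, except lemmatization."""
    text = URL_RE.sub(" ", text)         # remove URLs
    text = USERNAME_RE.sub(" ", text)    # remove usernames
    text = text.lower()                  # lower case
    text = NON_LETTER_RE.sub(" ", text)  # assumption: also drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]  # remove stopwords
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_tweet("@user The vaccine is SAFE, see https://example.com/info"))
    # prints: vaccine safe see
```

A filter that keeps only the letters a-z also discards any token written in another script, which is one way aggressive cleaning can eliminate words from the data.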