ANALISIS SENTIMEN MASYARAKAT TERHADAP VAKSINASI COVID-19 PADA MEDIA SOSIAL TWITTER MENGGUNAKAN ALGORITMA SUPPORT VECTOR MACHINE (SVM)

Jurnal Sistem Informasi dan Informatika (Simika) | Pub Date: 2022-02-19 | DOI: 10.47080/simika.v5i1.1411

Herwin Syah, Arita Witanti


Citations: 4

Abstract

This study investigates the attitudes of the Indonesian public toward Covid-19 vaccination. Data were collected from Twitter using an API key provided by Twitter, with a Python application built on libraries such as tweepy, pandas, numpy, and nltk. The crawled tweets were then cleaned in several steps: removing usernames, removing URLs, converting to lower case, removing stopwords, and lemmatizing. The cleaned tweets were labeled using the textblob and sklearn libraries and then classified with the Support Vector Machine (SVM) algorithm. The best split was 80% training data and 20% testing data (3,766 training and 942 testing samples), for which the predictions on the testing data reached an F1 score of 0.93, an accuracy of 0.88, a precision of 0.88, and a recall of 0.99. Of the 4,078 tweets, 2,525 were positive (43.0%), 771 negative (16.4%), and 1,912 neutral (40.6%). With 80% (3,766) training data and 20% (942) test data, the reported accuracy score was 73.6%. The study concludes that, at the time the data were sampled, the Indonesian public tended to be accepting of (responded positively to) government policy on the Covid-19 vaccination program. The authors hope that future libraries will support text processing for regional languages: many words were eliminated during data cleaning because Indonesians frequently write in regional languages on social media.
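The cleaning steps and two of the reported figures can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the abstract names tweepy, nltk, textblob, and sklearn, none of which are used here. The stopword list, the function names, and the sample tweet are hypothetical, and lemmatization is omitted because it needs a language resource such as nltk. The second half only checks arithmetic: the F1 score as the harmonic mean of the reported precision and recall, and the size of the data set implied by the reported split.

```python
import re

# Hypothetical stopword list for illustration only; the paper relies on
# library-provided lists, which are not reproduced here.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}


def clean_tweet(text):
    """Apply the cleaning steps named in the abstract, except lemmatization."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @usernames
    text = text.lower()                        # convert to lower case
    tokens = re.findall(r"[a-z]+", text)       # keep alphabetic tokens only
    return " ".join(t for t in tokens if t not in STOPWORDS)


def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def shares(counts):
    """Percentage share of each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample tweet, not taken from the paper's data.
    print(clean_tweet("@user The vaccine is SAFE! https://t.co/abc"))  # vaccine safe
    # Reported precision 0.88 and recall 0.99 give the reported F1 of 0.93.
    print(round(f1_from(0.88, 0.99), 2))  # 0.93
    # The reported split sizes sum to 4,708 samples.
    print(3766 + 942)  # 4708
```

The F1 check agrees with the abstract's 0.93. The split sizes sum to 4,708, which differs from the 4,078 tweets stated in the abstract, so the class counts and percentages are best verified against the full paper; `shares` is included for that purpose.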

