When Homecoming is not Coming: 2021 Homecoming Ban Sentiment Analysis on Twitter Data Using Support Vector Machine Algorithm

Lidia Sandra, Ford Lumbangaol
Published in: 2021 International Conference on ICT for Smart Society (ICISS), 2021-08-02
DOI: 10.1109/ICISS53185.2021.9533255 (https://doi.org/10.1109/ICISS53185.2021.9533255)
Citations: 1

Abstract

Homecoming, traditionally known as Mudik, became a trending topic on several social media platforms as soon as the 11-day ban on the homecoming ritual was announced on 7 April 2021. Opinions for and against the ban quickly began to appear. Twitter, a social media platform now considered an extension of oneself and widely used to express opinions, was flooded with comments on the ban. These tweets were then used as a dataset for sentiment analysis to understand how people perceive the ban. This research uses a Support Vector Machine (SVM) classification algorithm, and the dataset was classified into three sentiments: positive, negative, and neutral. The SVM algorithm achieved 62% accuracy on this dataset. The sentiment analysis showed that sentiment around the keyword "mudik" was mostly neutral. Meanwhile, the engagement analysis showed that the most common forms of engagement were retweets and likes of neutral tweets. When neutral tweets were excluded, the dominant sentiment toward the homecoming ban was negative. This is likely due to the release on 22 April 2021 of an addendum to Covid-19 Handling Task Force Circular Number 13 of 2021. The addendum imposed further restrictions related to the homecoming ban and extended their effective dates. It appeared exactly one day before the 5,000 tweets from 23 April 2021 were scraped. The researchers had sampled the tweets with the most engagement (the most retweets and likes). Inspection showed that some of these tweets carried negative sentiment but were classified by the model as neutral. This may be due to inaccuracies in the training data, as some of the tweets were in Malay rather than Indonesian. A remaining challenge is that far fewer Indonesian-language datasets are available for NLP training and sentiment analysis than English-language ones. At the same time, this gap is an opportunity for researchers to develop a more suitable training model.
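The three-class SVM classification and the engagement analysis described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the authors' implementation. The abstract does not state the features, kernel, multi-class strategy, or training procedure, so the sketch makes several assumptions:

- bag-of-words features;
- one-vs-rest linear SVMs trained with the basic Pegasos subgradient method (no bias term);
- illustrative parameter values (`lam`, `epochs`);
- hypothetical function names.

It also sketches two post-processing steps. The first sums engagement (retweets plus likes) per predicted sentiment. The second finds the dominant sentiment after neutral tweets are set aside.

```python
"""Hypothetical sketch of three-class tweet sentiment classification with a linear SVM.

Not the paper's implementation. Assumes bag-of-words features and one-vs-rest
linear SVMs trained with the basic Pegasos subgradient method (no bias term,
no projection step).
"""
import random
from collections import Counter, defaultdict

# "neutral" is listed first so that a tweet with no known words (all scores 0)
# falls back to neutral, because max() returns the first of several tied items.
LABELS = ("neutral", "positive", "negative")


def features(text):
    """Sparse bag-of-words vector {token: count} from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def dot(w, x):
    """Inner product of a sparse weight vector and a sparse feature vector."""
    return sum(w.get(tok, 0.0) * val for tok, val in x.items())


def train_binary(xs, ys, lam=0.01, epochs=50, seed=0):
    """Pegasos: minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), with y in {-1, +1}."""
    rng = random.Random(seed)
    w = defaultdict(float)
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = ys[i] * dot(w, xs[i])  # margin under the current weights
            # Regularization step: w <- (1 - eta * lam) * w, i.e. scale by (1 - 1/t).
            scale = 1.0 - 1.0 / t
            for tok in w:
                w[tok] *= scale
            # Hinge-loss subgradient step, applied only when the margin is violated.
            if margin < 1.0:
                for tok, val in xs[i].items():
                    w[tok] += eta * ys[i] * val
    return dict(w)


def train_ovr(texts, labels, **kwargs):
    """One binary SVM per sentiment class: that class (+1) versus the rest (-1)."""
    xs = [features(text) for text in texts]
    return {
        c: train_binary(xs, [1 if y == c else -1 for y in labels], **kwargs)
        for c in LABELS
    }


def predict(models, text):
    """Pick the class whose one-vs-rest SVM gives the highest score."""
    x = features(text)
    return max(LABELS, key=lambda c: dot(models[c], x))


def accuracy(models, texts, labels):
    """Fraction of tweets whose predicted label matches the gold label."""
    hits = sum(predict(models, text) == y for text, y in zip(texts, labels))
    return hits / len(labels)


def engagement_by_sentiment(rows):
    """Sum retweets + likes per predicted sentiment; rows are (label, retweets, likes)."""
    totals = Counter()
    for label, retweets, likes in rows:
        totals[label] += retweets + likes
    return totals


def dominant_sentiment(labels, exclude=("neutral",)):
    """Most frequent label after dropping excluded classes, or None if none remain."""
    counts = Counter(lab for lab in labels if lab not in exclude)
    return counts.most_common(1)[0][0] if counts else None
```

As a usage example, train on a handful of hypothetical toy tweets whose classes use disjoint vocabularies. Examples would be "setuju bagus" as positive, "kecewa sedih" as negative, and "info jadwal" as neutral. On such data the model predicts the training tweets correctly. Real tweets share many words ("mudik" itself, for instance), which makes the classes much harder to separate. A practical pipeline would more likely use a library implementation, such as scikit-learn's `LinearSVC` with `TfidfVectorizer` features.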