Support Vector Machine and Lexicon based Sentiment Analysis on Kartu Prakerja (Indonesia Pre-Employment Cards Government Initiatives)

Bayu Waspodo, Qurrotul Aini, Fikri Rama Singgih, Rinda Hesti Kusumaningtyas, Elvi Fetrina
Published in: 2022 10th International Conference on Cyber and IT Service Management (CITSM), 2022-09-20. DOI: 10.1109/CITSM56380.2022.9935990

Abstract

Machine learning is a technology that learns from existing data and performs tasks according to what it has learned, whether the data are text, video, images, or numbers, using supervised and unsupervised learning techniques. The Pre-Employment Card (Kartu Prakerja) is an Indonesian government program that aims to assist the Indonesian people, especially those without a job. According to data from the Central Statistics Agency, the program had as many as 11.4 million recipients in 2020–2021. The program has drawn responses from many communities, recipients and non-recipients alike, including opinions both for and against it. The purpose of this study is to classify public responses (sentiment) toward the Pre-Employment Card using machine learning. Sentiment analysis extracts opinions (sentiments) from textual data to determine the public's view of news, service satisfaction, and government policies. The sentiment analysis process is divided into four stages: crawling, data preprocessing, classification, and data visualization. In this study, preprocessing consists of cleaning, lemmatization, stemming, tokenizing, and stopword removal. The method combines unsupervised learning (lexicon-based) with supervised learning (Support Vector Machine). Text is weighted by matching it against a normalized sentiment lexicon, with scaling, to determine the positive, neutral, and negative sentiment classes. Classifying 940 tweets yielded 330 positive tweets (35%), 302 negative tweets (32%), and 308 neutral tweets (33%).
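The lexicon-based weighting described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicon entries, the stopword list, the scaling rule, and the `threshold` cut-off are all assumptions chosen for the example, and stemming and lemmatization are omitted.

```python
import re

# Hypothetical miniature sentiment lexicon: word -> polarity weight.
# A real study would load a full Indonesian sentiment lexicon instead.
LEXICON = {
    "bagus": 4, "membantu": 3, "senang": 3,    # positive words
    "buruk": -4, "gagal": -3, "kecewa": -3,    # negative words
}

# Hypothetical stopword list used during preprocessing.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "saya"}


def preprocess(text):
    """Clean, tokenize, and remove stopwords (stemming is omitted here)."""
    # Cleaning: lowercase, then drop URLs, mentions, and hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    # Tokenizing: keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def lexicon_score(tokens):
    """Average the lexicon weights of matched tokens, scaled to [-1, 1]."""
    weights = [LEXICON[t] for t in tokens if t in LEXICON]
    if not weights:
        return 0.0
    max_abs = max(abs(w) for w in LEXICON.values())
    return sum(weights) / (len(weights) * max_abs)


def label(text, threshold=0.1):
    """Map the scaled score to a class; the threshold is an assumed cut-off."""
    score = lexicon_score(preprocess(text))
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Labels produced this way could then serve as training targets for a supervised classifier, which is how the two methods are combined in the study.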
In tests of classification accuracy, the Support Vector Machine achieved an average accuracy of 98.75%, precision of 0.98, recall of 0.98, and F-measure of 0.98, with the SVM Cost value selected using 10-fold cross-validation.
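Selecting the Cost value by 10-fold cross-validation might look something like the sketch below. It shows only the selection loop and is not taken from the paper: the function names are hypothetical, and the SVM training step is left to a caller-supplied `fit_and_score` function, since the abstract does not say which SVM implementation or candidate Cost values were used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def select_cost(samples, labels, costs, fit_and_score, k=10):
    """Return (best_cost, mean_accuracy) found by k-fold cross-validation.

    fit_and_score(train_x, train_y, test_x, test_y, cost) is a hypothetical
    caller-supplied function: it trains an SVM with the given cost on the
    training folds and returns its accuracy on the held-out fold.
    """
    folds = k_fold_indices(len(samples), k)
    best_cost, best_mean = None, -1.0
    for cost in costs:
        scores = []
        for held_out in folds:
            held = set(held_out)
            # Every sample outside the held-out fold is used for training.
            train = [i for i in range(len(samples)) if i not in held]
            scores.append(fit_and_score(
                [samples[i] for i in train], [labels[i] for i in train],
                [samples[i] for i in held_out], [labels[i] for i in held_out],
                cost,
            ))
        mean = sum(scores) / len(scores)
        # Keep the cost with the highest mean held-out accuracy.
        if mean > best_mean:
            best_cost, best_mean = cost, mean
    return best_cost, best_mean
```

In practice `fit_and_score` would wrap a real SVM library; with scikit-learn, for example, the cost corresponds to the `C` parameter of `SVC`. With 940 tweets and `k=10`, each held-out fold contains 94 tweets and each training set 846.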