Support Vector Machine and Lexicon based Sentiment Analysis on Kartu Prakerja (Indonesia Pre-Employment Cards Government Initiatives)
Bayu Waspodo, Qurrotul Aini, Fikri Rama Singgih, Rinda Hesti Kusumaningtyas, Elvi Fetrina
2022 10th International Conference on Cyber and IT Service Management (CITSM), 2022-09-20
DOI: 10.1109/CITSM56380.2022.9935990
Machine learning is a technology that learns from existing data (text, video, images, or numerical data) and performs tasks based on what it has learned, using supervised and unsupervised learning techniques. The Pre-Employment Cards (Kartu Prakerja) program is a government program that aims to assist the Indonesian people, especially those without a job. According to data from the Central Statistics Agency, the program had as many as 11.4 million recipients in 2020–2021. It has drawn responses from various communities, recipients and non-recipients alike, including opinions both for and against the program. The purpose of this study is to classify the community's response (sentiment) to the Pre-Employment Cards program using machine learning. Sentiment analysis extracts opinions (sentiments) from textual data to determine the public's view of news, service satisfaction, and government policies. The sentiment analysis process is divided into four stages: crawling, data preprocessing, classification, and data visualization. In this study, preprocessing consists of cleaning, lemmatization, stemming, tokenizing, and stopword removal. The method combines an unsupervised lexicon-based approach with a supervised Support Vector Machine (SVM). Textual data are weighted by matching them against a normalized sentiment lexicon, with scaling, to determine the positive, neutral, and negative sentiment classes. Classifying 940 tweets yielded 330 positive tweets (35%), 302 negative tweets (32%), and 308 neutral tweets (33%). In tests of SVM classification, the average accuracy was 98.75%, with precision 0.98, recall 0.98, and F-measure 0.98, where the SVM Cost value was selected with the help of 10-fold cross-validation.
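The lexicon-based weighting step can be illustrated with a minimal sketch. The abstract does not give the lexicon, the normalization, or the class thresholds, so everything here is an assumption made for illustration: the six-word `LEXICON`, the division by the largest absolute weight, the averaging over matched tokens, and the `threshold` of 0.1 are all hypothetical, and the simple tokenizer stands in for the full preprocessing pipeline.

```python
import re

# Hypothetical mini-lexicon (word -> polarity weight). These entries and
# weights are invented for illustration; they are not the study's lexicon.
LEXICON = {
    "bagus": 4, "membantu": 3, "senang": 3,
    "gagal": -4, "kecewa": -3, "sulit": -2,
}
# Assumed normalization: scale every weight into [-1, 1].
MAX_WEIGHT = max(abs(w) for w in LEXICON.values())


def tokenize(text):
    """Lowercase and keep alphabetic runs (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def score(text):
    """Mean normalized weight of the tokens found in the lexicon, 0.0 if none."""
    hits = [LEXICON[t] / MAX_WEIGHT for t in tokenize(text) if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(text, threshold=0.1):
    """Map a score to one of three classes using an assumed symmetric threshold."""
    s = score(text)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in [
        "Kartu Prakerja sangat membantu dan bagus",
        "Pendaftaran gagal, saya kecewa",
        "Pendaftaran dibuka hari ini",
    ]:
        print(label(tweet), "<-", tweet)
```

Labels produced this way could then serve as training targets for a supervised classifier, which is how the combination of the two approaches is described above.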
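Selecting the SVM Cost value with 10-fold cross-validation might look like the following sketch, which assumes scikit-learn is available. The abstract does not state the feature representation, the kernel, or the candidate Cost values, so the TF-IDF features, the linear kernel, and `C_GRID` are assumptions, and the 30-sentence corpus is invented toy data rather than the 940 tweets from the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented toy corpus: 10 sentences per class so that each of the 10
# stratified folds holds out exactly one example of every class.
positive = [f"program ini bagus dan membantu peserta {i}" for i in range(10)]
negative = [f"pendaftaran gagal dan mengecewakan peserta {i}" for i in range(10)]
neutral = [f"pendaftaran gelombang {i} dibuka hari ini" for i in range(10)]
texts = positive + negative + neutral
labels = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 10

# Assumed setup: TF-IDF features feeding a linear-kernel SVM.
pipeline = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))

# Hypothetical candidate Cost values; the study's grid is not given.
C_GRID = [0.01, 0.1, 1, 10, 100]

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, {"svc__C": C_GRID}, cv=cv, scoring="accuracy")
search.fit(texts, labels)

best_c = search.best_params_["svc__C"]
mean_accuracy = search.best_score_  # mean accuracy over the 10 folds
print(f"best C = {best_c}, mean 10-fold accuracy = {mean_accuracy:.4f}")
```

Refitting the vectorizer inside the pipeline on each training fold keeps the held-out fold out of the vocabulary and IDF statistics, which avoids an optimistic bias in the reported accuracy.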