Analysis of Public Sentiment Using The K-Nearest Neighbor (k-NN) Algorithm and Lexicon Based on Indonesian Television Shows on Social Media Twitter

2022 10th International Conference on Cyber and IT Service Management (CITSM). Pub Date: 2022-09-20. DOI: 10.1109/CITSM56380.2022.9936011

K. Hulliyah, Achmad Maulana Almaisah, Fitri Mintarsih, Siti Ummi Masrurah, D. Khairani, Saepul Aripiyanto



Abstract

This study implements a combination of the k-Nearest Neighbor (k-NN) algorithm and a lexicon-based method for sentiment analysis of public responses on Twitter to Indonesian television shows, using three sentiment classes: positive, negative, and neutral. Before classification, the tweets are pre-processed through cleaning, case folding, tokenizing, normalization, stopword removal, and stemming. Terms are then weighted with the TF-IDF method. The dataset consists of 200 tweets taken from mentions of four Indonesian television stations: TVRI, RCTI, SCTV, and ANTV. The program is written in Python using Jupyter Notebook. The test scenarios use $k=1$, $k=3$, and $k=5$ in the k-NN algorithm. The best results were obtained at $k=3$, with an accuracy of 74%, an error rate of 26%, a recall of 83.3%, and a precision of 80.64%, while the $f$-results at $k=5$ amounted to 90%.
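
The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicon, stopword list, and example tweets are hypothetical, and normalization and stemming are omitted. The way the two methods are combined here (lexicon scores label the training tweets, then k-NN with cosine similarity over TF-IDF vectors classifies a new tweet) is also an assumption, since the abstract does not spell it out. The IDF formula `log(N/df) + 1` is one common variant and may differ from the one used in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources, NOT taken from the paper.
LEXICON = {"bagus": 1, "keren": 1, "suka": 1, "jelek": -1, "bosan": -1, "buruk": -1}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}

def preprocess(text):
    """Cleaning, case folding, tokenizing, stopword removal (no stemming here)."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())  # drop URLs, mentions, hashtags
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]

def lexicon_label(tokens):
    """Sum word polarities and map the total to one of three classes."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def fit_idf(docs):
    """Inverse document frequency per term, assumed variant: log(N / df) + 1."""
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query, train_vecs, train_labels, k):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical example tweets mentioning the four stations.
train_texts = [
    "Acara @TVRI ini bagus dan keren",
    "Suka sekali sinetron di @RCTI, keren",
    "Sinetron @SCTV jelek, bosan",
    "Film di @ANTV buruk dan jelek",
    "Jadwal acara @TVRI malam ini",
]
train_docs = [preprocess(t) for t in train_texts]
train_labels = [lexicon_label(d) for d in train_docs]   # lexicon-based labelling
idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]

query_vec = tfidf(preprocess("Acara ini keren, suka!"), idf)
for k in (1, 3, 5):  # the k values tested in the study
    print(k, knn_predict(query_vec, train_vecs, train_labels, k))
```

With only five toy tweets the vote at larger k can end in a tie, which this sketch resolves by the order in which labels are first seen. A real experiment on the 200-tweet dataset would need an explicit tie-breaking rule and a held-out test set to compute accuracy, recall, and precision.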

