Analysis of Public Sentiment Using The K-Nearest Neighbor (k-NN) Algorithm and Lexicon Based on Indonesian Television Shows on Social Media Twitter

K. Hulliyah, Achmad Maulana Almaisah, Fitri Mintarsih, Siti Ummi Masrurah, D. Khairani, Saepul Aripiyanto
Published in: 2022 10th International Conference on Cyber and IT Service Management (CITSM), 2022-09-20
DOI: 10.1109/CITSM56380.2022.9936011
Citations: 0

Abstract

This study combines the k-Nearest Neighbor (k-NN) algorithm with a lexicon-based approach to analyze the sentiment of public responses to Indonesian television shows posted on Twitter, using three sentiment classes: positive, negative, and neutral. Before classification, the data go through a pre-processing stage consisting of cleaning, case folding, tokenizing, normalization, stopword removal, and stemming; the resulting terms are then weighted with TF-IDF. The dataset consists of 200 tweets collected from mentions of four Indonesian television stations: TVRI, RCTI, SCTV, and ANTV. The program was implemented in Python using Jupyter Notebook. The experiments use $k=1$, $k=3$, and $k=5$ for the k-NN algorithm. The best results were obtained with $k=3$: an accuracy of 74%, an error rate of 26%, a recall of 83.3%, and a precision of 80.64%. The highest F-measure, 90%, was obtained with $k=5$.
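The six pre-processing steps listed in the abstract can be illustrated with a minimal Python sketch. The abstract does not publish its normalization dictionary, stopword list, or stemmer, so `NORMALIZATION`, `STOPWORDS`, and the suffix-stripping `stem` function below are hypothetical stand-ins (the stemmer in particular is a crude toy, not a real Indonesian stemmer).

```python
import re

# Hypothetical resources for illustration only; not the study's actual lists.
NORMALIZATION = {"gk": "tidak", "ga": "tidak", "bgt": "banget", "yg": "yang"}
STOPWORDS = {"yang", "di", "ini", "itu", "dan", "ke", "dengan"}  # negations kept out on purpose
SUFFIXES = ("nya", "kan", "an", "i")  # toy suffix list, NOT a real Indonesian stemmer


def clean(text):
    """Remove URLs, @mentions, #hashtags, digits and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    return re.sub(r"[^A-Za-z\s]", " ", text)


def stem(token):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    text = clean(tweet)                                   # 1. cleaning
    text = text.lower()                                   # 2. case folding
    tokens = text.split()                                 # 3. tokenizing
    tokens = [NORMALIZATION.get(t, t) for t in tokens]    # 4. normalization
    tokens = [t for t in tokens if t not in STOPWORDS]    # 5. stopword removal
    return [stem(t) for t in tokens]                      # 6. stemming


print(preprocess("@RCTI acaranya bagus bgt, gk bosen! https://t.co/xyz"))
```

A real pipeline would replace the toy dictionaries and stemmer with curated Indonesian resources; the order of the steps is the part this sketch is meant to show.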
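The abstract does not say how the lexicon-based component assigns sentiment. One common scheme, sketched here as an assumption, sums per-word polarity weights from a lexicon and maps the sign of the total to the three classes. `LEXICON` and its weights are hypothetical.

```python
# Hypothetical polarity lexicon (word -> weight); illustration only.
LEXICON = {"bagus": 3, "seru": 2, "suka": 2, "bosen": -2, "jelek": -3, "buruk": -3}


def lexicon_score(tokens, lexicon=LEXICON):
    """Sum the polarity weights of all tokens found in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def lexicon_label(tokens, lexicon=LEXICON):
    """Map the summed score to positive / negative / neutral."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label(["acara", "bagus", "tidak", "bosen"]))  # 3 - 2 = 1 -> positive
```

This sketch ignores negation, so "tidak bosen" ("not bored") still counts against the tweet; whether the study handles negation is not stated. Labels produced this way could serve as training labels for the k-NN classifier, but that pairing is likewise an assumption.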
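TF-IDF weighting followed by k-NN classification with $k=1$, $k=3$, and $k=5$ can be sketched in plain Python. The choices below are assumptions, since the abstract does not fix them: term frequency normalized by document length, unsmoothed $\mathrm{idf}(t) = \ln(N/\mathrm{df}(t))$, cosine similarity as the distance measure, and a simple majority vote. The six training tweets and their labels are invented for illustration.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Return (vectors, idf) for a list of token lists; idf = ln(N / df), unsmoothed."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def tfidf_transform(tokens, idf):
    """Weight a new tweet with the training idf; unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(tokens, train_vectors, train_labels, idf, k=3):
    """Majority vote among the k training tweets most cosine-similar to the query."""
    query = tfidf_transform(tokens, idf)
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: cosine(query, train_vectors[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    # On equal counts, most_common returns the tied label that first appeared
    # among the ranked neighbours (i.e. the one with the nearer neighbour).
    return votes.most_common(1)[0][0]


# Invented toy data, already pre-processed.
train_docs = [
    ["acara", "bagus", "seru"], ["sinetron", "seru", "suka"],
    ["acara", "jelek", "bosen"], ["berita", "buruk", "jelek"],
    ["jadwal", "tayang", "malam"], ["jadwal", "berita", "pagi"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
vectors, idf = tfidf_fit(train_docs)
for k in (1, 3, 5):
    print(k, knn_predict(["sinetron", "bagus", "seru"], vectors, train_labels, idf, k=k))
```

Libraries such as scikit-learn apply smoothing to the idf term by default, so weights from this sketch will not match theirs exactly.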
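The reported metrics relate in a simple way: the error rate is the complement of accuracy ($1 - 0.74 = 0.26$), and the F-measure is the harmonic mean of precision and recall. The abstract does not state how precision and recall are averaged over the three classes, so the sketch below assumes macro-averaging (an unweighted mean over classes), with the macro F-measure taken as the mean of per-class F1 scores.

```python
CLASSES = ("positive", "negative", "neutral")


def evaluate(y_true, y_pred, classes=CLASSES):
    """Accuracy, error rate, and macro-averaged precision / recall / F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,  # complement of accuracy
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


y_true = ["positive", "positive", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral", "positive"]
print(evaluate(y_true, y_pred))
```

Micro- or weighted averaging would give different precision, recall, and F-measure values on the same predictions, which is one reason those figures are hard to compare across studies without the averaging scheme.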