Analysis of Public Sentiment Using The K-Nearest Neighbor (k-NN) Algorithm and Lexicon Based on Indonesian Television Shows on Social Media Twitter
K. Hulliyah, Achmad Maulana Almaisah, Fitri Mintarsih, Siti Ummi Masrurah, D. Khairani, Saepul Aripiyanto
2022 10th International Conference on Cyber and IT Service Management (CITSM), 20 September 2022
DOI: 10.1109/CITSM56380.2022.9936011
Abstract
This study combines the k-Nearest Neighbor (k-NN) algorithm with a lexicon-based approach to classify the sentiment of public responses on Twitter about Indonesian television shows into three classes: positive, negative, and neutral. Before classification, the tweets are pre-processed through cleaning, case folding, tokenizing, normalization, stopword removal, and stemming, and the terms are then weighted with TF-IDF. The dataset consists of 200 tweets mentioning four Indonesian television stations: TVRI, RCTI, SCTV, and ANTV. The program is written in Python using Jupyter Notebook. The experiments evaluate k-NN with $k=1$, $k=3$, and $k=5$. The best results were obtained with $k=3$, giving an accuracy of 74%, an error rate of 26%, a recall of 83.3%, and a precision of 80.64%, while the F-measure reached 90% at $k=5$.
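The abstract names the pipeline stages but not how each is implemented or how the lexicon and k-NN are combined. The following is a minimal pure-Python sketch under stated assumptions, not the authors' code: the normalization dictionary, stopword list, and sentiment lexicon are tiny hypothetical stand-ins; stemming is omitted; the lexicon is assumed to supply the sentiment labels, and k-NN (cosine similarity over TF-IDF vectors) is then trained on those labels; precision and recall are assumed to be macro-averaged over the three classes. All function names (`preprocess`, `lexicon_label`, `fit_idf`, `tfidf`, `knn_predict`, `evaluate`) and the toy tweets are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources: the paper's actual slang dictionary,
# stopword list and sentiment lexicon are not given in the abstract.
NORMALIZATION = {"bgt": "banget", "gk": "tidak"}
STOPWORDS = {"yang", "di", "dan", "ini", "itu"}
LEXICON = {"bagus": 1, "seru": 1, "suka": 1, "jelek": -1, "bosan": -1, "buruk": -1}
LABELS = ("positive", "negative", "neutral")


def preprocess(tweet):
    """Cleaning, case folding, tokenizing, normalization and stopword removal."""
    text = re.sub(r"https?://\S+|@\w+|#\w+", " ", tweet)  # cleaning: URLs, mentions, hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()      # drop non-letters, case folding
    tokens = [NORMALIZATION.get(t, t) for t in text.split()]  # tokenizing + slang normalization
    # Stemming is omitted in this sketch; an Indonesian stemmer would be applied here.
    return [t for t in tokens if t not in STOPWORDS]


def lexicon_label(tokens):
    """Assumed lexicon step: label by the sign of the summed word scores."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def fit_idf(docs):
    """Inverse document frequency, idf(t) = ln(N / df(t)), over the training docs."""
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(len(docs) / n) for t, n in df.items()}


def tfidf(doc, idf):
    """Raw term frequency times idf; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train_vecs, train_labels, k):
    """Majority vote among the k most cosine-similar training vectors.

    Vote ties go to the tied label whose closest member ranks highest,
    because Counter keeps labels in first-encountered order.
    """
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred):
    """Accuracy, error rate, macro precision/recall, and F1 as their harmonic mean."""
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec, rec = [], []
    for c in LABELS:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = sum(prec) / len(LABELS), sum(rec) / len(LABELS)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"accuracy": acc, "error_rate": 1 - acc, "precision": p, "recall": r, "f1": f1}


if __name__ == "__main__":
    train_raw = ["Acaranya bagus dan seru @RCTI", "Suka bgt sama sinetron ini",
                 "Jelek, bikin bosan", "Sinetron ini jelek",
                 "Malam ini tayang jam delapan", "Tayang jam tujuh malam"]
    test_raw = ["Seru bgt acaranya", "Bosan, jelek", "Besok tayang malam"]

    train_docs = [preprocess(t) for t in train_raw]
    test_docs = [preprocess(t) for t in test_raw]
    # Assumed combination: the lexicon supplies the labels, k-NN learns from them.
    train_labels = [lexicon_label(d) for d in train_docs]
    test_labels = [lexicon_label(d) for d in test_docs]

    idf = fit_idf(train_docs)
    train_vecs = [tfidf(d, idf) for d in train_docs]
    for k in (1, 3, 5):  # the k values evaluated in the study
        preds = [knn_predict(tfidf(d, idf), train_vecs, train_labels, k) for d in test_docs]
        print(k, preds, evaluate(test_labels, preds))
```

Running it prints the predictions and metrics for each $k$ on six toy training tweets and three toy test tweets. With three classes, a vote can tie even for odd $k$ (for example 2-2-1 at $k=5$), so some tie-breaking rule is needed; this sketch favours the label of the nearer neighbour. For a real dataset, scikit-learn's `TfidfVectorizer` and `KNeighborsClassifier` could replace the hand-written TF-IDF and k-NN.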