Indonesian Hate Speech Text Classification Using Improved K-Nearest Neighbor with TF-IDF-ICSρF

Nova Adi Saputra, Khurotul Aeni, Nurul Mega Saraswati
{"title":"Indonesian Hate Speech Text Classification Using Improved K-Nearest Neighbor with TF-IDF-ICSρF","authors":"Nova Adi Saputra, Khurotul Aeni, Nurul Mega Saraswati","doi":"10.15294/sji.v11i1.48085","DOIUrl":null,"url":null,"abstract":"Purpose: Freedom in social media gives rise to the possibility of disturbing users through the sentences they send, which is limited by the Electronic Information and Transactions Law (UU ITE). This research aims to find an effective method for classifying hate speech text data, especially in Indonesian, with many categories expected to minimize this case.Methods: This study used 1.000 data from Twitter with five labels, including religion, race, physical, gender and other (invective or slander). The process started with several steps of preprocessing, data transformation using TF-IDF-ICSρF term weighting and data mining using an Improved KNN algorithm. Then, the results were compared with the TF-IDF and KNN methods to evaluate the differences.Result: Using TF-IDF-ICSρF and Improved KNN algorithms gets an average accuracy value of 88.11%, 17.81% higher compared with the same data and parameters to the K-Nearest Neighbor and TF-IDF algorithms, which get results of 70.30%.Novelty: Based on the comparison results, TF-IDF-ICSρF and Improved KNN methods can effectively classify hate speech sentences that have many labels with fairly good accuracy.","PeriodicalId":30781,"journal":{"name":"Scientific Journal of Informatics","volume":"8 6","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Journal of Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15294/sji.v11i1.48085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: Freedom in social media gives rise to the possibility of disturbing users through the sentences they send, which is limited by the Electronic Information and Transactions Law (UU ITE). This research aims to find an effective method for classifying hate speech text data, especially in Indonesian, with many categories expected to minimize this case.Methods: This study used 1.000 data from Twitter with five labels, including religion, race, physical, gender and other (invective or slander). The process started with several steps of preprocessing, data transformation using TF-IDF-ICSρF term weighting and data mining using an Improved KNN algorithm. Then, the results were compared with the TF-IDF and KNN methods to evaluate the differences.Result: Using TF-IDF-ICSρF and Improved KNN algorithms gets an average accuracy value of 88.11%, 17.81% higher compared with the same data and parameters to the K-Nearest Neighbor and TF-IDF algorithms, which get results of 70.30%.Novelty: Based on the comparison results, TF-IDF-ICSρF and Improved KNN methods can effectively classify hate speech sentences that have many labels with fairly good accuracy.
利用 TF-IDF-ICSρF 改进 K 近邻法进行印尼仇恨语音文本分类
目的:社交媒体上的自由可能会通过用户发送的句子对其造成干扰,而这受到《电子信息和交易法》(UU ITE)的限制。本研究旨在找到一种对仇恨言论文本数据进行分类的有效方法,尤其是在印尼,预计许多类别的仇恨言论会将这种情况降到最低:本研究使用了来自 Twitter 的 1000 条数据,包括宗教、种族、身体、性别和其他(谩骂或诽谤)五个标签。研究过程分为几个步骤:预处理、使用 TF-IDF-ICSρF 术语加权进行数据转换,以及使用改进的 KNN 算法进行数据挖掘。然后,将结果与 TF-IDF 和 KNN 方法进行比较,以评估差异:结果:使用 TF-IDF-ICSρF 算法和改进 KNN 算法得到的平均准确率为 88.11%,与 K-近邻算法和 TF-IDF 算法得到的 70.30% 的结果相比,在相同的数据和参数下,准确率高出 17.81%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
13
审稿时长
24 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信