Indonesian Twitter Cyberbullying Detection using Text Classification and User Credibility

Hani Nurrahmi, Dade Nurjanah
Published in: 2018 International Conference on Information and Communications Technology (ICOIACT), pp. 543-548
Publication date: 2018-03-06
DOI: 10.1109/ICOIACT.2018.8350758
Citations: 35

Abstract

Cyberbullying is a repeated act that harasses, humiliates, threatens, or hassles other people through electronic devices and online social networking websites. Cyberbullying through the internet is more dangerous than traditional bullying because it can amplify the humiliation before a potentially unlimited online audience. According to UNICEF and a survey by the Indonesian Ministry of Communication and Information, 58% of 435 adolescents surveyed do not understand what cyberbullying is. Some of them may even have been bullies themselves, but because they did not understand cyberbullying, they could not recognise the negative effects of their actions. Bullies may fail to recognise the harm they cause because they do not see immediate responses from their victims. This study aims to detect cyberbullying actors using text classification and user credibility analysis, and to notify them about the harm of cyberbullying. We collected data from Twitter. Because the data were unlabelled, we built a web-based labelling tool to classify tweets as cyberbullying or non-cyberbullying. From the tool we obtained 301 cyberbullying tweets, 399 non-cyberbullying tweets, 2,053 negative words and 129 swear words. We then applied support vector machine (SVM) and k-nearest neighbours (KNN) classifiers to learn and detect cyberbullying texts. The results show that SVM achieves the highest F1-score, 67%. We also performed a credibility analysis of users and identified 257 Normal Users, 45 Harmful Bullying Actors, 53 Bullying Actors and 6 Prospective Bullying Actors.
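The abstract names SVM and KNN as the classifiers and F1-score as the comparison metric, but does not state the text features or similarity measure. The following is a minimal sketch of the KNN side only, under assumptions that are ours rather than the authors': tweets are represented as bag-of-words term-frequency vectors, neighbours are ranked by cosine similarity, k defaults to 3, and tokenisation is a naive lowercase split. The label names and the `TOY_TRAIN` tweets are hypothetical illustrations, not the paper's data, and the SVM classifier is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately naive tokenizer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def bag_of_words(text: str) -> Counter:
    """Term-frequency vector for one tweet (assumed feature representation)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train: list[tuple[str, str]], text: str, k: int = 3) -> str:
    """Label `text` by majority vote among its k most cosine-similar training tweets."""
    query = bag_of_words(text)
    ranked = sorted(train, key=lambda item: cosine(query, bag_of_words(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """F1 = 2PR / (P + R) for the `positive` class; 0.0 when there are no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy training set (English placeholders, not the paper's Indonesian tweets).
TOY_TRAIN = [
    ("you are so stupid and ugly", "bullying"),
    ("nobody likes you, stupid loser", "bullying"),
    ("ugly loser go away", "bullying"),
    ("thanks for the great match today", "non-bullying"),
    ("great photo, congrats", "non-bullying"),
    ("see you at the match tomorrow", "non-bullying"),
]

if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, "what a stupid ugly loser"))  # prints "bullying"
    print(knn_predict(TOY_TRAIN, "great match, thanks"))  # prints "non-bullying"
    print(f1_score(["b", "b", "n", "n"], ["b", "n", "b", "n"], positive="b"))  # prints 0.5
```

Cosine similarity is used here because it is insensitive to tweet length, which is a common default for sparse text vectors; a real replication would substitute whatever features and distance the paper actually used, and would report F1 on a held-out split of the 700 labelled tweets rather than on training data.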