Separating Hate Speech from Abusive Language on Indonesian Twitter

Muhammad Amien Ibrahim, Noviyanti Tri Maretta Sagala, S. Arifin, R. Nariswari, N. Murnaka, P. W. Prasetyo
{"title":"Separating Hate Speech from Abusive Language on Indonesian Twitter","authors":"Muhammad Amien Ibrahim, Noviyanti Tri Maretta Sagala, S. Arifin, R. Nariswari, N. Murnaka, P. W. Prasetyo","doi":"10.1109/ICoDSA55874.2022.9862850","DOIUrl":null,"url":null,"abstract":"Social media is an effective tool for connecting with people and distributing information. However, many people often use social media to spread hate speech and abusive languages. In contrast to hate speech, abusive languages are frequently used as jokes with no purpose of offending individuals or groups, even though they may contain profanities. As a result, the distinction between hate speech and abusive language is often blurred. In many cases, individuals who spread hate speech may be prosecuted as it has legal implications. Previous research has focused on binary classification of hate speech and normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, such as logistic regression and BERT models, are utilized to accomplish text classification tasks. The model's performance is assessed using the F1-Score evaluation metric. The results show that BERT models outperform other models in terms of F1-Score, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-Score of 85.59. This also demonstrates that pretraining the BERT model using social media data improves the classification model significantly. Developing such classification model that can distinguish between hate speech and abusive language would help individuals in preventing the spread of hate speech that has legal implications.","PeriodicalId":339135,"journal":{"name":"2022 International Conference on Data Science and Its Applications (ICoDSA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Data Science and Its Applications (ICoDSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICoDSA55874.2022.9862850","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Social media is an effective tool for connecting with people and distributing information. However, many people use social media to spread hate speech and abusive language. In contrast to hate speech, abusive language is frequently used jokingly, with no intention of offending individuals or groups, even though it may contain profanities. As a result, the distinction between hate speech and abusive language is often blurred. In many cases, individuals who spread hate speech may be prosecuted, as it carries legal implications. Previous research has focused on binary classification of hate speech versus normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, such as logistic regression and BERT models, are used to perform the text classification task. Model performance is assessed using the F1-Score evaluation metric. The results show that BERT models outperform the other models in terms of F1-Score, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-Score of 85.59. This also demonstrates that pretraining the BERT model on social media data significantly improves the classification model. Developing a classification model that can distinguish hate speech from abusive language would help prevent the spread of hate speech, which has legal implications.
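To make the task setup concrete, below is a minimal, runnable sketch of the kind of baseline the abstract mentions: TF-IDF features with logistic regression for three-way classification (normal / abusive / hate speech), scored with F1. The example tweets, label names, and hyperparameters are illustrative placeholders, not the paper's dataset or exact configuration, and this baseline is not expected to reach the F1-Score reported for the fine-tuned BERT model.

```python
# Illustrative three-class baseline: TF-IDF + logistic regression, macro F1.
# Texts and labels below are toy placeholders, not the study's labelled tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "selamat pagi semua, semoga harimu menyenangkan",   # normal
    "dasar bodoh, kerjaanmu berantakan terus",          # abusive (insult, no target group)
    "kelompok itu tidak pantas hidup di negara ini",    # hate speech (targets a group)
] * 20
labels = ["normal", "abusive", "hate_speech"] * 20

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Word and bigram TF-IDF features feeding a multinomial logistic regression.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)

# Macro-averaged F1 weighs the three classes equally, which matters when
# hate speech tweets are much rarer than normal ones.
preds = clf.predict(X_test)
print("macro F1:", f1_score(y_test, preds, average="macro"))
```

The paper's best result instead comes from fine-tuning a BERT model pretrained on Indonesian social media text; the same train/test split and F1 evaluation would apply, with the pipeline above swapped for a fine-tuned sequence-classification head.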