Detecting Hate Speech and Offensive Language using Machine Learning in Published Online Content

C. Sinyangwe, D. Kunda, William Phiri Abwino
{"title":"Detecting Hate Speech and Offensive Language using Machine Learning in Published Online Content","authors":"C. Sinyangwe, D. Kunda, William Phiri Abwino","doi":"10.33260/zictjournal.v7i1.143","DOIUrl":null,"url":null,"abstract":"Businesses are more concerned than ever about hate speech content as most brand communication and advertising move online. Different organisations may be incharge of their products and services but they do not have complete control over their content posted online via their website and social media channels, they have no control over what online users post or comment about their brand. As a result, it became imperative in our study to develop a model that will identify hate speechand, offensive language and detect cyber offence in online published content using machine learning. This study employed an experimental design to develop a detection model for determining which agile methodologies were preferred as a suitable development methodology. Deep learning and HateSonar was used to detect hate speech and offensive language in posted content. This study used data from Twitter and Facebook to detect hate speech. The text was classified as either hate speech, offensive language, or both. During the reconnaissance phase, the combined data (structured and unstructured) was obtained from kaggle.com. The combined data was stored in the database as raw data. This revealed that hate speech and offensive language exist everywhere in the world, and the trend of the vices is on the rise. Using machine learning, the researchers successfully developed a model for detecting offensive language and hate speech on online social media platforms. The labelling in the model makes it simple to categorise data in a meaningful and readable manner. 
The study establishes that in fore model to detect hate speech and offensive language on online social media platforms, the data set must be categorised and presented in statistical form after running the model; the count indicates the total number of data sets imported. The mean for each category, as well as the standard deviation and the minimum and maximum number of tweets in each category, are also displayed. The study established that preventing online platform abuse in Zambia requires a comprehensive approach that involves government law, responsible platform policies and practices, as well as individual responsibility and accountability. In accordance with this goal, the research was effective in developing the detection model. To guarantee that the model was completely functional, it was trained on the English dataset before being applied to the local language dataset. This was because of the fact that training deep learning models with local datasets can present a number of challenges, such as limited, biased data, data privacy, resource requirements, and model maintenance. However, the efficacy of these systems varies, and there have been concerns raised about the inherent biases and limitations of automatic moderation techniques. 
The study recommends that future studies consider other sources of information such as Facebook, WhatsApp, Instagram, and other social media platforms, as well as consider harvesting local data sets for training machines rather than relying on foreign data sets; the local data set can then be used to detect offences targeting Zambian citizens on local platforms.","PeriodicalId":206279,"journal":{"name":"Zambia ICT Journal","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Zambia ICT Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33260/zictjournal.v7i1.143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Businesses are more concerned than ever about hate speech content as most brand communication and advertising move online. Organisations may be in charge of their products and services, but they do not have complete control over the content posted online via their websites and social media channels; they cannot control what online users post or comment about their brand. It therefore became imperative in our study to develop a model that identifies hate speech and offensive language and detects cyber offences in online published content using machine learning. This study employed an experimental design to develop the detection model, after determining which agile methodology was preferred as a suitable development methodology. Deep learning and HateSonar were used to detect hate speech and offensive language in posted content. The study used data from Twitter and Facebook to detect hate speech, and each text was classified as hate speech, offensive language, or both. During the reconnaissance phase, the combined data (structured and unstructured) was obtained from kaggle.com and stored in the database as raw data. The analysis revealed that hate speech and offensive language exist everywhere in the world, and that the trend of these vices is on the rise. Using machine learning, the researchers successfully developed a model for detecting offensive language and hate speech on online social media platforms. The labelling in the model makes it simple to categorise data in a meaningful and readable manner. The study establishes that, for a model to detect hate speech and offensive language on online social media platforms, the data set must be categorised and presented in statistical form after running the model: the count indicates the total number of records imported, and the mean for each category, as well as the standard deviation and the minimum and maximum number of tweets in each category, are also displayed.
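The per-category statistics the abstract describes (count, mean, standard deviation, minimum, maximum) correspond to a standard grouped summary of a labelled dataset. A minimal sketch of that step, using pandas with invented column names and toy data (the paper's actual dataset schema is not given here, so "label" and "length" are assumptions):

```python
# Hypothetical sketch of the statistical summary step: count, mean, std,
# min, and max per label category. The columns and sample rows are
# invented for illustration, not taken from the study's dataset.
import pandas as pd

tweets = pd.DataFrame({
    "text": [
        "example tweet one", "another tweet", "short",
        "a fourth tweet here", "fifth example tweet",
    ],
    "label": ["hate", "offensive", "neither", "offensive", "hate"],
})

# A simple per-tweet measure to summarise (character length here)
tweets["length"] = tweets["text"].str.len()

# describe() yields count, mean, std, min, max (and quartiles) per category
summary = tweets.groupby("label")["length"].describe()
print(summary[["count", "mean", "std", "min", "max"]])
```

Running the model over the real dataset and then calling a grouped `describe()` like this would produce exactly the kind of statistical table the study reports.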
The study established that preventing online platform abuse in Zambia requires a comprehensive approach involving government legislation, responsible platform policies and practices, and individual responsibility and accountability. In line with this goal, the research was effective in developing the detection model. To ensure that the model was fully functional, it was trained on the English dataset before being applied to the local-language dataset, because training deep learning models on local datasets can present a number of challenges, such as limited and biased data, data privacy, resource requirements, and model maintenance. However, the efficacy of such systems varies, and concerns have been raised about the inherent biases and limitations of automatic moderation techniques. The study recommends that future work consider other sources of information, such as Facebook, WhatsApp, Instagram, and other social media platforms, and consider harvesting local data sets for training rather than relying on foreign data sets; a local data set could then be used to detect offences targeting Zambian citizens on local platforms.
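The classification task the abstract describes assigns each text to hate speech, offensive language, or neither. The study itself used deep learning and HateSonar; the sketch below substitutes a much simpler TF-IDF plus logistic-regression baseline with invented toy examples, purely to illustrate the shape of the task, not the authors' actual model:

```python
# Illustration only: a three-way text classifier (hate / offensive / neither)
# using a simple scikit-learn baseline. The training texts are toy stand-ins
# invented for this sketch; the study's real model was deep-learning based.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I hate that group of people",      # toy stand-in for hate speech
    "you are a total idiot",            # toy stand-in for offensive language
    "what a lovely day today",          # neither
    "those people should disappear",    # toy stand-in for hate speech
    "shut up, fool",                    # toy stand-in for offensive language
    "looking forward to the weekend",   # neither
]
train_labels = ["hate", "offensive", "neither",
                "hate", "offensive", "neither"]

# Vectorise the text and fit a multiclass classifier in one pipeline
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

prediction = model.predict(["have a great day everyone"])[0]
print(prediction)  # one of: "hate", "offensive", "neither"
```

A real system along the study's lines would replace the toy examples with the labelled Twitter/Facebook data from kaggle.com and the baseline with a deep-learning model, but the train-then-predict structure is the same.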