Bridging the gap in online hate speech detection: A comparative analysis of BERT and traditional models for homophobic content identification on X/Twitter

Josh McGiff, Nikola S. Nikolov
Applied and Computational Engineering, published 2024-05-15
DOI: 10.54254/2755-2721/64/20241346 (https://doi.org/10.54254/2755-2721/64/20241346)

Abstract

Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.