Bridging the gap in online hate speech detection: A comparative analysis of BERT and traditional models for homophobic content identification on X/Twitter

Applied and Computational Engineering. Pub Date: 2024-05-15. DOI: 10.54254/2755-2721/64/20241346

Josh McGiff, Nikola S. Nikolov

{"title":"Bridging the gap in online hate speech detection: A comparative analysis of BERT and traditional models for homophobic content identification on X/Twitter","authors":"Josh McGiff, Nikola S. Nikolov","doi":"10.54254/2755-2721/64/20241346","DOIUrl":null,"url":null,"abstract":"Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. 
Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.","PeriodicalId":350976,"journal":{"name":"Applied and Computational Engineering","volume":"54 8","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied and Computational Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.54254/2755-2721/64/20241346","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}


Abstract

Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.
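The abstract's point that the choice of validation technique can affect measured model performance can be illustrated with a minimal sketch. The abstract does not say which validation techniques or which traditional models were compared, so everything here is an assumption made for illustration: a single hold-out split is contrasted with k-fold cross-validation, a small multinomial naive Bayes classifier stands in for a "traditional" model, and the `DATA` corpus, along with every function name, is invented and is not the authors' dataset or code.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus, invented for this sketch (NOT the authors' dataset).
# Label 1 = flagged as hostile, label 0 = not flagged.
DATA = [
    ("those people are disgusting", 1),
    ("i hate those people", 1),
    ("they should not exist here", 1),
    ("those people are not welcome", 1),
    ("disgusting people go away", 1),
    ("i hate them they are awful", 1),
    ("happy pride to everyone", 0),
    ("love is love", 0),
    ("great parade today", 0),
    ("so proud of my friends", 0),
    ("everyone is welcome here", 0),
    ("what a lovely day today", 0),
]


def train_nb(samples):
    """Collect per-class word counts for a multinomial naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # Add-one smoothing on the prior avoids log(0) for an unseen class.
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict_nb(model, t) == y for t, y in samples) / len(samples)


def holdout_score(data, test_fraction=0.25, seed=0):
    """Accuracy on one random hold-out split."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return accuracy(train_nb(train), test)


def kfold_scores(data, k=4, seed=0):
    """Accuracy on each of k folds; every sample is tested exactly once."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train_nb(train), test))
    return scores


if __name__ == "__main__":
    print("hold-out accuracy:", holdout_score(DATA))
    fold_scores = kfold_scores(DATA, k=4)
    print("per-fold accuracy:", fold_scores)
    print("k-fold mean accuracy:", sum(fold_scores) / len(fold_scores))
```

On a corpus this small, the hold-out figure depends heavily on which samples land in the test split, whereas the k-fold mean averages over every sample being tested once. That is one plausible reason two validation schemes can report different scores for the same model, though the paper itself should be consulted for the effect the authors actually observed.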

