{"title":"Effective multimodal hate speech detection on Facebook hate memes dataset using incremental PCA, SMOTE, and adversarial learning","authors":"Emmanuel Ludivin Tchuindjang Tchokote , Elie Fute Tagne","doi":"10.1016/j.mlwa.2025.100647","DOIUrl":null,"url":null,"abstract":"<div><div>The proliferation of harmful information, such as hate speech and online harassment, has increased in recent years due to social media's explosive expansion. Using the Facebook Hate Meme Dataset (FBHM), we create a reliable model in this work for identifying multimodal hate speech on online platforms. To effectively address class imbalance and improve classification accuracy, our hybrid model combines ResNet for image processing with RoBERTa for text analysis, leveraging Synthetic Minority Over-sampling Technique (SMOTE) and Incremental Principal Component Analysis (PCA) combined with adversarial machine learning techniques. The combination of Incremental PCA's dimensionality reduction and SMOTE's synthetic sample creation produces a potent combination that enhances the training dataset and maximizes feature representation, resulting in improved online content moderation techniques. We achieved an accuracy of 81.80 %, and a Macro-F1 score of 81.53 % on the FBHM dataset which represents an 18 % improvement in accuracy over the base model. These results provide significant novel insights into this important field of study by demonstrating the potential of adversarial approaches in creating reliable models for automated hate speech identification that can help create a safer online environment and can significantly reduce the emotional burden on human content moderators by handling the contents quickly and accurately. This study highlights the mutually beneficial effect of combining SMOTE and incremental PCA, demonstrating how they improve the model's ability to correct class imbalance and boost performance. 
The source code and dataset are publicly available on GitHub to facilitate reproducibility and further research. Link to the code and dataset below:</div><div><span><span>https://github.com/ludivintchokote/HatePostDetection</span><svg><path></path></svg></span></div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100647"},"PeriodicalIF":4.9000,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025000301","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The proliferation of harmful content such as hate speech and online harassment has accelerated in recent years with the explosive expansion of social media. In this work, we develop a reliable model for identifying multimodal hate speech on online platforms using the Facebook Hate Meme Dataset (FBHM). To address class imbalance and improve classification accuracy, our hybrid model combines ResNet for image processing with RoBERTa for text analysis, leveraging the Synthetic Minority Over-sampling Technique (SMOTE) and Incremental Principal Component Analysis (Incremental PCA) together with adversarial machine learning techniques. Incremental PCA's dimensionality reduction and SMOTE's synthetic sample generation form a potent combination that enriches the training dataset and improves feature representation, leading to better online content moderation. We achieve an accuracy of 81.80% and a Macro-F1 score of 81.53% on the FBHM dataset, an 18% improvement in accuracy over the base model. These results demonstrate the potential of adversarial approaches for building reliable automated hate speech detectors that can help create a safer online environment and significantly reduce the emotional burden on human content moderators by handling such content quickly and accurately. This study highlights the mutually beneficial effect of combining SMOTE and Incremental PCA, showing how together they correct class imbalance and boost performance. The source code and dataset are publicly available on GitHub (https://github.com/ludivintchokote/HatePostDetection) to facilitate reproducibility and further research.
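The Incremental PCA + SMOTE stage described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the feature dimensions, sample counts, and the hand-rolled SMOTE-style interpolation are assumptions for demonstration, not the paper's actual configuration (which operates on fused ResNet/RoBERTa features).

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Stand-in for fused image/text features: 300 majority vs 60 minority
# samples of dimension 128 (illustrative numbers, not from the paper).
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 128)),
               rng.normal(2.0, 1.0, size=(60, 128))])
y = np.array([0] * 300 + [1] * 60)

# Step 1: Incremental PCA reduces dimensionality in mini-batches,
# so the full feature matrix never has to fit in memory at once.
ipca = IncrementalPCA(n_components=32, batch_size=100)
Z = ipca.fit_transform(X)

# Step 2: SMOTE-style oversampling -- synthesize minority samples by
# interpolating between a minority point and a random near neighbour.
def smote_like(Z_min, n_new, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z_min)
    _, idx = nn.kneighbors(Z_min)          # idx[:, 0] is the point itself
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(Z_min))
        j = idx[i, rng.integers(1, k + 1)]  # a true neighbour, not i itself
        lam = rng.random()
        synth.append(Z_min[i] + lam * (Z_min[j] - Z_min[i]))
    return np.array(synth)

Z_new = smote_like(Z[y == 1], n_new=240)    # 60 + 240 = 300, balancing classes
Z_bal = np.vstack([Z, Z_new])
y_bal = np.concatenate([y, np.ones(240, dtype=int)])

print(Z_bal.shape, np.bincount(y_bal))      # (600, 32) [300 300]
```

In practice one would use `imblearn.over_sampling.SMOTE` rather than this minimal reimplementation; the point is that oversampling happens in the reduced PCA space, after dimensionality reduction.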
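The abstract also credits adversarial machine learning for the model's robustness. A common way to realize this is adversarial training with perturbed inputs, e.g. the Fast Gradient Sign Method (FGSM). The toy linear classifier below is a hypothetical stand-in for the fused multimodal model; the paper does not specify FGSM, so this is a generic sketch of the idea, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear classifier standing in for the fused model (illustrative).
w = rng.normal(size=32)
b = 0.0

def loss_grad_x(x, y):
    # Gradient of the binary cross-entropy loss w.r.t. the input x
    # for a logistic model p = sigmoid(w @ x + b): dL/dx = (p - y) * w.
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w

def fgsm(x, y, eps=0.1):
    # FGSM: step in the sign direction of the input gradient, which
    # increases the loss while bounding each coordinate's change by eps.
    return x + eps * np.sign(loss_grad_x(x, y))

x = rng.normal(size=32)
x_adv = fgsm(x, y=1)
# Perturbation is bounded coordinate-wise by eps.
print(float(np.max(np.abs(x_adv - x))))
```

Adversarial training would then mix such perturbed feature vectors into each training batch, teaching the model to keep its predictions stable under small input shifts.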