Hate speech detection in Twitter using hybrid embeddings and improved cuckoo search-based neural networks

Int. J. Intell. Comput. Cybern. Pub Date: 2020-11-03 DOI: 10.1108/ijicc-06-2020-0061

F. E. Ayo, O. Folorunso, F. T. Ibharalu, I. Osinuga


Citations: 13

Abstract

Purpose: Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has received particular research attention in recent studies, hence the need to design a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.

Design/methodology/approach: This study proposes a hybrid embeddings technique, enhanced with a topic inference method, and an improved cuckoo search neural network for hate speech detection in Twitter data. The hybrid embeddings technique combines Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction with Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted by the hybrid embeddings then serve as input to the improved cuckoo search neural network, which classifies a tweet as hate speech, offensive language or neither.

Findings: The proposed method showed better results than other related methods when tested on the collected Twitter datasets. To validate its performance, a t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with those of other related methods for hate speech detection. A paired-sample t-test was also conducted to validate the performance of the proposed method against other related methods.

Research limitations/implications: The evaluation results showed that the proposed method outperforms other related methods, with a mean F1-score of 91.3.

Originality/value: The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.
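The word-level half of the hybrid embeddings can be illustrated with a minimal sketch. It assumes whitespace tokenization, term frequency normalized by tweet length, and the plain log(N/df) form of inverse document frequency; the abstract does not say which TF-IDF variant the authors use, and the function name `tfidf_vectors` and the example tweets are hypothetical. The LSTM sentence-level features, the topic inference step and the cuckoo search network are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Assumed weighting (not taken from the paper):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / df(t)), with N documents and df(t) documents containing t
    """
    n_docs = len(tokenized_docs)
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})

    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for an empty tweet
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Made-up example tweets, for illustration only.
tweets = [
    "you are all wonderful people",
    "you are all terrible people",
    "what a wonderful sunny day",
]
tokenized = [t.lower().split() for t in tweets]
vocab, vectors = tfidf_vectors(tokenized)

# Show the non-zero weights of the second tweet.
for term, weight in zip(vocab, vectors[1]):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```

Terms that occur in only one tweet (here "terrible") receive the largest weight, while terms shared across tweets are down-weighted. In a pipeline like the one described, each such vector would be combined with a sentence-level representation before classification.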

