Leveraging Deep Learning for Comprehensive Multilingual Hate Speech Detection
Atul Kumar Srivastava, Mitali Srivastava, Sanchali Das, Vikas Jain, Tej Bahadur Chandra
Procedia Computer Science, Volume 252, Pages 832-840. Published 2025-01-01.
DOI: 10.1016/j.procs.2025.01.044
https://www.sciencedirect.com/science/article/pii/S1877050925000444
Citation count: 0
Abstract
Multilingual hate speech detection is an emerging research area concerned with recognizing harmful content across languages. With the rapid growth of social media platforms and the global nature of online communication, identifying hate speech in many linguistic contexts has become crucial. Existing detection models often struggle with language-specific nuances, cultural differences, and limited resources for less commonly spoken languages. This article presents a broad investigation of multilingual hate speech across 11 languages, drawing on several datasets. Using natural language processing (NLP), machine learning, and deep learning methods, researchers aim to build models that generalize effectively across languages. In low-resource scenarios, simpler models such as LASER embeddings combined with logistic regression outperform ELMo-based models in high-resource contexts. The main objective of this work is to demonstrate promising results across a range of languages by integrating deep learning algorithms with multilingual pre-trained language models such as LASER and ELMo.
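The abstract gives no implementation details, so the following is only a minimal sketch of the "sentence embedding plus logistic regression" pattern it mentions. Everything in it is an assumption made for illustration: `embed` is a hashed character-trigram stand-in and is not LASER, the six toy sentences and their labels are invented, and the hyperparameters are arbitrary. A real pipeline would replace `embed` with a multilingual sentence encoder and train on labelled hate speech datasets.

```python
import math
import zlib

DIM = 64  # stand-in embedding size (an arbitrary choice for this sketch)


def embed(text, dim=DIM):
    """Stand-in sentence encoder: hashed character trigrams, L2-normalised.

    This is NOT LASER. It only plays the role of a fixed, language-agnostic
    text-to-vector function so the classifier below has something to consume.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Binary logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Gradient step on the mean log-loss plus an L2 penalty on w.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, w, b):
    """Probability that `text` belongs to the positive (hostile) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, embed(text))) + b)


# Invented toy data in three languages: 1 = hostile, 0 = neutral.
texts = [
    "you people are worthless and should disappear",
    "ustedes son basura y no merecen nada",
    "ihr seid alle wertloser abschaum",
    "the weather is lovely this morning",
    "el concierto de anoche fue estupendo",
    "ich trinke gerne tee am nachmittag",
]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_logreg([embed(t) for t in texts], labels)
print(round(predict_proba("the weather is lovely this evening", w, b), 3))
```

The design point this illustrates is that the encoder is frozen and only a small linear classifier is trained, which is why the approach remains usable when little labelled data is available for a language.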