Topic Words-Based Multilingual Hateful Linguistic Resources Construction for Developing Multilingual Hateful Content Detection Model Using Deep Learning Technique
Impact factor 1.3 · Zone 4, Computer Science · JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Authors: Naol Bakala Defersha, Kula Kekeba Tune, Solomon Teferra Abate
Journal: IET Information Security, vol. 2025, no. 1
Published: 2025-04-10 (Journal Article)
DOI: 10.1049/ise2/6068177
URL: https://onlinelibrary.wiley.com/doi/10.1049/ise2/6068177
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ise2/6068177
Citations: 0
Abstract
Social media platforms now provide spaces for communicating and sharing resources in many natural languages across diverse cultural and multilingual settings. Although this interconnectedness offers numerous benefits, it also exposes users to offensive (OFFN) and harmful content, including hate speech. For resource-rich languages, hateful content detection models have been built using lexicons, word embeddings, topic modeling, and transformer language models, but low-resource languages, including Ethiopian languages, lack such linguistic resources. Multilingual hateful content detection is further complicated by cultural and linguistic diversity. This paper proposes a multilingual hateful content identification model that uses a transformer language model and hybrid lexicon techniques to improve hateful content recognition in low-resource Ethiopian languages. First, topic modeling techniques were used to identify the target categories of hateful content disseminated on Facebook in Ethiopian languages: insult, identity hate, antagonistic, and threat. Next, we compiled hateful terms from sources such as guidelines and proclamations related to the Ethiopian context. We then used the resulting topic-words-based datasets to construct Ethiopian context-based pretrained transformer language models and multilingual lexicons for the major Ethiopian languages. Finally, we compared their performance by integrating each into a deep learning-based hateful content detection framework for low-resource Ethiopian languages. Among the deep learning algorithms applied with these Ethiopian-language resources, word2vec-based multilingual lexicons combined with a convolutional neural network (CNN) outperformed the others.
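As a rough illustration of what a text CNN computes over word embeddings, the following is a minimal, untrained forward pass in plain Python: each convolution filter slides over a window of consecutive token embeddings, ReLU and max-over-time pooling reduce each filter to one feature, and a softmax layer turns the features into label probabilities. This is a generic sketch, not the authors' architecture; the filter widths, weights, toy embeddings, label names, and the functions `conv1d_max_pool` and `classify` are all hypothetical, and a real model would learn its weights with a deep learning framework.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]  # one row per token (or per filter position); each row is a vector


def conv1d_max_pool(embeddings: Matrix, filters: List[Matrix], biases: List[float]) -> Vector:
    """Slide each filter over the token sequence, apply ReLU, and keep the
    largest activation per filter (max-over-time pooling)."""
    features: Vector = []
    for filt, bias in zip(filters, biases):
        width = len(filt)  # number of consecutive tokens the filter covers
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting max
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            # Dot product between the filter and the flattened token window.
            activation = bias + sum(
                w * x
                for filt_row, tok_row in zip(filt, window)
                for w, x in zip(filt_row, tok_row)
            )
            best = max(best, max(0.0, activation))  # ReLU, then running max
        features.append(best)
    return features


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings: Matrix, filters: List[Matrix], biases: List[float],
             out_weights: Matrix, out_biases: List[float],
             labels: List[str]) -> Dict[str, float]:
    """Hypothetical CNN forward pass: convolution + pooling, then a dense softmax layer."""
    features = conv1d_max_pool(embeddings, filters, biases)
    scores = [b + sum(w * f for w, f in zip(row, features))
              for row, b in zip(out_weights, out_biases)]
    return dict(zip(labels, softmax(scores)))


# Hypothetical toy inputs: three tokens with 2-dimensional embeddings,
# two width-2 filters, and a two-label output layer.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
probs = classify(tokens, filters, [0.0, 0.0],
                 out_weights=[[1.0, 0.0], [0.0, 1.0]], out_biases=[0.0, 0.0],
                 labels=["hate", "not_hate"])
print(probs)
```

Max-over-time pooling is what lets a filter respond to a short hateful phrase regardless of where it appears in a post; in a lexicon-plus-CNN setup, the embedding rows would come from the constructed word2vec resources rather than from hand-written vectors.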
The results indicate that, for low-resource Ethiopian languages, multilingual word2vec lexicons constructed from topic words outperformed transformer language models built on topic modeling, providing a promising approach to hate speech (HATE) detection in these languages.
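One common way to grow a lexicon from seed terms with word embeddings, assumed here rather than taken from the paper, is to add every vocabulary word whose vector lies close to a seed term by cosine similarity, letting the new word inherit the category of its most similar seed. The sketch below uses hand-written 2-dimensional toy vectors in place of trained word2vec embeddings, placeholder words instead of real hateful terms, and a hypothetical threshold of 0.8; `expand_lexicon` and `lexicon_features` are illustrative names, not an existing API.

```python
import math
from typing import Dict, List

# Target categories named in the abstract.
CATEGORIES = ["insult", "identity_hate", "antagonistic", "threat"]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds: Dict[str, str], vectors: Dict[str, List[float]],
                   threshold: float = 0.8) -> Dict[str, str]:
    """Return the seed terms plus every vocabulary word whose vector is at least
    `threshold` similar to some seed; each new word takes its closest seed's category."""
    lexicon = dict(seeds)
    for word, vec in vectors.items():
        if word in lexicon:
            continue
        best_seed, best_sim = None, threshold
        for seed in seeds:
            if seed not in vectors:
                continue  # a seed without an embedding cannot attract neighbours
            sim = cosine(vec, vectors[seed])
            if sim >= best_sim:
                best_seed, best_sim = seed, sim
        if best_seed is not None:
            lexicon[word] = seeds[best_seed]
    return lexicon


def lexicon_features(tokens: List[str], lexicon: Dict[str, str],
                     categories: List[str]) -> Dict[str, int]:
    """Count lexicon hits per category in a tokenized post."""
    counts = {c: 0 for c in categories}
    for tok in tokens:
        cat = lexicon.get(tok)
        if cat in counts:
            counts[cat] += 1
    return counts


# Hypothetical seeds (e.g. terms compiled from guidelines) and toy vectors.
seeds = {"insult_seed": "insult", "threat_seed": "threat"}
vectors = {
    "insult_seed": [1.0, 0.0],
    "threat_seed": [0.0, 1.0],
    "near_insult": [0.9, 0.1],   # close to insult_seed, so it is added
    "neutral": [0.5, 0.5],       # about 0.71 to both seeds, below the threshold
    "opposite": [-1.0, 0.0],
}
lex = expand_lexicon(seeds, vectors)
print(lex)
print(lexicon_features(["near_insult", "neutral", "threat_seed", "near_insult"], lex, CATEGORIES))
```

The per-category counts can serve as extra input features next to the embedding sequence; raising the threshold trades lexicon coverage for precision.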
Journal description:
IET Information Security publishes original research papers in the following areas of information security and cryptography. Submitting authors should specify clearly in their covering statement the area into which their paper falls.
Scope:
Access Control and Database Security
Ad-Hoc Network Aspects
Anonymity and E-Voting
Authentication
Block Ciphers and Hash Functions
Blockchain, Bitcoin (Technical aspects only)
Broadcast Encryption and Traitor Tracing
Combinatorial Aspects
Covert Channels and Information Flow
Critical Infrastructures
Cryptanalysis
Dependability
Digital Rights Management
Digital Signature Schemes
Digital Steganography
Economic Aspects of Information Security
Elliptic Curve Cryptography and Number Theory
Embedded Systems Aspects
Embedded Systems Security and Forensics
Financial Cryptography
Firewall Security
Formal Methods and Security Verification
Human Aspects
Information Warfare and Survivability
Intrusion Detection
Java and XML Security
Key Distribution
Key Management
Malware
Multi-Party Computation and Threshold Cryptography
Peer-to-peer Security
PKIs
Public-Key and Hybrid Encryption
Quantum Cryptography
Risks of using Computers
Robust Networks
Secret Sharing
Secure Electronic Commerce
Software Obfuscation
Stream Ciphers
Trust Models
Watermarking and Fingerprinting
Special Issues. Current Call for Papers:
Security on Mobile and IoT devices - https://digital-library.theiet.org/files/IET_IFS_SMID_CFP.pdf