Topic Words-Based Multilingual Hateful Linguistic Resources Construction for Developing Multilingual Hateful Content Detection Model Using Deep Learning Technique

IF 1.3 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Naol Bakala Defersha, Kula Kekeba Tune, Solomon Teferra Abate
IET Information Security, vol. 2025, issue 1. Published 2025-04-10. DOI: 10.1049/ise2/6068177
https://onlinelibrary.wiley.com/doi/10.1049/ise2/6068177
Citations: 0

Abstract

Social media platforms provide a space for communicating and sharing resources in a variety of natural languages across different cultural and multilingual settings. Although this interconnectedness offers numerous benefits, it also exposes users to the risk of encountering offensive (OFFN) and harmful content, including hateful speech. For resource-rich languages, lexicons, word embeddings, topic modeling, and transformer language models have been applied to build models that detect hateful content. Low-resource languages, including Ethiopian languages, suffer from a lack of such linguistic resources. Multilingual hateful content detection poses complex challenges because of cultural and linguistic variety. This paper proposes a multilingual hateful content identification model that uses a transformer language model and hybrid lexicon techniques to enhance hateful content recognition in low-resource Ethiopian languages. First, using topic modeling techniques, hateful content disseminated on Facebook in the target Ethiopian languages was categorized as insult, identity hate, antagonistic, or threat. Then, we compiled hateful terms from sources such as guidelines and proclamations related to the Ethiopian context. We created Ethiopian context-based transformer language models. We used topic words-based datasets to construct pretrained transformer language models and multilingual lexicons for major Ethiopian languages. Finally, we compared their performance by integrating them into a deep learning-based hateful content detection framework for low-resource Ethiopian languages. Among the deep learning algorithms applied with Ethiopian-language linguistic resources, word2vec-based multilingual lexicons combined with a convolutional neural network (CNN) outperformed the others. The results indicate that topic words-based multilingual word2vec lexicons outperformed the topic modeling-based transformer language models for low-resource Ethiopian languages, yielding a promising hate speech (HATE) detection approach for these languages.
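To make the lexicon idea concrete, the following is a minimal sketch of how a multilingual lexicon could turn a tokenized post into per-category counts for the four categories named in the abstract. It is not the authors' implementation: the paper builds word2vec-based lexicons and feeds them to a CNN, whereas this sketch only shows a plain dictionary lookup. The language codes, the placeholder terms, and the names `LEXICON`, `lexicon_features`, and `is_flagged` are all assumptions made for illustration.

```python
from collections import Counter

# The four categories named in the abstract.
CATEGORIES = ("insult", "identity_hate", "antagonistic", "threat")

# Hypothetical placeholder lexicon (assumption): language code -> {term: category}.
# Real entries would come from topic words and curated sources.
LEXICON = {
    "am": {"am_term_1": "insult", "am_term_2": "threat"},
    "om": {"om_term_1": "identity_hate", "om_term_2": "antagonistic"},
}


def lexicon_features(tokens, lexicon=LEXICON):
    """Return lexicon hit counts per category, in CATEGORIES order."""
    # Merge the per-language lexicons so one post can mix languages.
    merged = {}
    for entries in lexicon.values():
        merged.update(entries)
    counts = Counter(merged[t] for t in tokens if t in merged)
    return [counts.get(c, 0) for c in CATEGORIES]


def is_flagged(tokens, threshold=1):
    """Flag a post when the total number of lexicon hits reaches the threshold."""
    return sum(lexicon_features(tokens)) >= threshold


post = "am_term_1 hello om_term_2 am_term_1".split()
features = lexicon_features(post)
print(features)          # [2, 0, 1, 0]
print(is_flagged(post))  # True
```

In a fuller pipeline, a count vector like this would be one input among several to a learned classifier rather than a decision rule on its own.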

Source journal
IET Information Security (Engineering and Technology: Computer Science, Theory and Methods)
CiteScore: 3.80
Self-citation rate: 7.10%
Articles published: 47
Review time: 8.6 months
Journal description: IET Information Security publishes original research papers in the following areas of information security and cryptography. Submitting authors should specify clearly in their covering statement the area into which their paper falls.
Scope: Access Control and Database Security; Ad-Hoc Network Aspects; Anonymity and E-Voting; Authentication; Block Ciphers and Hash Functions; Blockchain, Bitcoin (Technical aspects only); Broadcast Encryption and Traitor Tracing; Combinatorial Aspects; Covert Channels and Information Flow; Critical Infrastructures; Cryptanalysis; Dependability; Digital Rights Management; Digital Signature Schemes; Digital Steganography; Economic Aspects of Information Security; Elliptic Curve Cryptography and Number Theory; Embedded Systems Aspects; Embedded Systems Security and Forensics; Financial Cryptography; Firewall Security; Formal Methods and Security Verification; Human Aspects; Information Warfare and Survivability; Intrusion Detection; Java and XML Security; Key Distribution; Key Management; Malware; Multi-Party Computation and Threshold Cryptography; Peer-to-peer Security; PKIs; Public-Key and Hybrid Encryption; Quantum Cryptography; Risks of using Computers; Robust Networks; Secret Sharing; Secure Electronic Commerce; Software Obfuscation; Stream Ciphers; Trust Models; Watermarking and Fingerprinting; Special Issues.
Current Call for Papers: Security on Mobile and IoT devices - https://digital-library.theiet.org/files/IET_IFS_SMID_CFP.pdf