Hate Speech Detection in Hindi language using BERT and Convolution Neural Network

2022 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). Pub Date: 2022-11-04. DOI: 10.1109/ICCCIS56430.2022.10037649

Shubham Shukla, Sushama Nagpal, Sangeeta Sabharwal

Citations: 1

Abstract

Social media has become crucial in our lives; it shapes our opinions by supplying unfiltered information. Even those of us who do not participate actively have indirectly become part of its reach. The spread of information over the internet without any validation makes it hard to analyze the impact of misleading content. Cyber hate, used as a tool to incite violence against groups of people on the basis of ethnicity, nationality, language, sexual orientation, religious faith, and similar attributes, is a disgraceful misuse of social media. Previous relevant studies have addressed hate speech mainly in English, and less effort has been devoted to resource-constrained languages such as Hindi, Marathi, and Kannada. This work addresses hate speech detection in low-resource Hindi using BERT and a deep convolutional neural network. The proposed Hindi Hate Speech BERT Convolution Neural Network model is intended to detect hate speech in real time so that harmful incidents can be prevented as early as possible. The model has a two-stage architecture. In the first stage, we apply a pre-trained BERT encoder to generate encodings of the input text. In the second stage, a convolutional neural network followed by a sigmoid layer classifies the text as hateful or non-hateful. Our model achieved F1-scores of 0.84 and 0.77 on the HASOC 2020 and HASOC 2021 datasets, respectively.
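The two-stage design described above can be sketched as follows. This is a minimal, framework-free Python illustration of the second stage only. It assumes the first stage (a pre-trained BERT encoder) has already produced one encoding vector per token.

Several details are hypothetical choices made for illustration, because the abstract does not specify the authors' layer sizes, pooling, or training setup:

* the kernel width and number of filters;
* the use of global max pooling;
* the random placeholder weights;
* the helper names (`conv1d_relu`, `global_max_pool`, `CNNHead`, `f1_score`).

No trained weights are included, so the probabilities it prints carry no meaning until the head is trained.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = token positions, columns = encoding dimensions


def conv1d_relu(x: Matrix, kernels: List[Matrix], biases: List[float]) -> Matrix:
    """Valid (unpadded) 1D convolution along the token axis, followed by ReLU.

    Each kernel spans `width` consecutive tokens and the full encoding dimension.
    Returns a (len(x) - width + 1) x num_filters feature map.
    """
    width = len(kernels[0])
    fmap = []
    for t in range(len(x) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias
            for i in range(width):
                s += sum(w * v for w, v in zip(kernel[i], x[t + i]))
            row.append(max(0.0, s))  # ReLU
        fmap.append(row)
    return fmap


def global_max_pool(fmap: Matrix) -> List[float]:
    """Keep each filter's strongest activation over all window positions."""
    return [max(column) for column in zip(*fmap)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CNNHead:
    """Hypothetical stage-two classifier: conv -> ReLU -> global max-pool -> dense -> sigmoid.

    Weights are random placeholders; a real model would learn them.
    """

    def __init__(self, hidden_dim: int, num_filters: int = 4, width: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        self.kernels = [
            [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(width)]
            for _ in range(num_filters)
        ]
        self.conv_bias = [0.0] * num_filters
        self.dense_w = [rng.gauss(0.0, 0.1) for _ in range(num_filters)]
        self.dense_b = 0.0

    def predict_proba(self, encodings: Matrix) -> float:
        """Probability that the text is hateful, given per-token encodings."""
        if len(encodings) < self.width:
            raise ValueError("sequence is shorter than the convolution window")
        pooled = global_max_pool(conv1d_relu(encodings, self.kernels, self.conv_bias))
        z = self.dense_b + sum(w * p for w, p in zip(self.dense_w, pooled))
        return sigmoid(z)

    def predict(self, encodings: Matrix, threshold: float = 0.5) -> int:
        """1 = hateful, 0 = non-hateful."""
        return int(self.predict_proba(encodings) >= threshold)


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the hateful (1) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Stand-in for stage one: random vectors in place of real BERT token encodings.
    rng = random.Random(1)
    fake_encodings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
    head = CNNHead(hidden_dim=8)
    print(head.predict_proba(fake_encodings), head.predict(fake_encodings))
```

Global max pooling makes the head's output independent of input length, which suits variable-length posts. In practice this stage would be written in a deep-learning framework and trained on top of (or jointly with) the BERT encoder. A sigmoid output is typically trained with binary cross-entropy loss.

The abstract does not say whether the reported F1-scores are binary, macro-averaged, or weighted. The `f1_score` helper (binary, hateful class) is therefore only one possible reading.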
