Hate Speech Detection in Hindi Language Using BERT and Convolution Neural Network
Shubham Shukla, Sushama Nagpal, Sangeeta Sabharwal
2022 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), 2022-11-04
DOI: 10.1109/ICCCIS56430.2022.10037649
Social media has become central to our lives; it shapes our opinions by supplying unfiltered information. Even people who do not participate actively fall indirectly within its reach. The wide spread of unvalidated information over the internet makes it hard to analyze the impact of misleading content. Cyber hate, which is used as a tool to incite violence against groups of people based on ethnicity, nationality, language, sexual orientation, religious faith, etc., is a disgraceful use of social media. Previous studies have addressed hate speech mainly in English. Less effort has been devoted to resource-constrained languages such as Hindi, Marathi, and Kannada. This work addresses hate speech detection in the low-resource Hindi language using BERT and a deep convolutional neural network. The proposed Hindi Hate Speech BERT Convolution Neural Network model is intended to detect hate speech in real time so that harmful incidents can be prevented as early as possible. The model has a two-stage architecture: in the first stage, a pre-trained BERT encoder generates encodings; in the second stage, a convolutional neural network followed by a sigmoid layer classifies the text as hatred or non-hatred. Our model achieved F1-scores of 0.84 and 0.77 on the HASOC 2020 and HASOC 2021 datasets, respectively.
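The second stage described above can be illustrated with a minimal sketch of the forward pass, written in plain Python so that it runs without any deep-learning library. It is not the authors' implementation: the abstract does not state the filter widths, the number of filters, the pooling strategy, or the decision threshold, so the ReLU activation, max-over-time pooling, 0.5 threshold, and all function and variable names below are assumptions for illustration. The random vectors stand in for the encodings that a pre-trained BERT encoder would produce in the first stage.

```python
import math
import random


def conv1d_relu(encodings, kernels, biases):
    """Slide each kernel along the token axis and apply ReLU.

    encodings: list of seq_len token vectors (each of length hidden).
    kernels:   list of filters; each filter is a list of `width` weight vectors.
    biases:    one bias per filter.
    """
    seq_len = len(encodings)
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        feature_map = []
        for start in range(seq_len - width + 1):
            total = bias
            for offset in range(width):
                token_vec = encodings[start + offset]
                total += sum(t * w for t, w in zip(token_vec, kernel[offset]))
            feature_map.append(max(0.0, total))  # ReLU (assumed activation)
        feature_maps.append(feature_map)
    return feature_maps


def global_max_pool(feature_maps):
    """Keep the strongest response of each filter (assumed pooling choice)."""
    return [max(fm, default=0.0) for fm in feature_maps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify(encodings, kernels, conv_biases, dense_weights, dense_bias,
             threshold=0.5):
    """CNN head: convolution -> pooling -> dense -> sigmoid -> label."""
    pooled = global_max_pool(conv1d_relu(encodings, kernels, conv_biases))
    logit = sum(p * w for p, w in zip(pooled, dense_weights)) + dense_bias
    prob = sigmoid(logit)
    return prob, ("hatred" if prob >= threshold else "non-hatred")


# Toy dimensions (hypothetical; real BERT encodings are much larger).
HIDDEN, SEQ_LEN, NUM_FILTERS, WIDTH = 8, 6, 4, 3
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Stand-in for stage one: one vector per token, as an encoder would return.
encodings = [rand_vec(HIDDEN) for _ in range(SEQ_LEN)]
# Untrained random weights; a real model would learn these from labeled data.
kernels = [[rand_vec(HIDDEN) for _ in range(WIDTH)] for _ in range(NUM_FILTERS)]
conv_biases = [0.0] * NUM_FILTERS
dense_weights = rand_vec(NUM_FILTERS)

prob, label = classify(encodings, kernels, conv_biases, dense_weights, 0.0)
print(f"probability={prob:.3f} label={label}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how per-token encodings flow through a convolution, a pooling step, and a sigmoid to yield a binary decision.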