Verbal Abuse Classification Using Multiple Deep Neural Networks

2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). Pub Date: 2021-04-13. DOI: 10.1109/ICAIIC51459.2021.9415218

Hyunju Park, Hong Kook Kim


Citations: 1

Abstract


People can be exposed to verbal abuse practically anywhere, and it is considered one of the serious issues in society. In this paper, we describe a method to classify verbal abuse into five classes by adding a convolutional neural network (CNN), a long short-term memory (LSTM) network, and a dense layer on top of bidirectional encoder representations from transformers (BERT). The data are collected from Korean dramas, movies, and YouTube. Because the data are imbalanced, a weighted random sampler and data augmentation are used so that the trained models generalize better. Experiments show that BERT with a CNN, trained after data augmentation, achieves the highest accuracy among all the compared methods.
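The abstract mentions a weighted random sampler as one remedy for class imbalance but gives no implementation details. The following is a minimal sketch of the general idea, assuming inverse-class-frequency weights and sampling with replacement (the same principle as PyTorch's `WeightedRandomSampler`). The function names, the label counts, and the seed are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample, equal to 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample_indices(labels, num_samples, seed=0):
    """Draw sample indices with replacement.

    Because every class has total weight 1, each draw is equally likely
    to land in any class, regardless of how many samples the class has.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Hypothetical imbalanced label list with five classes (0..4); not the paper's data.
labels = [0] * 800 + [1] * 100 + [2] * 50 + [3] * 30 + [4] * 20

indices = weighted_sample_indices(labels, num_samples=10_000)
resampled = Counter(labels[i] for i in indices)

# Each class should now appear in roughly one fifth of the draws.
print(sorted(resampled.items()))
```

Sampling with replacement means minority-class utterances are repeated many times within an epoch, which is presumably why the abstract pairs the sampler with data augmentation: augmentation adds variety that repetition alone cannot.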

