Rating the Severity of Toxic Comments Using BERT-Based Deep Learning Method
Author: Ziyu Zhai
Published in: 2022 IEEE 5th International Conference on Electronics Technology (ICET), 2022-05-13
DOI: 10.1109/icet55676.2022.9825384 (https://doi.org/10.1109/icet55676.2022.9825384)
With the integration of the Internet and smartphones into people’s lives, toxic comments have become ubiquitous on online social media. These comments seriously hinder the construction of a safe and healthy network environment, creating strong demand for automated methods that can identify such harmful content effectively and deal with it in a timely manner. To address this challenge, we propose a BERT-based deep learning method to rate the severity of toxic comments. Using the text dataset provided by Jigsaw, BERT-based backbones (RoBERTa and DeBERTa) are trained to extract contextualized embeddings from sentences. Severity scores are then computed from these embeddings by a subsequent head layer, chosen from a multilayer perceptron, a convolutional neural network, or an attention structure. After applying K-fold cross-validation and an average ensemble of different models, our method ranks 28th out of 2301 (top 1.2%) on the leaderboard of the Jigsaw Rate Severity of Toxic Comments Kaggle competition. This result qualifies for a silver medal in the competition and indicates that our model is an effective approach to rating the severity of a toxic comment precisely. This work can substantially reduce the workload of manually reviewing Internet content and help build a more harmonious online community environment.
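
The K-fold cross-validation and average-ensemble steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `k_fold_indices` and `average_ensemble` are hypothetical, the folds are contiguous and unshuffled for simplicity, and plain lists of numbers stand in for the severity scores that the RoBERTa and DeBERTa models would produce.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs.

    Folds are contiguous and unshuffled (an assumption of this sketch);
    every sample lands in exactly one validation fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, val))
        start = stop
    return folds


def average_ensemble(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average severity scores comment by comment across several models."""
    if not per_model_scores:
        raise ValueError("need scores from at least one model")
    n_comments = len(per_model_scores[0])
    if any(len(scores) != n_comments for scores in per_model_scores):
        raise ValueError("every model must score the same comments")
    # zip(*...) groups the scores that different models gave to one comment.
    return [mean(column) for column in zip(*per_model_scores)]


if __name__ == "__main__":
    # Five folds over ten comments: each validation fold holds two comments.
    for train_idx, val_idx in k_fold_indices(10, 5):
        print("validate on", val_idx, "train on", len(train_idx), "comments")

    # Hypothetical scores from two models for the same two comments.
    print(average_ensemble([[0.25, 1.0], [0.75, 0.0]]))
```

In a setup like the one described, each fold would presumably train one backbone-plus-head model on its training indices, and the scores from the resulting models would be averaged at inference time.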