ProperBERT - Proactive Recognition of Offensive Phrasing for Effective Regulation

Clara Diesenreiter, O. Krauss, Simone Sandler, Andreas Stöckl

2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), published 2022-11-16. DOI: 10.1109/ICECCME55909.2022.9987933
This work discusses and contains content that may be offensive or unsettling. Hateful communication has always been part of human interaction, even before the advent of social media. Nowadays, offensive content spreads faster and wider through digital communication channels. To help improve the regulation of hate speech, we introduce ProperBERT, a fine-tuned BERT model for detecting hate speech and offensive language in English. To ensure the portability of our model, five data sets from the literature were combined to train ProperBERT. The pooled dataset contains racist, homophobic, misogynistic, and generally offensive statements. Because these statements vary, chiefly in the target of the hate and in how overtly it is expressed, training on them yields a suitably robust model. ProperBERT remains stable on data sets that were not used for training, while staying efficient to use thanks to its compact size. Portability tests on data sets not used for fine-tuning show that fine-tuning on large-scale and varied data increases model portability.
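The pooling step the abstract describes — combining several corpora whose label schemes differ — can be sketched as below. This is a minimal illustration, not the authors' pipeline: the corpus names, label schemes, and the mapping onto a shared binary label are all assumptions made for the example.

```python
# Hedged sketch: pool several hate-speech corpora into one training set.
# Corpus names, label schemes, and the shared-label mapping are illustrative
# assumptions, not the actual five data sets used to train ProperBERT.

def pool_datasets(datasets, label_map):
    """Merge (text, source_label) rows from several corpora into one list of
    (text, shared_label) rows, mapping each corpus's labels onto a shared
    scheme and dropping verbatim duplicate texts across corpora."""
    seen = set()
    pooled = []
    for source, rows in datasets.items():
        for text, label in rows:
            shared = label_map[source].get(label)
            if shared is None:        # label has no counterpart in the shared scheme
                continue
            key = text.strip().lower()
            if key in seen:           # same statement already contributed by another corpus
                continue
            seen.add(key)
            pooled.append((text, shared))
    return pooled

# Two toy corpora with incompatible label schemes (strings vs. integers).
datasets = {
    "corpus_a": [("you are awful", "offensive"), ("nice day", "neither")],
    "corpus_b": [("you are awful", 1), ("hello there", 0)],
}
label_map = {
    "corpus_a": {"offensive": 1, "hate": 1, "neither": 0},
    "corpus_b": {1: 1, 0: 0},
}

pooled = pool_datasets(datasets, label_map)
print(len(pooled))  # → 3: the duplicate statement is kept only once
```

Deduplicating across corpora matters when pooling, since overlapping source material would otherwise leak repeated examples into training and inflate apparent robustness.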