Leveraging time-dependent lexical features for offensive language detection

Proceedings of the First Workshop on Ever Evolving NLP (EvoNLP). DOI: 10.18653/v1/2022.evonlp-1.7

Barbara McGillivray, Malithi Alahapperuma, Jonathan Cook, C. Di Bonaventura, Albert Meroño-Peñuela, Gareth Tyson, Steven R. Wilson


Citations: 4

Abstract

We present a study on integrating time-sensitive information into lexicon-based offensive language detection systems. Our focus is OffensEval sub-task A, which aims to detect offensive tweets. We apply a semantic change detection algorithm over a short time span of two years to find words whose meaning has changed, focusing on words that acquired or lost an offensive meaning between 2019 and 2020. Using the output of this semantic change detection approach, we train an SVM classifier on the OffensEval 2019 training set. We build on the already competitive SINAI system submitted to OffensEval 2019 by adding new lexical features, including features that capture changes in word usage and their association with emerging offensive usages. We discuss the challenges, opportunities and limitations of integrating semantic change detection into offensive language detection models. Our work draws attention to an often neglected aspect of offensive language: the meanings of words are constantly evolving, and NLP systems that account for this change can achieve good performance even when not trained on the most recent data.
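
The idea of turning semantic change into a lexical feature can be illustrated with a small sketch. This is not the authors' algorithm or the SINAI feature set: it assumes a simple co-occurrence representation, cosine distance as the change score, an arbitrary threshold of 0.5, and a made-up word ("zorp") in toy corpora. All function names are hypothetical. It also skips the step of checking whether a changed word moved toward an offensive sense.

```python
import math
from collections import Counter

def context_vector(word, corpus, window=2):
    """Count tokens co-occurring with `word` within +/- `window` positions."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors (1.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0

def changed_words(vocab, corpus_old, corpus_new, threshold=0.5):
    """Words whose context distribution shifted between the two time slices."""
    return {
        w
        for w in vocab
        if cosine_distance(
            context_vector(w, corpus_old), context_vector(w, corpus_new)
        )
        > threshold
    }

def lexical_features(tweet, static_lexicon, shifted_lexicon):
    """Two counts per tweet: hits in a fixed lexicon and hits among shifted words."""
    tokens = tweet.lower().split()
    return [
        sum(t in static_lexicon for t in tokens),
        sum(t in shifted_lexicon for t in tokens),
    ]

# Toy time slices (invented data; "zorp" is a placeholder word).
corpus_2019 = [
    "the zorp bird sings at dawn",
    "a zorp bird nests here",
    "the rain falls at dawn",
]
corpus_2020 = [
    "you are a stupid zorp idiot",
    "shut up you zorp idiot",
    "the rain falls at dawn",
]

shifted = changed_words({"zorp", "rain"}, corpus_2019, corpus_2020)
features = lexical_features("what a zorp idiot", {"idiot"}, shifted)
```

In a full pipeline, feature vectors like `features` would be concatenated with the other lexical features and passed to a classifier such as an SVM, as the abstract describes.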
