在线用户内容中的辱骂语言检测

Proceedings of the 25th International Conference on World Wide Web Pub Date : 2016-04-11 DOI:10.1145/2872427.2883062

Chikashi Nobata, Joel R. Tetreault, A. Thomas, Yashar Mehdad, Yi Chang

{"title":"在线用户内容中的辱骂语言检测","authors":"Chikashi Nobata, Joel R. Tetreault, A. Thomas, Yashar Mehdad, Yi Chang","doi":"10.1145/2872427.2883062","DOIUrl":null,"url":null,"abstract":"Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":"62 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"953","resultStr":"{\"title\":\"Abusive Language Detection in Online User Content\",\"authors\":\"Chikashi Nobata, Joel R. Tetreault, A. Thomas, Yashar Mehdad, Yi Chang\",\"doi\":\"10.1145/2872427.2883062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.\",\"PeriodicalId\":20455,\"journal\":{\"name\":\"Proceedings of the 25th International Conference on World Wide Web\",\"volume\":\"62 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"953\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th International Conference on World Wide Web\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2872427.2883062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on World Wide Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2872427.2883062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 953

摘要

近年来，在用户生成的网络内容中检测辱骂性语言已成为一个日益重要的问题。目前大多数商业方法都使用黑名单和正则表达式，然而，这些措施在对付更微妙、不那么笨拙的仇恨言论时，效果不佳。在这项工作中，我们开发了一种基于机器学习的方法来检测来自两个领域的在线用户评论中的仇恨言论，该方法优于最先进的深度学习方法。我们还开发了一个针对辱骂性语言的用户评论语料库，这是同类中第一个。最后，我们使用我们的检测工具来分析不同时间和不同环境下的辱骂性语言，以进一步提高我们对这种行为的认识。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Abusive Language Detection in Online User Content

Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 25th International Conference on World Wide Web

自引率

0.00%

发文量