Classifying Hate Speech Using a Two-Layer Model

Impact factor: 1.5; JCR Q2 (Social Sciences, Mathematical Methods)
Yi-jie Tang, Nicole M. Dalzell
Statistics and Public Policy, vol. 6, no. 1, pp. 80-86, 2019. DOI: 10.1080/2330443x.2019.1660285 (https://doi.org/10.1080/2330443x.2019.1660285)
Citation count: 7

Abstract

Social media and other online sites are increasingly scrutinized as platforms for cyberbullying and hate speech. Many machine learning algorithms, such as support vector machines, have been adopted to build classification tools that identify and potentially filter patterns of negative speech. While effective for prediction, these methods yield models that are difficult to interpret. In addition, many studies classify comments only as negative or neutral, without further separating negative comments into subcategories. To address both concerns, we introduce a two-stage model for classifying text. With this model, we illustrate the use of internal lexicons: collections of words, generated from a pre-classified training dataset of comments, that are specific to several subcategories of negative comments. In the first stage, a machine learning algorithm classifies each comment as negative or neutral, or more generally as target or nontarget. The second stage of model building leverages the internal lexicons (called L2CLs) to create features specific to each subcategory. These features, along with others, are then used in a random forest model to classify the comments into the subcategories of interest. We demonstrate our approach using two sets of data. Supplementary materials for this article are available online.
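The internal-lexicon idea in the abstract can be illustrated with a small sketch. It is not the authors' L2CL construction, which the abstract does not specify. The scoring rule (a word's count within a subcategory minus its count in all other subcategories), the stopword list, the `top_k` cutoff, the function names, and the toy comments are all assumptions made for illustration. The sketch covers only the second-stage feature step: it builds one lexicon per subcategory from pre-classified comments, then counts lexicon hits in a new comment.

```python
from collections import Counter

# Hypothetical stopword list and punctuation set; both are illustrative assumptions.
STOPWORDS = {"a", "and", "are", "i", "the", "what", "will", "you"}
PUNCT = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def build_internal_lexicons(comments, labels, top_k=2):
    """Build one word set per subcategory from pre-classified comments.

    Assumed scoring rule: count in the subcategory minus count elsewhere.
    """
    counts = {}
    for text, label in zip(comments, labels):
        words = [t for t in tokenize(text) if t not in STOPWORDS]
        counts.setdefault(label, Counter()).update(words)

    lexicons = {}
    for label, counter in counts.items():
        scores = {}
        for word, n in counter.items():
            elsewhere = sum(c[word] for other, c in counts.items() if other != label)
            scores[word] = n - elsewhere
        # Rank by score (descending), breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        lexicons[label] = {w for w in ranked[:top_k] if scores[w] > 0}
    return lexicons


def lexicon_features(text, lexicons):
    """Count how many tokens of a comment fall in each subcategory lexicon."""
    tokens = tokenize(text)
    return {label: sum(t in lex for t in tokens) for label, lex in lexicons.items()}


# Toy pre-classified training comments (invented for this sketch).
train_comments = [
    "you are a fool and a clown",
    "what a fool",
    "i will find you and hurt you",
    "i will hurt you",
]
train_labels = ["insult", "insult", "threat", "threat"]

lexicons = build_internal_lexicons(train_comments, train_labels)
features = lexicon_features("You fool, I will hurt you", lexicons)
print(lexicons)
print(features)
```

In the two-stage setup the abstract describes, counts like these would be computed only for comments the first-stage classifier flags as negative, and would then be passed, along with other features, to a random forest. Neither classifier is implemented in this sketch.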
Source journal: Statistics and Public Policy (Social Sciences, Mathematical Methods)
CiteScore: 3.20
Self-citation rate: 6.20%
Articles published: 13
Review time: 32 weeks