Detection of Hate Tweets using Machine Learning and Deep Learning

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). Pub Date: 2020-12-01. DOI: 10.1109/TrustCom50675.2020.00103

Lida Ketsbaia, B. Issac, Xiaomin Chen


Citations: 14

Abstract

Cyberbullying has become a serious problem because perpetrators can remain anonymous and others can easily join in harassing victims. The distancing effect of technological devices has led cyberbullies to say and do harsher things than is typical in traditional face-to-face bullying. Given the importance of the problem, detection is becoming a key area of cyberbullying research, and a framework that accurately and automatically detects new cyberbullying instances is highly necessary. To review machine learning and deep learning approaches, two datasets were used. The first, provided by the University of Maryland, consists of over 30,000 tweets; the second, based on the article 'Automated Hate Speech Detection and the Problem of Offensive Language' by Davidson et al., contains roughly 25,000 tweets. The paper explores machine learning approaches using word embeddings such as DBOW (Distributed Bag of Words) and DMM (Distributed Memory Mean), and evaluates the performance of Word2vec Convolutional Neural Networks (CNNs) in classifying online hate.
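The abstract names DBOW and DMM without describing how they were implemented. As a minimal sketch, assuming the usual paragraph-vector definitions, the two modes differ in what is fed to the word predictor: DBOW uses the tweet's document vector alone, while DMM averages the document vector with the vectors of the surrounding context words. The function names, the toy 2-dimensional vectors, and the example words below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def dbow_input(doc_vector: Vector) -> Vector:
    """DBOW: the document vector alone predicts words sampled from the tweet."""
    return list(doc_vector)


def dmm_input(doc_vector: Vector, context: List[str],
              word_vectors: Dict[str, Vector]) -> Vector:
    """DMM: the document vector is averaged with the context word vectors."""
    return mean_vectors([doc_vector] + [word_vectors[w] for w in context])


# Hypothetical toy embeddings (illustration only, not values from the paper).
word_vectors = {"you": [1.0, 0.0], "are": [0.0, 1.0]}
doc_vector = [2.0, 2.0]

dbow_features = dbow_input(doc_vector)
dmm_features = dmm_input(doc_vector, ["you", "are"], word_vectors)
```

In a real pipeline these inputs would be learned during training rather than fixed by hand. In gensim's `Doc2Vec`, for instance, the two modes are commonly selected with `dm=0` (DBOW) and `dm=1, dm_mean=1` (DMM); whether the authors used that library is an assumption here.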
