Offensive Language Detection on Social Media Based on Text Classification

2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC). Pub Date: 2022-01-26. DOI: 10.1109/CCWC54503.2022.9720804

P. Hajibabaee, Masoud Malekzadeh, Mohsen Ahmadi, Maryam Heidari, Armin Esmaeilzadeh, Reyhaneh Abdolazimi, James H. Jones

Citations: 20

Abstract

There is a concerning rise in offensive language in crowd-generated content across social platforms. Such language can bully or hurt an individual or a community. Recently, the research community has investigated and developed various supervised approaches and training datasets to detect or prevent offensive monologues or dialogues automatically. In this study, we propose a text classification model consisting of a modular cleaning phase and tokenizer, three embedding methods, and eight classifiers. Our experiments show promising results for detecting offensive language on our dataset obtained from Twitter. With hyperparameter optimization, three classifiers (AdaBoost, SVM, and MLP) achieved the highest average F1-scores with the popular TF-IDF embedding method.
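To make the stages named in the abstract concrete, here is a minimal sketch in standard-library Python of a cleaning step, a tokenizer, TF-IDF weighting, and the F1-score. It is not the authors' implementation: the function names, the specific cleaning rules (lowercasing, stripping URLs and @mentions), and the smoothed IDF formula are all assumptions chosen for illustration, and the classifiers themselves (AdaBoost, SVM, MLP) are left out.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; the paper's actual cleaning rules are not given in the abstract.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean(text):
    """Lowercase the text, then blank out URLs and @mentions."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    return MENTION_RE.sub(" ", text)


def tokenize(text):
    """Split cleaned text into alphanumeric tokens (apostrophes kept)."""
    return TOKEN_RE.findall(clean(text))


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = ln((1 + N) / (1 + df)) + 1   (one common smoothed variant, assumed here)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline, the per-document weight dictionaries would be converted to fixed-length vectors and passed to a classifier, and `f1_score` would be averaged across evaluation runs to compare classifiers.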


