Offensive Language Detection on Social Media Based on Text Classification

P. Hajibabaee, Masoud Malekzadeh, Mohsen Ahmadi, Maryam Heidari, Armin Esmaeilzadeh, Reyhaneh Abdolazimi, James H. Jones
{"title":"Offensive Language Detection on Social Media Based on Text Classification","authors":"P. Hajibabaee, Masoud Malekzadeh, Mohsen Ahmadi, Maryam Heidari, Armin Esmaeilzadeh, Reyhaneh Abdolazimi, James H. Jones","doi":"10.1109/CCWC54503.2022.9720804","DOIUrl":null,"url":null,"abstract":"There is a concerning rise of offensive language on the content generated by the crowd over various social platforms. Such language might bully or hurt the feelings of an individual or a community. Recently, the research community has investigated and developed different supervised approaches and training datasets to detect or prevent offensive monologues or dialogues automatically. In this study, we propose a model for text classification consisting of modular cleaning phase and tokenizer, three embedding methods, and eight classifiers. Our experiments shows a promising result for detection of offensive language on our dataset obtained from Twitter. Considering hyperparameter optimization, three methods of AdaBoost, SVM and MLP had highest average of F1-score on popular embedding method of TF-IDF.","PeriodicalId":101590,"journal":{"name":"2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCWC54503.2022.9720804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

Abstract

There is a concerning rise of offensive language on the content generated by the crowd over various social platforms. Such language might bully or hurt the feelings of an individual or a community. Recently, the research community has investigated and developed different supervised approaches and training datasets to detect or prevent offensive monologues or dialogues automatically. In this study, we propose a model for text classification consisting of modular cleaning phase and tokenizer, three embedding methods, and eight classifiers. Our experiments shows a promising result for detection of offensive language on our dataset obtained from Twitter. Considering hyperparameter optimization, three methods of AdaBoost, SVM and MLP had highest average of F1-score on popular embedding method of TF-IDF.
基于文本分类的社交媒体攻击性语言检测
在各种社交平台上,由人群产生的内容中出现了令人担忧的攻击性语言。这样的语言可能会欺负或伤害个人或社区的感情。最近,研究界调查并开发了不同的监督方法和训练数据集,以自动检测或防止攻击性独白或对话。在这项研究中,我们提出了一个文本分类模型,该模型由模块化的清洗阶段和标记器、三种嵌入方法和八个分类器组成。我们的实验显示,在我们从Twitter获得的数据集上检测攻击性语言的结果很有希望。考虑超参数优化,AdaBoost、SVM和MLP三种方法在TF-IDF的常用嵌入方法中f1得分平均值最高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信