Offensive Language Detection on Social Media Based on Text Classification

2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), published 2022-01-26, DOI: 10.1109/CCWC54503.2022.9720804

P. Hajibabaee, Masoud Malekzadeh, Mohsen Ahmadi, Maryam Heidari, Armin Esmaeilzadeh, Reyhaneh Abdolazimi, James H. Jones

Citations: 20

Abstract

Offensive language is rising at a concerning rate in crowd-generated content across social platforms. Such language can bully individuals or communities and hurt their feelings. Recently, the research community has investigated and developed various supervised approaches and training datasets to detect or prevent offensive monologues or dialogues automatically. In this study, we propose a text classification model consisting of a modular cleaning phase and tokenizer, three embedding methods, and eight classifiers. Our experiments show promising results for detecting offensive language on our dataset obtained from Twitter. With hyperparameter optimization, AdaBoost, SVM, and MLP achieved the highest average F1-scores when paired with the popular TF-IDF embedding.
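
The stages named in the abstract are cleaning, tokenization, TF-IDF embedding, and scoring with F1. They can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation.

The following parts are assumptions:
- The cleaning rules: lowercasing, then stripping URLs, @mentions, and non-letter characters.
- The whitespace tokenizer.
- The smoothed IDF formula, ln((1 + n) / (1 + df)) + 1, which mirrors scikit-learn's default `smooth_idf=True` setting.

The helper names `clean`, `tokenize`, `fit_idf`, `tfidf`, and `f1_score` are hypothetical. The eight classifiers are not reproduced. In practice, TF-IDF vectors could be fed to scikit-learn's `AdaBoostClassifier`, `SVC`, or `MLPClassifier`.

```python
import math
import re
from collections import Counter

# Hypothetical cleaning rules: the paper's exact cleaning steps are not listed here.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean(text: str) -> str:
    """Lowercase a tweet, then replace URLs, @mentions, and non-letters with spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return NON_LETTER_RE.sub(" ", text)


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer applied to the cleaned text."""
    return clean(text).split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1, df = number of docs containing it."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector: raw count times IDF, L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN) for the chosen positive (offensive) label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    tweets = ["@user you are an idiot http://t.co/x", "Have a great day!"]
    docs = [tokenize(t) for t in tweets]
    idf = fit_idf(docs)
    print(tfidf(docs[0], idf))  # four tokens, each weighted roughly 0.5
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.8
```

Binary F1 on the offensive class is used here for brevity. The abstract reports an average F1-score, and this summary does not say how that average was taken (for example, across classes or across runs).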
