Identifying Xenophobia in Twitter Posts Using Support Vector Machine with TF/IDF Strategy

Alisson Rodrigo Santana dos Santos, C. Rodrigues, Henning Barly Summer de Melo
Proceedings of the XVIII Brazilian Symposium on Information Systems, published 2022-05-16
DOI: https://doi.org/10.1145/3535511.3535548

Abstract

Context: Xenophobia is commonly defined as the fear of foreign groups. The phenomenon is broader than that, however: it encompasses not only fear but also rejection of and hostility towards different ethnic groups. Although the problem is not new, recent factors such as economic and humanitarian crises indicate that it is growing.

Problem: Twitter is one of the social networks most used in data mining studies because of its large number of posts. These characteristics also make the platform conducive to the proliferation of hate speech.

Solution: This research aims to develop a classifier that identifies xenophobic messages in tweets.

IS theory: This work was conceived under the aegis of Organizational Learning Theory. In particular, Support Vector Machines (SVM) were used together with the TF-IDF (term frequency-inverse document frequency) statistical technique to build a predictive model that learns potential patterns in the collected data.

Method: The study is quantitative and follows three methodological procedures: (i) data collection, (ii) controlled laboratory experiments, and (iii) construction of the classifier.

Summary of Results: Among the variants of the developed classifier, the best performer was the SVM with a sigmoid kernel, with an accuracy of 90%. The results are therefore encouraging for the identification of xenophobia in social media.

Contribution and Impact in the IS area: In addition to the classification system, this work contributes a database on xenophobia, which, as far as is known, did not previously exist in the Brazilian context.
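The two ingredients named in the abstract, TF-IDF weighting and a sigmoid kernel, can be illustrated with a minimal sketch. The abstract does not give the authors' tokenisation, TF-IDF variant, kernel parameters, or tooling, so everything below is an assumption made for illustration: whitespace tokenisation, a smoothed IDF of ln((1 + N) / (1 + df)) + 1 with L2 normalisation, `gamma=1.0`, `coef0=0.0`, and three neutral placeholder posts that are not taken from the paper's dataset. The sketch only computes features and kernel values; it does not train an SVM.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse {term: weight} dict per document.

    Assumed variant: raw term count times a smoothed IDF,
    idf = ln((1 + N) / (1 + df)) + 1, followed by L2 normalisation.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # L2-normalise so that long and short posts are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel k(x, y) = tanh(gamma * <x, y> + coef0) on sparse vectors."""
    dot = sum(w * y.get(term, 0.0) for term, w in x.items())
    return math.tanh(gamma * dot + coef0)


# Hypothetical placeholder posts, not from the paper's data.
posts = [
    "immigrants welcome in our city",
    "our city welcome festival today",
    "rain expected today",
]
vectors = tf_idf(posts)

k_self = sigmoid_kernel(vectors[0], vectors[0])       # unit vector with itself: tanh(1)
k_related = sigmoid_kernel(vectors[0], vectors[1])    # three shared terms
k_unrelated = sigmoid_kernel(vectors[0], vectors[2])  # no shared terms: tanh(0) = 0
```

An SVM with this kernel learns a decision boundary from such pairwise kernel values over labelled posts. In practice a library would handle both steps; scikit-learn, for example, provides `TfidfVectorizer` and `SVC(kernel="sigmoid")`, though the abstract does not say which tools the authors used.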