Alisson Rodrigo Santana dos Santos, C. Rodrigues, Henning Barly Summer de Melo
Identifying Xenophobia in Twitter Posts Using Support Vector Machine with TF/IDF Strategy
Proceedings of the XVIII Brazilian Symposium on Information Systems, published 2022-05-16
DOI: 10.1145/3535511.3535548 (https://doi.org/10.1145/3535511.3535548)
Abstract
Context: Xenophobia is commonly defined as the fear of foreign groups. The phenomenon is broader than that, however, since it encompasses not only fear but also rejection of or hostility towards different ethnic groups. Although it is not a new problem, recent factors such as economic and humanitarian crises have shown that it is growing. Problem: Twitter is one of the most widely used social networks for data mining studies because of its large volume of posts. These characteristics also make the platform conducive to the proliferation of hate speech. Solution: This research aims to develop a tweet classification system that identifies xenophobic messages. IS theory: This work was conceived under the aegis of Organizational Learning Theory. In particular, Support Vector Machines (SVM) were used together with the TF-IDF statistical technique to build a predictive model that learns potential patterns in the collected data. Method: The research is quantitative and is organized into the following methodological procedures: (i) data collection, (ii) controlled laboratory experiments, and (iii) construction of the classifier. Summary of Results: Among the results obtained for the developed classifier, the best performance came from the SVM with a sigmoid kernel, which reached an accuracy of 90%. The results are therefore encouraging for the identification of xenophobia in social media. Contribution and Impact in the IS area: In addition to the classification system, this work contributes a database on xenophobia, something that, as far as is known, did not previously exist in the Brazilian context.
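As an illustration of the approach named in the abstract, the following is a minimal sketch of a TF-IDF plus sigmoid-kernel SVM text classifier using scikit-learn. It is not the authors' implementation: the abstract does not state the library, preprocessing, or hyperparameters used, so the choice of scikit-learn, the default `TfidfVectorizer` and `SVC` settings, and the tiny English toy dataset are all assumptions made only to keep the example runnable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data (NOT the paper's dataset): 1 = xenophobic, 0 = not.
texts = [
    "foreigners are not welcome in this city",
    "immigrants should go back to where they came from",
    "we do not want outsiders taking our jobs",
    "welcome to our new neighbors from abroad",
    "the football match tonight was great",
    "enjoying the food festival with friends from many countries",
]
labels = [1, 1, 1, 0, 0, 0]


def build_classifier():
    """Return a pipeline: TF-IDF features followed by a sigmoid-kernel SVM.

    Default hyperparameters are an assumption; the abstract only names
    the kernel type.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True),  # bag of words weighted by TF-IDF
        SVC(kernel="sigmoid"),            # kernel reported as best in the abstract
    )


model = build_classifier()
model.fit(texts, labels)

# Classify unseen (also hypothetical) messages; each prediction is 0 or 1.
new_messages = [
    "outsiders are not welcome here",
    "great match with friends tonight",
]
predictions = model.predict(new_messages)
for message, label in zip(new_messages, predictions):
    print(f"{label}: {message}")
```

On a real corpus, the data would be split into training and test sets and the reported metric (accuracy, in the abstract) computed on the held-out portion; a dataset this small says nothing about real-world performance.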