Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter

International Journal of Data Science and Analysis. Published: 2021-11-10. DOI: 10.11648/J.IJDSA.20210706.11

Jung-hun Baeck, Teresa Hyoju Chang, Jaden Chunho Chyu, Bryan Chunwoo Chyu, Chaehyun Lim

Abstract

Stop Asian Hate, or Stop Asian American and Pacific Islander (AAPI) Hate, refers to the national movement against racially motivated attacks on Asians. The protest was initiated in line with the Black Lives Matter (BLM) movement to dismantle ongoing hate and targeted crimes against Asians and to educate people about such threats. Hate crimes targeting Asians have occurred steadily across the U.S., but they began to increase in number with the COVID-19 pandemic. For the Stop Asian Hate movement, the problem was exacerbated by people accusing certain Asian countries of being the source of COVID-19. In 2021, Asian Americans reported the single largest increase in serious incidents of online hate and harassment, including racist and xenophobic slurs blaming people of Asian descent for the spread of COVID-19. To assess the specific impacts and measures of each movement, this research examined the racial slurs used against Asians on social media, specifically Twitter. Python was used to collect the social media data and to analyze the ratio of racial slurs and anti-Asian hate. To this end, the data set was built through data labeling, which assigned each tweet to one of three sub-categories. The data were classified into two types: type 1, which contains racial content or information against Asians, and type 0, which contains non-racial content. Data were collected with Twint, a Python scraping tool for Twitter, yielding over 2,000 recent tweets for keywords relevant to the movement. The tweets were then preprocessed in Python by lowercasing, lemmatizing, and tokenizing them. The processed data were represented with graphs and word clouds showing some of the terms most commonly used to target Asians on social media. Finally, a binary classification model was designed to filter tweets with racial content. We compared the accuracy of classification models built with three algorithms: logistic regression, random forest, and support vector machine (SVM). The resulting model could help safeguard users from exposure to the racist terms that pervade the internet.
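The pipeline summarized above (lowercasing, lemmatization, tokenization, term counting for word clouds, and a binary type 1 / type 0 classifier) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' code. The tweets are hypothetical toy examples in which `SLUR1` and `SLUR2` are placeholder tokens rather than real scraped data. The `LEMMAS` table is a hypothetical stand-in for a real lemmatizer, and because a lemmatizer operates on individual tokens, the sketch tokenizes before lemmatizing. Only logistic regression is shown, fitted with plain stochastic gradient descent; random forest and SVM, the other two algorithms the study compared, would be scored on the same labeled data with the same `accuracy` function, for example through a library such as scikit-learn.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: SLUR1/SLUR2 are placeholders standing in for racial slurs.
# Label 1 = racial content against Asians (type 1), 0 = non-racial content (type 0).
TRAIN = [
    ("SLUR1 brought the virus here", 1),
    ("They are all SLUR2", 1),
    ("slur1 slur2 everywhere", 1),
    ("Blame the SLUR1 for COVID", 1),
    ("Stop Asian Hate rally today", 0),
    ("Support our Asian neighbors", 0),
    ("COVID cases rising in the city", 0),
    ("Join the rallies against hate", 0),
]
TEST = [
    ("They are all SLUR2, and SLUR1 too", 1),
    ("Support our Asian neighbors, stop hate", 0),
]

# Hypothetical lemma lookup; a real pipeline would call a proper lemmatizer instead.
LEMMAS = {"brought": "bring", "are": "be", "neighbors": "neighbor",
          "cases": "case", "rising": "rise", "rallies": "rally"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, then map each token to its lemma."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]


def features(text):
    """Bag-of-words representation: token -> count."""
    return Counter(preprocess(text))


def score(model, feats):
    """Linear score; tokens never seen in training contribute 0."""
    weights, bias = model
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train(data, lr=0.5, epochs=200):
    """Logistic regression fitted with stochastic gradient descent on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in data:
            feats = features(text)
            p = 1.0 / (1.0 + math.exp(-score((weights, bias), feats)))
            grad = p - label  # derivative of log loss with respect to the score
            bias -= lr * grad
            for tok, n in feats.items():
                weights[tok] = weights.get(tok, 0.0) - lr * grad * n
    return weights, bias


def predict(model, text):
    """Return 1 (racial) or 0 (non-racial); score >= 0 means sigmoid(score) >= 0.5."""
    return 1 if score(model, features(text)) >= 0 else 0


def accuracy(model, data):
    """Fraction of (text, label) pairs the model labels correctly."""
    return sum(predict(model, t) == y for t, y in data) / len(data)


model = train(TRAIN)

# Term frequencies over type-1 tweets: the counts a word cloud would be drawn from.
racial_counts = Counter(tok for t, y in TRAIN if y == 1 for tok in preprocess(t))

print("train accuracy:", accuracy(model, TRAIN))
print("test accuracy:", accuracy(model, TEST))
print("top type-1 terms:", racial_counts.most_common(3))
```

Because every type-1 toy tweet contains a placeholder token, the toy data are linearly separable and the model fits them exactly; this says nothing about the accuracies the study reports on real tweets. Note also that the term counts include the stop word `the`, since the preprocessing described in the abstract does not mention stop-word removal.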
