HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter

The 7th Workshop on Online Abuse and Harms (WOAH) Pub Date: 1900-01-01 DOI: 10.18653/v1/2023.woah-1.20

Juan Vásquez, S. Andersen, G. Bel-Enguix, Helena Gómez-Adorno, Sergio-Luis Ojeda-Trueba


Citations: 0

Abstract


In the past few years, the NLP community has actively worked on detecting LGBT+phobia in online spaces using publicly available textual data. Most of these efforts target English and its variants, since English is the language most studied by the NLP community. Efforts to create corpora in other languages are nevertheless active worldwide. Even so, Spanish remains understudied with respect to digital LGBT+phobia. The only corpus we found in the literature covers the Peninsular Spanish dialects, which use LGBT+phobic terms different from those in the Mexican dialect. For this reason, we present HOMO-MEX, a novel corpus for detecting LGBT+phobia in Mexican Spanish. In this paper, we describe our data-gathering and annotation process. We also present a classification benchmark that uses various traditional machine learning algorithms and two pre-trained deep learning models to showcase the corpus's potential for classification.
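The abstract does not name the traditional algorithms used in the benchmark. As a rough illustration of what a traditional baseline on a labeled text corpus can look like, the sketch below implements a multinomial Naive Bayes classifier with add-one smoothing using only the Python standard library. The choice of Naive Bayes, the whitespace tokenizer, the label names `flagged` and `not_flagged`, and the placeholder training strings are all assumptions made for illustration; none of them come from the paper or the corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical placeholder data: neutral tokens standing in for tweet text.
train_texts = ["alpha beta beta", "alpha beta gamma", "delta epsilon", "delta delta zeta"]
train_labels = ["flagged", "flagged", "not_flagged", "not_flagged"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("beta alpha"))
print(model.predict("delta zeta"))
```

In practice a benchmark of this kind would be trained on the annotated tweets with a held-out test split and reported with per-class metrics; the toy data here only shows the mechanics.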
