Prasanna Kumar Kumaresan, Rahul Ponnusamy, Ruba Priyadharshini, Paul Buitelaar, Bharathi Raja Chakravarthi
Natural Language Processing Journal, Volume 5, Article 100041. Published 2023-11-22. DOI: 10.1016/j.nlp.2023.100041
Homophobia and transphobia detection for low-resourced languages in social media comments
People increasingly share and express their emotions on online social media platforms such as Twitter, Facebook, and YouTube. Homophobia and transphobia are abusive, hateful, threatening, or discriminatory acts that target gay, lesbian, transgender, or bisexual individuals and cause them discomfort. Detecting such acts on social media is called homophobia and transphobia detection, a task that has recently gained interest among researchers. Identifying homophobic and transphobic content in under-resourced languages is challenging, and so far no resources exist for categorizing this type of content in Malayalam and Hindi. This paper presents a new high-quality dataset for detecting homophobia and transphobia in Malayalam and Hindi. Our dataset consists of 5,193 comments in Malayalam and 3,203 comments in Hindi. We also report experiments performed with traditional machine learning and transformer-based deep learning models on the Malayalam, Hindi, English, Tamil, and Tamil-English datasets.