Homophobia and transphobia span identification in low-resource languages

Prasanna Kumar Kumaresan, Devendra Deepak Kayande, Ruba Priyadharshini, Paul Buitelaar, Bharathi Raja Chakravarthi

Natural Language Processing Journal, Volume 12, Article 100169 (published 2025-06-24)
DOI: 10.1016/j.nlp.2025.100169
URL: https://www.sciencedirect.com/science/article/pii/S2949719125000457
Abstract
Online platforms have become prevalent because they promote free speech and group discussion. However, they also serve as venues for hate speech, which can negatively impact the psychological well-being of vulnerable people. This is especially true for members of the LGBTQ+ community, who are often the targets of homophobia and transphobia in online environments. Our study makes three main contributions: (1) we developed a new dataset with span-level annotations for homophobia and transphobia in Tamil, English, and Marathi; (2) we built span-level detectors of harmful content from BERT-based architectures augmented with Conditional Random Field (CRF) and Bidirectional Long Short-Term Memory (BiLSTM) layers; and (3) we benchmarked monolingual and multilingual models on their ability to detect subtle forms of hate speech. The annotated dataset, collected from real-world social media (YouTube) content, provides diverse language contexts and strengthens the representation of low-resource languages. The span-based approach enables models to capture subtle linguistic nuances, leading to more precise content moderation that accounts for cultural differences. The experimental results show that our models achieve effective span detection, providing a foundation for inclusive moderation tools. Our research contributes to the development of AI moderation systems that aim to reduce the burden on human moderators and improve the quality of online experiences for vulnerable LGBTQ+ users.
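To make the modeling setup concrete, the sketch below shows one common way to combine the components named in the abstract: a BERT encoder feeding a BiLSTM, with a CRF decoding BIO-style span labels. This is a minimal illustration under stated assumptions; the encoder name, label set, and layer sizes are placeholders, not the paper's exact configuration.

```python
# Minimal BERT + BiLSTM + CRF span tagger, assuming BIO-style labels
# (e.g. O / B-HOM / I-HOM / B-TRANS / I-TRANS). Hyperparameters are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf


class SpanTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 num_labels=5, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # BiLSTM over the contextual embeddings; output size is 2 * lstm_hidden.
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        seq, _ = self.bilstm(out.last_hidden_state)
        emissions = self.classifier(seq)
        mask = attention_mask.bool()
        if labels is not None:
            # The CRF returns the sequence log-likelihood; negate it for a loss.
            return -self.crf(emissions, labels, mask=mask)
        # Viterbi decoding yields the most likely label sequence per sentence.
        return self.crf.decode(emissions, mask=mask)
```

The CRF layer is what distinguishes span tagging from independent per-token classification: it scores whole label sequences, so structurally invalid transitions such as O followed by I-TRANS can be penalized or forbidden outright, which tends to produce cleaner span boundaries.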