Authors: Ibrahima Ndao, Khadim Dramé, Gorgoumack Sambe, Gayo Diallo
Journal: Data in Brief, volume 60, Article 111500 (JCR Q3, Multidisciplinary Sciences; impact factor 1.0)
Published: 2025-03-23
DOI: 10.1016/j.dib.2025.111500
URL: https://www.sciencedirect.com/science/article/pii/S235234092500232X
Annotated tweet data of mixed Wolof-French for detecting Obnoxious messages
Automatic detection of obnoxious (abusive) messages on social networks is difficult, especially for low-resource languages and for code-mixed text such as Wolof-French. Code-mixing is common in Senegalese tweets, but annotated data for this task is lacking. To fill this gap, we created AWOFRO, the first annotated corpus for this setting, comprising 3510 code-mixed tweets. We analysed the corpus and validated the annotations using agreement measures such as Cohen's Kappa.
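As a brief illustration of the agreement measure named in the abstract, the sketch below computes Cohen's Kappa for two annotators: the observed agreement is corrected by the agreement expected from each annotator's label frequencies alone. The label names and the six example annotations are hypothetical and are not taken from AWOFRO; the function `cohen_kappa` is an illustrative helper, not the authors' code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations (NOT from the AWOFRO corpus).
annotator_1 = ["obnoxious", "normal", "normal", "obnoxious", "normal", "normal"]
annotator_2 = ["obnoxious", "normal", "obnoxious", "obnoxious", "normal", "normal"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's Kappa: {kappa:.3f}")
```

Here the annotators agree on 5 of 6 items (observed agreement 0.833) while chance agreement is 0.5, so the Kappa value is about 0.667. Values near 1 indicate strong agreement beyond chance, and values near 0 indicate agreement no better than chance.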
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe the data, facilitating reproducibility.
- Make the data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.