Mawqif:用于目标特定姿态检测的多标签阿拉伯语数据集

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI:10.18653/v1/2022.wanlp-1.16

Nora S. Alturayeif, H.A. Luqman, Moataz Aly Kamaleldin Ahmed

{"title":"Mawqif:用于目标特定姿态检测的多标签阿拉伯语数据集","authors":"Nora S. Alturayeif, H.A. Luqman, Moataz Aly Kamaleldin Ahmed","doi":"10.18653/v1/2022.wanlp-1.16","DOIUrl":null,"url":null,"abstract":"Social media platforms are becoming inherent parts of people’s daily life to express opinions and stances toward topics of varying polarities. Stance detection determines the viewpoint expressed in a text toward a target. While communication on social media (e.g., Twitter) takes place in more than 40 languages, the majority of stance detection research has been focused on English. Although some efforts have recently been made to develop stance detection datasets in other languages, no similar efforts seem to have considered the Arabic language. In this paper, we present Mawqif, the first Arabic dataset for target-specific stance detection, composed of 4,121 tweets annotated with stance, sentiment, and sarcasm polarities. Mawqif, as a multi-label dataset, can provide more opportunities for studying the interaction between different opinion dimensions and evaluating a multi-task model. We provide a detailed description of the dataset, present an analysis of the produced annotation, and evaluate four BERT-based models on it. Our best model achieves a macro-F1 of 78.89%, which shows that there is ample room for improvement on this challenging task. We publicly release our dataset, the annotation guidelines, and the code of the experiments.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection\",\"authors\":\"Nora S. Alturayeif, H.A. Luqman, Moataz Aly Kamaleldin Ahmed\",\"doi\":\"10.18653/v1/2022.wanlp-1.16\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social media platforms are becoming inherent parts of people’s daily life to express opinions and stances toward topics of varying polarities. Stance detection determines the viewpoint expressed in a text toward a target. While communication on social media (e.g., Twitter) takes place in more than 40 languages, the majority of stance detection research has been focused on English. Although some efforts have recently been made to develop stance detection datasets in other languages, no similar efforts seem to have considered the Arabic language. In this paper, we present Mawqif, the first Arabic dataset for target-specific stance detection, composed of 4,121 tweets annotated with stance, sentiment, and sarcasm polarities. Mawqif, as a multi-label dataset, can provide more opportunities for studying the interaction between different opinion dimensions and evaluating a multi-task model. We provide a detailed description of the dataset, present an analysis of the produced annotation, and evaluate four BERT-based models on it. Our best model achieves a macro-F1 of 78.89%, which shows that there is ample room for improvement on this challenging task. We publicly release our dataset, the annotation guidelines, and the code of the experiments.\",\"PeriodicalId\":355149,\"journal\":{\"name\":\"Workshop on Arabic Natural Language Processing\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Arabic Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.wanlp-1.16\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

社交媒体平台正成为人们日常生活中不可或缺的一部分，人们可以在社交媒体上表达对各种极端话题的观点和立场。姿态检测确定文本中对目标表达的观点。虽然社交媒体(如Twitter)上的交流用40多种语言进行，但大多数立场检测研究都集中在英语上。虽然最近已经做出了一些努力来开发其他语言的姿态检测数据集，但似乎没有类似的努力考虑到阿拉伯语。在本文中，我们提出了Mawqif，这是第一个针对特定目标的立场检测的阿拉伯语数据集，由4121条推文组成，其中标注了立场、情绪和讽刺的极性。Mawqif作为一个多标签数据集，可以为研究不同意见维度之间的相互作用和评估多任务模型提供更多的机会。我们提供了数据集的详细描述，对生成的注释进行了分析，并在此基础上评估了四种基于bert的模型。我们最好的模型实现了78.89%的宏观f1，这表明在这个具有挑战性的任务上还有很大的改进空间。我们公开发布我们的数据集、注释指南和实验代码。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection

Social media platforms are becoming inherent parts of people’s daily life to express opinions and stances toward topics of varying polarities. Stance detection determines the viewpoint expressed in a text toward a target. While communication on social media (e.g., Twitter) takes place in more than 40 languages, the majority of stance detection research has been focused on English. Although some efforts have recently been made to develop stance detection datasets in other languages, no similar efforts seem to have considered the Arabic language. In this paper, we present Mawqif, the first Arabic dataset for target-specific stance detection, composed of 4,121 tweets annotated with stance, sentiment, and sarcasm polarities. Mawqif, as a multi-label dataset, can provide more opportunities for studying the interaction between different opinion dimensions and evaluating a multi-task model. We provide a detailed description of the dataset, present an analysis of the produced annotation, and evaluate four BERT-based models on it. Our best model achieves a macro-F1 of 78.89%, which shows that there is ample room for improvement on this challenging task. We publicly release our dataset, the annotation guidelines, and the code of the experiments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Workshop on Arabic Natural Language Processing

自引率

0.00%

发文量