Workshop on Arabic Natural Language Processing最新文献

筛选
英文 中文
AraDepSu: Detecting Depression and Suicidal Ideation in Arabic Tweets Using Transformers AraDepSu:使用变压器检测阿拉伯语推文中的抑郁和自杀意念
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.28
Mariam Hassib, Nancy Hossam, Jolie Sameh, Marwan Torki
{"title":"AraDepSu: Detecting Depression and Suicidal Ideation in Arabic Tweets Using Transformers","authors":"Mariam Hassib, Nancy Hossam, Jolie Sameh, Marwan Torki","doi":"10.18653/v1/2022.wanlp-1.28","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.28","url":null,"abstract":"Among mental health diseases, depression is one of the most severe, as it often leads to suicide which is the fourth leading cause of death in the Middle East. In the Middle East, Egypt has the highest percentage of suicidal deaths; due to this, it is important to identify depression and suicidal ideation. In Arabic culture, there is a lack of awareness regarding the importance of diagnosing and living with mental health diseases. However, as noted for the last couple years people all over the world, including Arab citizens, tend to express their feelings openly on social media. Twitter is the most popular platform designed to enable the expression of emotions through short texts, pictures, or videos. This paper aims to predict depression and depression with suicidal ideation. Due to the tendency of people to treat social media as their personal diaries and share their deepest thoughts on social media platforms. Social data contain valuable information that can be used to identify user’s psychological states. We create AraDepSu dataset by scrapping tweets from twitter and manually labelling them. We expand the diversity of user tweets, by adding a neutral label (“neutral”) so the dataset include three classes (“depressed”, “suicidal”, “neutral”). Then we train our AraDepSu dataset on 30+ different transformer models. We find that the best-performing model is MARBERT with accuracy, precision, recall and F1-Score values of 91.20%, 88.74%, 88.50% and 88.75%.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130903733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect 科威特方言情感分析的弱监督迁移学习方法
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.15
Fatemah Husain, Hana Al-Ostad, Halima Omar
{"title":"A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect","authors":"Fatemah Husain, Hana Al-Ostad, Halima Omar","doi":"10.18653/v1/2022.wanlp-1.15","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.15","url":null,"abstract":"Developing a system for sentiment analysis is very challenging for the Arabic language due to the limitations in the available Arabic datasets. Many Arabic dialects are still not studied by researchers in Arabic sentiment analysis due to the complexity of annotators’ recruitment process during dataset creation. This paper covers the research gap in sentiment analysis for the Kuwaiti dialect by proposing a weak supervised approach to develop a large labeled dataset. Our dataset consists of over 16.6k tweets with 7,905 negatives, 7,902 positives, and 860 neutrals that spans several themes and time frames to remove any bias that might affect its content. The annotation agreement between our proposed system’s labels and human-annotated labels reports 93% for the pairwise percent agreement and 0.87 for Cohen’s kappa coefficient. Furthermore, we evaluate our dataset using multiple traditional machine learning classifiers and advanced deep learning language models to test its performance. The results report 89% accuracy when applied to the testing dataset using the ARBERT model.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114448497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
CAraNER: The COVID-19 Arabic Named Entity Corpus CAraNER: 2019冠状病毒病阿拉伯命名实体语料库
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.1
A. Al-Thubaity, Sakhar B. Alkhereyfy, Wejdan Al-Zahrani, Alia Bahanshal
{"title":"CAraNER: The COVID-19 Arabic Named Entity Corpus","authors":"A. Al-Thubaity, Sakhar B. Alkhereyfy, Wejdan Al-Zahrani, Alia Bahanshal","doi":"10.18653/v1/2022.wanlp-1.1","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.1","url":null,"abstract":"Named Entity Recognition (NER) is a well-known problem for the natural language processing (NLP) community. It is a key component of different NLP applications, including information extraction, question answering, and information retrieval. In the literature, there are several Arabic NER datasets with different named entity tags; however, due to data and concept drift, we are always in need of new data for NER and other NLP applications. In this paper, first, we introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset. CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published during 2019, 2020, and 2021. The dataset is labeled by five annotators with five named-entity tags, namely: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is available for download for free. We evaluate the corpus by finetuning four BERT-based Arabic language models on the CAraNER corpus. The best model was AraBERTv0.2-large with 0.86 for the F1 macro measure.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"16 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134105028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
iCompass at WANLP 2022 Shared Task: ARBERT and MARBERT for Multilabel Propaganda Classification of Arabic Tweets iCompass WANLP 2022共享任务:ARBERT和MARBERT对阿拉伯语推文的多标签宣传分类
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.59
B. Taboubi, Bechir Brahem, H. Haddad
{"title":"iCompass at WANLP 2022 Shared Task: ARBERT and MARBERT for Multilabel Propaganda Classification of Arabic Tweets","authors":"B. Taboubi, Bechir Brahem, H. Haddad","doi":"10.18653/v1/2022.wanlp-1.59","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.59","url":null,"abstract":"Arabic propaganda detection in Arabic was carried out using transformers pre-trained models ARBERT, MARBERT. They were fine-tuned for the down-stream task in hand ‘subtask 1’, multilabel classification of Arabic tweets. Submitted model was MARBERT the got 0.597 micro F1 score and got the fifth rank.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132844795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Authorship Verification for Arabic Short Texts Using Arabic Knowledge-Base Model (AraKB) 基于阿拉伯文知识库模型的阿拉伯文短文本作者身份验证
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.19
Fatimah Alqahtani, H. Yannakoudakis
{"title":"Authorship Verification for Arabic Short Texts Using Arabic Knowledge-Base Model (AraKB)","authors":"Fatimah Alqahtani, H. Yannakoudakis","doi":"10.18653/v1/2022.wanlp-1.19","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.19","url":null,"abstract":"Arabic is a Semitic language, considered to be one of the most complex languages in the world due to its unique composition and complex linguistic features. It consequently causes challenges for verifying the authorship of Arabic texts, requiring extensive research and development. This paper presents a new knowledge-based model to enhance Natural Language Understanding and thereby improve authorship verification performance. The proposed model provided promising results that would benefit the Arabic research for different Natural Language Processing tasks.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132406859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Pilot Study on the Collection and Computational Analysis of Linguistic Differences Amongst Men and Women in a Kuwaiti Arabic WhatsApp Dataset 在科威特阿拉伯语WhatsApp数据集中收集和计算分析男女语言差异的试点研究
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.35
Hesah Aldihan, R. Gaizauskas, S. Fitzmaurice
{"title":"A Pilot Study on the Collection and Computational Analysis of Linguistic Differences Amongst Men and Women in a Kuwaiti Arabic WhatsApp Dataset","authors":"Hesah Aldihan, R. Gaizauskas, S. Fitzmaurice","doi":"10.18653/v1/2022.wanlp-1.35","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.35","url":null,"abstract":"This study focuses on the collection and computational analysis of Kuwaiti Arabic (KA), which is considered a low resource dialect, to test different sociolinguistic hypotheses related to gendered language use. In this paper, we describe the collection and analysis of a corpus of WhatsApp Group chats with mixed gender Kuwaiti participants. This corpus, which we are making publicly available, is the first corpus of KA conversational data. We analyse different interactional and linguistic features to get insights about features that may be indicative of gender to inform the development of a gender classification system for KA in an upcoming study. Statistical analysis of our data shows that there is insufficient evidence to claim that there are significant differences amongst men and women with respect to number of turns, length of turns and number of emojis. However, qualitative analysis shows that men and women differ substantially in the types of emojis they use and in their use of lengthened words.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123886359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Ahmed and Khalil at NADI 2022: Transfer Learning and Addressing Class Imbalance for Arabic Dialect Identification and Sentiment Analysis Ahmed和Khalil在NADI 2022:迁移学习和解决阿拉伯语方言识别和情感分析的阶级不平衡
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.46
Ahmed Oumar, Khalil Mrini
{"title":"Ahmed and Khalil at NADI 2022: Transfer Learning and Addressing Class Imbalance for Arabic Dialect Identification and Sentiment Analysis","authors":"Ahmed Oumar, Khalil Mrini","doi":"10.18653/v1/2022.wanlp-1.46","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.46","url":null,"abstract":"In this paper, we present our findings in the two subtasks of the 2022 NADI shared task. First, in the Arabic dialect identification subtask, we find that there is heavy class imbalance, and propose to address this issue using focal loss. Our experiments with the focusing hyperparameter confirm that focal loss improves performance. Second, in the Arabic tweet sentiment analysis subtask, we deal with a smaller dataset, where text includes both Arabic dialects and Modern Standard Arabic. We propose to use transfer learning from both pre-trained MSA language models and our own model from the first subtask. Our system ranks in the 5th and 7th best spots of the leaderboards of first and second subtasks respectively.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"288 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115217594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection Mawqif:用于目标特定姿态检测的多标签阿拉伯语数据集
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.16
Nora S. Alturayeif, H.A. Luqman, Moataz Aly Kamaleldin Ahmed
{"title":"Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection","authors":"Nora S. Alturayeif, H.A. Luqman, Moataz Aly Kamaleldin Ahmed","doi":"10.18653/v1/2022.wanlp-1.16","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.16","url":null,"abstract":"Social media platforms are becoming inherent parts of people’s daily life to express opinions and stances toward topics of varying polarities. Stance detection determines the viewpoint expressed in a text toward a target. While communication on social media (e.g., Twitter) takes place in more than 40 languages, the majority of stance detection research has been focused on English. Although some efforts have recently been made to develop stance detection datasets in other languages, no similar efforts seem to have considered the Arabic language. In this paper, we present Mawqif, the first Arabic dataset for target-specific stance detection, composed of 4,121 tweets annotated with stance, sentiment, and sarcasm polarities. Mawqif, as a multi-label dataset, can provide more opportunities for studying the interaction between different opinion dimensions and evaluating a multi-task model. We provide a detailed description of the dataset, present an analysis of the produced annotation, and evaluate four BERT-based models on it. Our best model achieves a macro-F1 of 78.89%, which shows that there is ample room for improvement on this challenging task. We publicly release our dataset, the annotation guidelines, and the code of the experiments.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115484504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Pythoneers at WANLP 2022 Shared Task: Monolingual AraBERT for Arabic Propaganda Detection and Span Extraction WANLP 2022共享任务:用于阿拉伯语宣传检测和跨度提取的单语AraBERT
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.64
Joseph Attieh, Fadi Hassan
{"title":"Pythoneers at WANLP 2022 Shared Task: Monolingual AraBERT for Arabic Propaganda Detection and Span Extraction","authors":"Joseph Attieh, Fadi Hassan","doi":"10.18653/v1/2022.wanlp-1.64","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.64","url":null,"abstract":"In this paper, we present two deep learning approaches that are based on AraBERT, submitted to the Propaganda Detection shared task of the Seventh Workshop for Arabic Natural Language Processing (WANLP 2022). Propaganda detection consists of two main sub-tasks, mainly propaganda identification and span extraction. We present one system per sub-task. The first system is a Multi-Task Learning model that consists of a shared AraBERT encoder with task-specific binary classification layers. This model is trained to jointly learn one binary classification task per propaganda method. The second system is an AraBERT model with a Conditional Random Field (CRF) layer. We achieved rank 3 on the first sub-task and rank 1 on the second sub-task.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115571969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Optimizing Naive Bayes for Arabic Dialect Identification 基于朴素贝叶斯算法的阿拉伯语方言识别
Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.40
T. Jauhiainen, H. Jauhiainen, Krister Lindén
{"title":"Optimizing Naive Bayes for Arabic Dialect Identification","authors":"T. Jauhiainen, H. Jauhiainen, Krister Lindén","doi":"10.18653/v1/2022.wanlp-1.40","DOIUrl":"https://doi.org/10.18653/v1/2022.wanlp-1.40","url":null,"abstract":"This article describes the language identification system used by the SUKI team in the 2022 Nuanced Arabic Dialect Identification (NADI) shared task. In addition to the system description, we give some details of the dialect identification experiments we conducted while preparing our submissions. In the end, we submitted only one official run. We used a Naive Bayes-based language identifier with character n-grams from one to four, of which we implemented a new version, which automatically optimizes its parameters. We also experimented with clustering the training data according to different topics. With the macro F1 score of 0.1963 on test set A and 0.1058 on test set B, we achieved the 18th position out of the 19 competing teams.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132051545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信