Workshop on Arabic Natural Language Processing: Latest Publications (Page 4)

AraDepSu: Detecting Depression and Suicidal Ideation in Arabic Tweets Using Transformers

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.28

Mariam Hassib, Nancy Hossam, Jolie Sameh, Marwan Torki

Among mental health diseases, depression is one of the most severe, as it often leads to suicide, which is the fourth leading cause of death in the Middle East. In the Middle East, Egypt has the highest percentage of suicidal deaths; it is therefore important to identify depression and suicidal ideation. In Arabic culture, there is a lack of awareness regarding the importance of diagnosing and living with mental health diseases. However, over the last couple of years, people all over the world, including Arab citizens, have tended to express their feelings openly on social media. Twitter is the most popular platform designed to enable the expression of emotions through short texts, pictures, or videos. This paper aims to predict depression and depression with suicidal ideation. Because people tend to treat social media as their personal diaries and share their deepest thoughts on these platforms, social data contain valuable information that can be used to identify users' psychological states. We create the AraDepSu dataset by scraping tweets from Twitter and manually labelling them. To expand the diversity of user tweets, we add a neutral label ("neutral"), so the dataset includes three classes ("depressed", "suicidal", "neutral"). We then train 30+ different transformer models on the AraDepSu dataset. We find that the best-performing model is MARBERT, with accuracy, precision, recall and F1-score values of 91.20%, 88.74%, 88.50% and 88.75%, respectively.
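The abstract reports accuracy, precision, recall and F1-score for a three-class task but does not state how per-class scores are averaged. As a minimal sketch, assuming unweighted (macro) averaging over the three classes, the metrics could be computed as follows; the function name `classification_scores` is hypothetical and is not taken from the paper's code.

```python
from collections import Counter
from typing import Dict, Sequence

# The three AraDepSu classes named in the abstract.
LABELS = ("depressed", "suicidal", "neutral")


def classification_scores(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS
) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (unweighted mean over classes).

    Hypothetical helper: assumes macro averaging, which the abstract does not confirm.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    unknown = (set(y_true) | set(y_pred)) - set(labels)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[gold] += 1  # the true class gets a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precisions.append(p)
        recalls.append(r)
        # Per-class F1 first; macro-F1 is then the mean of these values.
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)

    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

For example, with gold labels `["depressed", "suicidal", "neutral", "neutral"]` and predictions `["depressed", "neutral", "neutral", "suicidal"]`, the sketch returns 0.5 for accuracy and for each macro-averaged score. Averaging per-class F1 values (rather than taking the harmonic mean of macro precision and recall) is one common convention; weighted averaging would give different numbers.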

Citations: 1

A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.15

Fatemah Husain, Hana Al-Ostad, Halima Omar

Citations: 2

CAraNER: The COVID-19 Arabic Named Entity Corpus

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.1

A. Al-Thubaity, Sakhar B. Alkhereyfy, Wejdan Al-Zahrani, Alia Bahanshal

Named Entity Recognition (NER) is a well-known problem for the natural language processing (NLP) community. It is a key component of different NLP applications, including information extraction, question answering, and information retrieval. In the literature, there are several Arabic NER datasets with different named entity tags; however, due to data and concept drift, we are always in need of new data for NER and other NLP applications. In this paper, we first introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset. CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published during 2019, 2020, and 2021. The dataset is labeled by five annotators with five named-entity tags, namely: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is freely available for download. We evaluate the corpus by fine-tuning four BERT-based Arabic language models on it. The best model was AraBERTv0.2-large, with a macro-F1 score of 0.86.
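The abstract reports a macro-averaged F1 for NER but does not describe the tag encoding or the scoring procedure. The sketch below assumes a BIO encoding of the five entity types and exact-match span scoring, macro-averaged over the entity types that occur in the gold or predicted spans. The tag strings `PERS`, `TITLE`, `LOC`, `ORG`, `MISC` and both function names are hypothetical, not the corpus's published format.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag strings for the five CAraNER entity types
# (Person, Title, Location, Organization, Miscellaneous).
ENTITY_TYPES = {"PERS", "TITLE", "LOC", "ORG", "MISC"}

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Group assumed BIO tags into entity spans; a stray I- tag opens a new span."""
    spans: List[Span] = []
    current, start = None, 0
    for i, tag in enumerate(tags):
        if tag != "O" and (tag[:2] not in ("B-", "I-") or tag[2:] not in ENTITY_TYPES):
            raise ValueError(f"unexpected tag {tag!r}")
        if tag == "O":
            if current is not None:
                spans.append((current, start, i))
            current = None
        elif tag.startswith("B-") or tag[2:] != current:
            if current is not None:
                spans.append((current, start, i))
            current, start = tag[2:], i
        # Otherwise an I- tag of the current type extends the open span.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


def macro_span_f1(gold_tags: Sequence[str], pred_tags: Sequence[str]) -> float:
    """Exact-match span F1 per entity type, averaged over types seen in gold or predictions."""
    gold, pred = set(bio_to_spans(gold_tags)), set(bio_to_spans(pred_tags))
    types = sorted({s[0] for s in gold | pred})
    if not types:
        return 0.0
    f1s = []
    for t in types:
        g = {s for s in gold if s[0] == t}
        p = {s for s in pred if s[0] == t}
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

For instance, gold tags `B-PERS I-PERS O B-LOC O B-ORG` against predicted tags `B-PERS I-PERS O B-ORG O B-ORG` give per-type F1 scores of 1.0 (PERS), 0.0 (LOC) and about 0.67 (ORG), so the macro-F1 is about 0.56. Token-level or lenient-overlap scoring would yield different values.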

Citations: 2

iCompass at WANLP 2022 Shared Task: ARBERT and MARBERT for Multilabel Propaganda Classification of Arabic Tweets

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.59

B. Taboubi, Bechir Brahem, H. Haddad

Citations: 4

Authorship Verification for Arabic Short Texts Using Arabic Knowledge-Base Model (AraKB)

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.19

Fatimah Alqahtani, H. Yannakoudakis

Citations: 0

A Pilot Study on the Collection and Computational Analysis of Linguistic Differences Amongst Men and Women in a Kuwaiti Arabic WhatsApp Dataset

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.35

Hesah Aldihan, R. Gaizauskas, S. Fitzmaurice

Citations: 1

Ahmed and Khalil at NADI 2022: Transfer Learning and Addressing Class Imbalance for Arabic Dialect Identification and Sentiment Analysis

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.46

Ahmed Oumar, Khalil Mrini

Citations: 1

Mawqif: A Multi-label Arabic Dataset for Target-specific Stance Detection

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.16

Nora S. Alturayeif, H.A. Luqman, Moataz Aly Kamaleldin Ahmed

Social media platforms are becoming inherent parts of people's daily lives, used to express opinions and stances toward topics of varying polarities. Stance detection determines the viewpoint expressed in a text toward a target. While communication on social media (e.g., Twitter) takes place in more than 40 languages, the majority of stance detection research has been focused on English. Although some efforts have recently been made to develop stance detection datasets in other languages, no similar efforts seem to have considered the Arabic language. In this paper, we present Mawqif, the first Arabic dataset for target-specific stance detection, composed of 4,121 tweets annotated with stance, sentiment, and sarcasm polarities. Mawqif, as a multi-label dataset, can provide more opportunities for studying the interaction between different opinion dimensions and evaluating a multi-task model. We provide a detailed description of the dataset, present an analysis of the produced annotation, and evaluate four BERT-based models on it. Our best model achieves a macro-F1 of 78.89%, which shows that there is ample room for improvement on this challenging task. We publicly release our dataset, the annotation guidelines, and the code of the experiments.

Citations: 4

Pythoneers at WANLP 2022 Shared Task: Monolingual AraBERT for Arabic Propaganda Detection and Span Extraction

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.64

Joseph Attieh, Fadi Hassan

Citations: 3

Optimizing Naive Bayes for Arabic Dialect Identification

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.40

T. Jauhiainen, H. Jauhiainen, Krister Lindén

Citations: 3