Workshop on Arabic Natural Language Processing: Latest Publications

Domain Adaptation for Arabic Crisis Response
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.23
Reem AlRashdi, Simon E. M. O'Keefe
Abstract: Deep learning algorithms can identify related tweets to reduce the information overload that prevents humanitarian organisations from using valuable Twitter posts. However, they rely heavily on human-labelled data, which are unavailable for emerging crises. Because each crisis has its own features, such as location, time and social media response, current models are known to struggle to generalise to unseen disaster events when pre-trained on past ones. Tweet classifiers for low-resource languages like Arabic have the additional issue of limited labelled data duplicates, caused by the absence of good language resources. We therefore propose a novel domain adaptation approach that employs distant supervision to automatically label tweets from emerging Arabic crisis events, which are then used to train a model along with the available human-labelled data. We evaluate our work on data from seven 2018–2020 Arabic events covering different crisis types (flood, explosion, virus and storm). Results show that our method outperforms self-training in identifying crisis-related tweets in real-time scenarios and can be seen as a robust Arabic tweet classifier.
Citations: 0
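As a rough illustration of the distant-supervision idea in this abstract, the sketch below weakly labels unlabelled tweets with a hypothetical keyword lexicon and mixes them with human-labelled data before training a generic classifier; the lexicon, toy data, and classifier are placeholders, not the authors' actual pipeline.

```python
# Sketch: distant supervision for crisis tweet filtering (illustrative only).
# CRISIS_KEYWORDS and the tweet lists are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

CRISIS_KEYWORDS = {"فيضان", "انفجار", "عاصفة", "إصابات"}  # hypothetical Arabic crisis terms

def distant_label(tweet: str) -> int:
    """Weak label: 1 if the tweet mentions any crisis keyword, else 0."""
    return int(any(kw in tweet for kw in CRISIS_KEYWORDS))

def build_training_set(human_texts, human_labels, unlabelled_texts):
    """Combine human-labelled data with distantly-supervised labels."""
    weak_labels = [distant_label(t) for t in unlabelled_texts]
    return human_texts + unlabelled_texts, human_labels + weak_labels

texts, labels = build_training_set(
    ["هناك فيضان في المدينة"], [1],                      # toy human-labelled tweet
    ["انفجار قرب السوق", "مباراة كرة القدم اليوم"],      # toy unlabelled tweets
)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["عاصفة قوية تضرب الساحل"]))
```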
SI2M & AIOX Labs at WANLP 2022 Shared Task: Propaganda Detection in Arabic, A Data Augmentation and Name Entity Recognition Approach
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.58
Kamel Gaanoun, Imade Benelallam
Abstract: This paper presents the SI2M & AIOX Labs submission to the shared task on propaganda detection in Arabic text. The objective of this challenge is to identify the propaganda techniques used in specific propaganda fragments. We use a combination of data augmentation, Named Entity Recognition, rule-based repetition detection, and ARBERT prediction to develop our system. Our model scored a micro F1-score of 0.585 and ranked 6th out of 12 teams.
Citations: 1
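The rule-based repetition detection mentioned in this abstract is not specified in detail here; the sketch below shows one plausible minimal rule that flags a fragment when a non-stopword token recurs. The stoplist and threshold are assumptions, not the authors' rule.

```python
# Illustrative rule-based "Repetition" detector (not the authors' exact rule):
# flag a fragment as Repetition if any content token appears at least twice.
import re
from collections import Counter

ARABIC_STOPWORDS = {"في", "من", "على", "عن", "إلى", "و"}  # small hypothetical stoplist

def detect_repetition(text: str, min_count: int = 2) -> bool:
    tokens = [t for t in re.findall(r"\w+", text) if t not in ARABIC_STOPWORDS]
    counts = Counter(tokens)
    return any(c >= min_count for c in counts.values())

print(detect_repetition("الحرية الحرية الحرية مطلبنا"))  # True: repeated content token
print(detect_repetition("الحرية مطلبنا"))                # False
```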
TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.42
Fouad Shammary, Yiyi Chen, Z. T. Kardkovács, Mehwish Alam, Haithem Afli
Abstract: This study targets the shared task of Nuanced Arabic Dialect Identification (NADI), organized with the Workshop on Arabic Natural Language Processing (WANLP), and focuses on Subtask 1, the identification of Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach such as TF-IDF and then moves on to advanced deep-learning-based methods. These methods include fully fine-tuning MARBERT as well as adapter-based fine-tuning of MARBERT with and without data augmentation. The evaluation shows that the traditional TF-IDF approach scores best in terms of accuracy on the TEST-A dataset, while the adapter-based fine-tuned MARBERT on augmented data scores second on macro F1-score on the TEST-B dataset. This led to the proposed system being ranked second on the shared task on average.
Citations: 2
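A minimal sketch of a TF-IDF baseline of the kind compared in this paper, using character n-grams and a linear SVM; the feature settings, classifier choice, and toy data are assumptions rather than the authors' exact configuration.

```python
# TF-IDF baseline sketch for country-level dialect identification (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train_texts = ["شلونك اليوم", "ازيك عامل ايه", "كيف داير"]   # toy tweets
train_labels = ["Iraq", "Egypt", "Morocco"]                  # toy country labels

baseline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams often help with dialects
    LinearSVC(),
)
baseline.fit(train_texts, train_labels)
print(baseline.predict(["ازيك يا صاحبي"]))
```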
Building an Ensemble of Transformer Models for Arabic Dialect Classification and Sentiment Analysis
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.53
Abdullah Salem Khered, Ingy Yasser Hassan Abdou Abdelhalim, R. Batista-Navarro
Abstract: In this paper, we describe the approaches we developed for the Nuanced Arabic Dialect Identification (NADI) 2022 shared task, which consists of two subtasks: the identification of country-level Arabic dialects and sentiment analysis. Our team, UniManc, developed approaches to the two subtasks which are underpinned by the same model: a pre-trained MARBERT language model. For Subtask 1, we applied undersampling to create versions of the training data with a balanced distribution across classes. For Subtask 2, we further trained the original MARBERT model on the masked language modelling objective using a NADI-provided dataset of unlabelled Arabic tweets. For each subtask, a MARBERT model was fine-tuned for sequence classification using different values for hyperparameters such as the random seed and the learning rate. This resulted in multiple model variants, which formed the basis of an ensemble model for each subtask. Based on the official NADI evaluation, our ensemble model obtained a macro-F1-score of 26.863, ranking second overall in the first subtask. In the second subtask, our ensemble model also ranked second, obtaining a macro-F1-PN score (macro-averaged F1-score over the Positive and Negative classes) of 73.544.
Citations: 5
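A sketch of one common way to ensemble several fine-tuned checkpoints, averaging their softmax outputs; the checkpoint paths below are hypothetical and the paper's exact combination rule may differ.

```python
# Sketch: ensembling fine-tuned checkpoints by averaging softmax probabilities.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoints = ["marbert-seed1", "marbert-seed2", "marbert-seed3"]  # hypothetical local checkpoint dirs

def ensemble_predict(text: str) -> int:
    probs = []
    for ckpt in checkpoints:
        tok = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt)
        model.eval()
        inputs = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs.append(torch.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)  # average over ensemble members
    return int(mean_probs.argmax(dim=-1))        # index of the predicted class
```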
TUB at WANLP22 Shared Task: Using Semantic Similarity for Propaganda Detection in Arabic
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.57
Salar Mohtaj, Sebastian Möller
Abstract: Propaganda and the spreading of fake news through social media have become a serious problem in recent years. In this paper we present our approach for the shared task on propaganda detection in Arabic, in which the goal is to identify propaganda techniques in Arabic social media text. We propose a semantic similarity detection model that compares each text in the test set with the sentences in the training set to find the most similar instances. The label of the target text is taken from the most similar texts in the training set. The proposed model obtained a micro F1 score of 0.494 on the test data set.
Citations: 3
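A minimal sketch of nearest-neighbour labelling by semantic similarity, as described in the abstract, using a sentence-embedding model; the specific embedding model and toy data are assumptions, not necessarily what the authors used.

```python
# Sketch: label a test fragment with the label of its most similar training fragment.
from sentence_transformers import SentenceTransformer, util

train_texts = ["نص دعائي أول", "نص عادي"]            # toy training fragments
train_labels = ["Loaded_Language", "no_technique"]   # toy technique labels

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed multilingual encoder
train_emb = model.encode(train_texts, convert_to_tensor=True)

def predict(text: str) -> str:
    query_emb = model.encode(text, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, train_emb)      # 1 x N cosine similarity matrix
    return train_labels[int(scores.argmax())]        # copy the nearest neighbour's label

print(predict("نص دعائي جديد"))
```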
Generating Classical Arabic Poetry using Pre-trained Models
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.6
Nehal Elkaref, Mervat Abu-Elkheir, Maryam ElOraby, Mohamed Abdelgaber
Abstract: Poetry generation tends to be a complicated task given meter and rhyme constraints. Previous work resorted to exhaustive methods in order to employ poetic elements. In this paper we rely on the pre-trained models GPT-J and BERTShared to recognize patterns of meter and rhyme when generating classical Arabic poetry, and we present our findings and results on how well both models pick up on these classical Arabic poetic elements.
Citations: 1
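For context, continuing a verse with a pre-trained causal language model typically looks like the sketch below; the checkpoint name is a hypothetical placeholder (the paper itself uses GPT-J and BERTShared), and no claim is made about the authors' decoding settings.

```python
# Sketch: continuing a classical verse with a causal language model (illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "arabic-poetry-gptj"   # hypothetical fine-tuned checkpoint name
tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "ألا ليت الشباب يعود يوماً"
inputs = tok(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tok.decode(output[0], skip_special_tokens=True))
```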
Arabic Keyphrase Extraction: Enhancing Deep Learning Models with Pre-trained Contextual Embedding and External Features
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.30
Randah Alharbi, H. Al-Muhtaseb
Abstract: Keyphrase extraction is essential to many information retrieval (IR) and natural language processing (NLP) tasks such as summarization and indexing. This study investigates deep learning approaches to Arabic keyphrase extraction. We address the problem as sequence classification and create a Bi-LSTM model that classifies each token as either part of a keyphrase or outside of it. We extract word embeddings from two pre-trained models, Word2Vec and BERT, and investigate the effect of incorporating linguistic, positional, and statistical features alongside the word embeddings on performance. Our best-performing model achieved an F1-score of 0.45 on the ArabicKPE dataset when combining linguistic and positional features with BERT embeddings.
Citations: 0
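A minimal sketch of a Bi-LSTM token classifier that concatenates pre-computed contextual embeddings with external feature vectors, as the abstract describes; the dimensions and the two-label (keyphrase vs. outside) scheme are illustrative assumptions.

```python
# Sketch: Bi-LSTM over contextual embeddings concatenated with external features.
import torch
import torch.nn as nn

class KeyphraseTagger(nn.Module):
    def __init__(self, emb_dim=768, feat_dim=8, hidden=128, n_labels=2):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim + feat_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)      # keyphrase vs. outside

    def forward(self, embeddings, features):
        x = torch.cat([embeddings, features], dim=-1)   # (batch, seq_len, emb_dim + feat_dim)
        h, _ = self.lstm(x)
        return self.out(h)                              # per-token logits

# Toy forward pass: one sentence of five tokens with random vectors.
tagger = KeyphraseTagger()
logits = tagger(torch.randn(1, 5, 768), torch.randn(1, 5, 8))
print(logits.shape)  # torch.Size([1, 5, 2])
```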
Event-Based Knowledge MLM for Arabic Event Detection
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.25
Asma Z. Yamani, Amjad K Alsulami, Rabeah Al-Zaidy
Abstract: With the fast pace of reporting around the globe from various sources, event extraction has increasingly become an important task in NLP. Pre-trained language models (PTMs) are now widely used to provide contextual representations for downstream tasks. This work aims to pre-train language models that improve event extraction accuracy. To this end, we propose an Event-Based Knowledge (EBK) masking approach that masks the most significant terms for the event detection task. These significant terms are drawn from an external knowledge source curated for Arabic event detection. The proposed approach improves classification accuracy for all nine event types. The experimental results demonstrate the effectiveness of the proposed masking approach and encourage further exploration.
Citations: 0
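One way to realise knowledge-based masking of event terms during MLM data preparation is sketched below; the event lexicon, masking rates, and mask-token handling are assumptions, not the paper's exact policy.

```python
# Sketch: preferentially masking event-indicative terms for MLM pre-training (illustrative).
import random

EVENT_TERMS = {"انفجار", "زلزال", "احتجاج"}   # hypothetical event lexicon
MASK_TOKEN = "[MASK]"

def ebk_mask(tokens, random_rate=0.15):
    """Mask every event term; mask other tokens at a low random rate."""
    masked = []
    for tok in tokens:
        if tok in EVENT_TERMS or random.random() < random_rate:
            masked.append(MASK_TOKEN)
        else:
            masked.append(tok)
    return masked

print(ebk_mask("وقع انفجار كبير في المدينة".split()))
```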
Cross-lingual transfer for low-resource Arabic language understanding
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.21
Khadige Abboud, O. Yu. Golovneva, C. Dipersio
Abstract: This paper explores cross-lingual transfer learning in natural language understanding (NLU), focusing on bootstrapping Arabic from the high-resource English and French languages for domain classification, intent classification, and named entity recognition tasks. We adopt a BERT-based architecture and pre-train three models using open-source Wikipedia data and large-scale commercial datasets: a monolingual Arabic model, a bilingual Arabic-English model, and a trilingual Arabic-English-French model. Additionally, we use an off-the-shelf machine translator to translate internal data from the source English language to the target Arabic language, in an effort to enhance transfer learning through translation. We conduct experiments that fine-tune the three models for NLU tasks and evaluate them on a large internal dataset. Despite the morphological, orthographical, and grammatical differences between Arabic and the source languages, transfer learning performance gains from the source languages and from machine translation are achieved on a real-world Arabic test dataset, both in a zero-shot setting and in a setting where the models are further fine-tuned on labelled data from the target language.
Citations: 2
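The translate-then-train augmentation described in this abstract can be sketched as a simple data-preparation step; the translate callable stands in for whatever off-the-shelf machine translator is used, and the toy NLU examples are assumptions.

```python
# Sketch: augmenting scarce Arabic NLU data with machine-translated English data.
from typing import Callable, List, Tuple

def build_arabic_training_set(
    english_data: List[Tuple[str, str]],   # (utterance, intent_label) pairs
    arabic_data: List[Tuple[str, str]],
    translate: Callable[[str], str],       # en -> ar translator (assumed external service)
) -> List[Tuple[str, str]]:
    """Combine native Arabic data with translated English utterances, keeping labels."""
    translated = [(translate(text), label) for text, label in english_data]
    return arabic_data + translated

# Toy usage with an identity "translator" just to show the data flow.
combined = build_arabic_training_set(
    [("play some music", "PlayMusic")],
    [("شغل الموسيقى", "PlayMusic")],
    translate=lambda s: s,
)
print(len(combined))  # 2
```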
Establishing a Baseline for Arabic Patents Classification: A Comparison of Twelve Approaches
Workshop on Arabic Natural Language Processing (WANLP 2022) · DOI: 10.18653/v1/2022.wanlp-1.26
Taif Omar Al-Omar, H. Al-Khalifa, Rawan N. Al-Matham
Abstract: The number of patent applications is constantly growing, and there is an economic interest in developing accurate and fast models to automate their classification. In this paper, we introduce the first public Arabic patent dataset, called ArPatent, and experiment with twelve classification approaches to establish a baseline for Arabic patent classification. To find the best baseline for classifying Arabic patents, we evaluated different machine learning, pre-trained language model, and ensemble approaches. From the obtained results, we observe that the best-performing model was ARBERT, with an F1 of 66.53%, while an ensemble of the three best-performing language models, namely ARBERT, CAMeL-MSA, and QARiB, achieved the second-best F1 score of 64.52%.
Citations: 0
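A sketch of combining three classifiers' outputs by majority vote; whether the paper combines ARBERT, CAMeL-MSA, and QARiB this way or by another rule is not stated here, so the voting scheme and toy labels are assumptions.

```python
# Sketch: label-level majority voting over three classifiers (illustrative).
from collections import Counter
from typing import List

def majority_vote(predictions: List[List[str]]) -> List[str]:
    """predictions[m][i] is model m's predicted label for document i."""
    voted = []
    for labels in zip(*predictions):                      # iterate over documents
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

arbert_preds = ["A61", "C07", "G06"]                      # toy patent-class labels
camel_preds  = ["A61", "B01", "G06"]
qarib_preds  = ["H04", "C07", "G06"]
print(majority_vote([arbert_preds, camel_preds, qarib_preds]))  # ['A61', 'C07', 'G06']
```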