Workshop on Arabic Natural Language Processing: Latest Papers

On The Arabic Dialects’ Identification: Overcoming Challenges of Geographical Similarities Between Arabic dialects and Imbalanced Datasets

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.49

Salma Jamal, Aly M. Kassem, Omar Mohamed, Ali Ashraf

Citations: 0

Word Representation Models for Arabic Dialect Identification

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.52

M. Sobhy, Ahmed H. Abu El-Atta, A. El-sawy, Hamada Nayel

Citations: 2

Arabic Dialect Identification and Sentiment Classification using Transformer-based Models

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.54

Joseph Attieh, Fadi Hassan

Citations: 3

Learning From Arabic Corpora But Not Always From Arabic Speakers: A Case Study of the Arabic Wikipedia Editions

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.34

Saied Alshahrani, Esma Wali, Jeanna Neefe Matthews

Abstract: Wikipedia is a common source of training data for Natural Language Processing (NLP) research, especially as a source for corpora in languages other than English. However, for many downstream NLP tasks, it is important to understand the degree to which these corpora reflect representative contributions of native speakers. In particular, many entries in a given language may be translated from other languages or produced through other automated mechanisms. Language models built using corpora like Wikipedia can embed history, culture, bias, stereotypes, politics, and more, but it is important to understand whose views are actually being represented. In this paper, we present a case study focusing specifically on differences among the Arabic Wikipedia editions (Modern Standard Arabic, Egyptian, and Moroccan). In particular, we document issues in the Egyptian Arabic Wikipedia with automatic creation/generation and translation of content pages from English without human supervision. These issues could substantially affect the performance and accuracy of Large Language Models (LLMs) trained from these corpora, producing models that lack the cultural richness and meaningful representation of native speakers. Fortunately, the metadata maintained by Wikipedia provides visibility into these issues, but unfortunately, this is not the case for all corpora used to train LLMs.

Citations: 2

The Effect of Arabic Dialect Familiarity on Data Annotation

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.39

Ibrahim Abu Farha, Walid Magdy

Citations: 6

Benchmarking transfer learning approaches for sentiment analysis of Arabic dialect

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.44

Emna Fsih, Saméh Kchaou, Rahma Boujelbane, Lamia Hadrich Belguith

Citations: 4

SQU-CS @ NADI 2022: Dialectal Arabic Identification using One-vs-One Classification with TF-IDF Weights Computed on Character n-grams

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.wanlp-1.45

A. AAlAbdulsalam

Citations: 1