Workshop on Arabic Natural Language Processing: Latest Publications

SAIDS: A Novel Approach for Sentiment Analysis Informed of Dialect and Sarcasm
Workshop on Arabic Natural Language Processing Pub Date : 2023-01-06 DOI: 10.48550/arXiv.2301.02521
Abdelrahman Kaseb, Mona Farouk
Abstract: Sentiment analysis has become an essential part of every social network, as it enables decision-makers to know more about users’ opinions in almost all aspects of life. Despite its importance, it encounters multiple issues, such as determining the sentiment of sarcastic text, which is one of the main challenges of sentiment analysis. This paper tackles this challenge by introducing a novel system (SAIDS) that predicts the sentiment, sarcasm and dialect of Arabic tweets. SAIDS uses its prediction of sarcasm and dialect as known information to predict the sentiment. It uses MARBERT as a language model to generate a sentence embedding, then passes it to the sarcasm and dialect models, and then the outputs of the three models are concatenated and passed to the sentiment analysis model. Multiple system design setups were experimented with and reported. SAIDS was applied to the ArSarcasm-v2 dataset, where it outperforms the state-of-the-art model for the sentiment analysis task. By training all tasks together, SAIDS achieves results of 75.98 FPN, 59.09 F1-score and 71.13 F1-score for sentiment analysis, sarcasm detection, and dialect identification respectively. The system design can be used to enhance the performance of any task which is dependent on other tasks.
Citations: 1
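The data flow described in the SAIDS abstract (sentence embedding, then sarcasm and dialect models, then concatenation into the sentiment model) might be sketched as follows. This is a minimal illustration only, not the authors' implementation: the random vector stands in for a MARBERT sentence embedding, the layers are untrained, and the helper names, embedding size and class counts are all assumptions made for the example.

```python
import random

def linear(vec, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def make_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def saids_forward(embedding, sarcasm_head, dialect_head, sentiment_head):
    # The auxiliary models both read the shared sentence embedding.
    sarcasm_logits = linear(embedding, *sarcasm_head)
    dialect_logits = linear(embedding, *dialect_head)
    # Concatenate the embedding with both auxiliary outputs and
    # feed the result to the sentiment model.
    sentiment_input = embedding + sarcasm_logits + dialect_logits
    sentiment_logits = linear(sentiment_input, *sentiment_head)
    return sentiment_logits, sarcasm_logits, dialect_logits

rng = random.Random(0)
# Illustrative sizes (assumptions, not taken from the paper).
EMB, N_SARCASM, N_DIALECT, N_SENTIMENT = 8, 2, 5, 3

embedding = [rng.uniform(-1, 1) for _ in range(EMB)]  # stand-in for MARBERT
sarcasm_head = make_layer(EMB, N_SARCASM, rng)
dialect_head = make_layer(EMB, N_DIALECT, rng)
sentiment_head = make_layer(EMB + N_SARCASM + N_DIALECT, N_SENTIMENT, rng)

sentiment, sarcasm, dialect = saids_forward(
    embedding, sarcasm_head, dialect_head, sentiment_head)
```

The point of the sketch is only the wiring: the sentiment layer's input width is the embedding size plus the two auxiliary output sizes, which is how the sarcasm and dialect predictions become "known information" for the sentiment prediction.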
End-to-End Speech Translation of Arabic to English Broadcast News
Workshop on Arabic Natural Language Processing Pub Date : 2022-12-11 DOI: 10.48550/arXiv.2212.05479
Fethi Bougares, Salim Jouili
Abstract: Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. The ST task has long been addressed using a pipeline approach with two modules: first Automatic Speech Recognition (ASR) in the source language, followed by text-to-text Machine Translation (MT). In the past few years, we have seen a paradigm shift towards end-to-end approaches using sequence-to-sequence deep neural network models. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system. Starting from independent ASR and MT LDC releases, we were able to identify about 92 hours of Arabic audio recordings for which the manual transcription was also translated into English at the segment level. These data were used to train and compare pipeline and end-to-end speech translation systems under multiple scenarios including transfer learning and data augmentation techniques.
Citations: 0
IITD at WANLP 2022 Shared Task: Multilingual Multi-Granularity Network for Propaganda Detection
Workshop on Arabic Natural Language Processing Pub Date : 2022-10-31 DOI: 10.48550/arXiv.2210.17190
Shubham Mittal, Preslav Nakov
Abstract: We present our system for the two subtasks of the shared task on propaganda detection in Arabic, part of WANLP’2022. Subtask 1 is a multi-label classification problem to find the propaganda techniques used in a given tweet. Our system for this task uses XLM-R to predict probabilities for the target tweet to use each of the techniques. In addition to finding the techniques, subtask 2 further asks to identify the textual span for each instance of each technique that is present in the tweet; the task can be modelled as a sequence tagging problem. We use a multi-granularity network with mBERT encoder for subtask 2. Overall, our system ranks second for both subtasks (out of 14 and 3 participants, respectively). Our experimental results and analysis show that it does not help to use a much larger English corpus annotated with propaganda techniques, regardless of whether used in English or after translation to Arabic.
Citations: 3
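The multi-label setup of subtask 1 in the abstract above (one probability per propaganda technique for each tweet) could be decoded roughly as in the sketch below. It assumes per-technique logits are already available from some encoder; the label names, the sigmoid decoding and the 0.5 threshold are illustrative assumptions, not details given in the abstract.

```python
import math

def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_techniques(logits, labels, threshold=0.5):
    """Multi-label decoding: each technique is decided independently,
    so a tweet can receive zero, one or several labels."""
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return [label for label, p in probs.items() if p >= threshold]

# Hypothetical label set and logits, for illustration only.
LABELS = ["loaded_language", "name_calling", "exaggeration"]
predicted = predict_techniques([2.0, -1.5, 0.3], LABELS)
print(predicted)  # ['loaded_language', 'exaggeration']
```

Unlike single-label classification with a softmax, the per-label sigmoid lets the probabilities of different techniques rise and fall independently of one another.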
Maknuune: A Large Open Palestinian Arabic Lexicon
Workshop on Arabic Natural Language Processing Pub Date : 2022-10-24 DOI: 10.48550/arXiv.2210.12985
Shahd Dibas, Christian Khairallah, Nizar Habash, Omar Fayez Sadi, Tariq Sairafy, Karmel Sarabta, Abrar Ardah
Abstract: We present Maknuune, a large open lexicon for the Palestinian Arabic dialect. Maknuune has over 36K entries from 17K lemmas, and 3.7K roots. All entries include diacritized Arabic orthography, phonological transcription and English glosses. Some entries are enriched with additional information such as broken plurals and templatic feminine forms, associated phrases and collocations, Standard Arabic glosses, and examples or notes on grammar, usage, or location of collected entry.
Citations: 0
The Shared Task on Gender Rewriting
Workshop on Arabic Natural Language Processing Pub Date : 2022-10-22 DOI: 10.48550/arXiv.2210.12410
Bashar Alhafni, Nizar Habash, Houda Bouamor, Ossama Obeid, Sultan Alrowili, D. Alzeer, Khawla AlShanqiti, Ahmed Elbakry, Muhammad N. ElNokrashy, Mohamed Gabr, Abderrahmane Issam, Abdel-Naser Qaddoumi, K. Vijay-Shanker, Mahmoud Zyate
Abstract: In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop. The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., a female speaker with a male listener, a male speaker with a male listener, etc.). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users. In this task, we focus on Arabic, a gender-marking morphologically rich language. A total of five teams from four countries participated in the shared task.
Citations: 1
Joint Coreference Resolution for Zeros and non-Zeros in Arabic
Workshop on Arabic Natural Language Processing Pub Date : 2022-10-21 DOI: 10.48550/arXiv.2210.12169
Abdulrahman Aloraini, Sameer Pradhan, Massimo Poesio
Abstract: Most existing proposals about anaphoric zero pronoun (AZP) resolution regard full mention coreference and AZP resolution as two independent tasks, even though the two tasks are clearly related. The main issues that need tackling to develop a joint model for zero and non-zero mentions are the difference between the two types of arguments (zero pronouns, being null, provide no nominal information) and the lack of annotated datasets of a suitable size in which both types of arguments are annotated for languages other than Chinese and Japanese. In this paper, we introduce two architectures for jointly resolving AZPs and non-AZPs, and evaluate them on Arabic, a language for which, as far as we know, there has been no prior work on joint resolution. Doing this also required creating a new version of the Arabic subset of the standard coreference resolution dataset used for the CoNLL-2012 shared task (Pradhan et al., 2012) in which both zeros and non-zeros are included in a single dataset.
Citations: 1
MANorm: A Normalization Dictionary for Moroccan Arabic Dialect Written in Latin Script
Workshop on Arabic Natural Language Processing Pub Date : 2022-06-18 DOI: 10.48550/arXiv.2206.09167
Randa Zarnoufi, H. Jaafar, Walid Bachri, Mounia Abik
Abstract: User-generated social media text is a main resource for many NLP tasks. This text, however, does not follow the standard rules of writing. Moreover, the use of dialects such as Moroccan Arabic in written communication further increases the complexity of NLP tasks. A dialect is a verbal language that does not have a standard orthography. The written dialect is based on the phonetic transliteration of spoken words, which leads users to improvise spelling while writing. Thus, for the same word we can find multiple forms of transliteration. Consequently, these different transliterations must be normalized to one canonical word form. To reach this goal, we have exploited the power of word embedding models generated from a corpus of YouTube comments. In addition, using a Moroccan Arabic dialect dictionary that provides the canonical forms, we have built a normalization dictionary that we refer to as MANorm. We have conducted several experiments to demonstrate the efficiency of MANorm, which have shown its usefulness in dialect normalization. We have made MANorm freely available online.
Citations: 2
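The lookup step that a normalization dictionary such as the one in the abstract above enables (mapping several improvised transliterations to one canonical form) might look like the sketch below. The entries are invented for illustration and are not taken from MANorm; the sketch also omits the word-embedding step the authors use to discover variants.

```python
# Hypothetical variant -> canonical mapping (not actual MANorm entries).
NORMALIZATION = {
    "bzaf": "bzzaf",
    "bezzaf": "bzzaf",
    "bzzaaf": "bzzaf",
    "wa5a": "wakha",
    "wakha": "wakha",
}

def normalize(text, table):
    """Replace each known variant with its canonical form;
    tokens missing from the table are left unchanged."""
    tokens = text.lower().split()
    return " ".join(table.get(token, token) for token in tokens)

normalized = normalize("wa5a bezzaf", NORMALIZATION)
print(normalized)  # wakha bzzaf
```

Leaving unknown tokens untouched keeps the step safe to apply as preprocessing: it can only merge spellings that the dictionary explicitly lists.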
Emoji Sentiment Roles for Sentiment Analysis: A Case Study in Arabic Texts
Workshop on Arabic Natural Language Processing Pub Date : 2022 DOI: 10.18653/v1/2022.wanlp-1.32
Shatha Ali A. Hakami, R. Hendley, Phillip Smith
Abstract: Emoji (digital pictograms) are crucial features for textual sentiment analysis. However, analysing the sentiment roles of emoji is very complex. This is due to its dependency on different factors, such as textual context, cultural perspective, interlocutor’s personal traits, interlocutors’ relationships or a platform’s functional features. This work introduces an approach to analysing the sentiment effects of emoji as textual features. Using an Arabic dataset as a benchmark, our results confirm the borrowed argument that each emoji has three different norms of sentiment role (negative, neutral or positive). Therefore, an emoji can play different sentiment roles depending upon the context. It can behave as an emphasizer, an indicator, a mitigator, a reverser or a trigger of either negative or positive sentiment within a text. In addition, an emoji may have a neutral effect (i.e., no effect) on the sentiment of the text.
Citations: 3
Weakly and Semi-Supervised Learning for Arabic Text Classification using Monodialectal Language Models
Workshop on Arabic Natural Language Processing Pub Date : 2022 DOI: 10.18653/v1/2022.wanlp-1.24
Reem AlYami, Rabah A. Al-Zaidy
Abstract: The lack of resources such as annotated datasets and tools for low-resource languages is a significant obstacle to the advancement of Natural Language Processing (NLP) applications targeting users who speak these languages. Although learning techniques such as semi-supervised and weakly supervised learning are effective in text classification cases where annotated data is limited, they are still not widely investigated in many languages due to the sparsity of data altogether, both labeled and unlabeled. In this study, we deploy both weakly and semi-supervised learning approaches for text classification in low-resource languages and address the underlying limitations that can hinder the effectiveness of these techniques. To that end, we propose a suite of language-agnostic techniques for large-scale data collection, automatic data annotation, and language model training in scenarios where resources are scarce. Specifically, we propose a novel data collection pipeline for under-represented languages, or dialects, that is language and task agnostic and of sufficient size for training a language model capable of achieving competitive results on common NLP tasks, as our experiments show. The models will be shared with the research community.
Citations: 0
iCompass Working Notes for the Nuanced Arabic Dialect Identification Shared task
Workshop on Arabic Natural Language Processing Pub Date : 2022 DOI: 10.18653/v1/2022.wanlp-1.41
Abir Messaoudi, Chayma Fourati, H. Haddad, Moez BenHajhmida
Abstract: We describe our submitted system to the Nuanced Arabic Dialect Identification (NADI) shared task. We tackled only the first subtask (Subtask 1). We used state-of-the-art Deep Learning models and pre-trained contextualized text representation models that we finetuned according to the downstream task at hand. As a first approach, we used Arabic BERT variants: MARBERT in its two versions, MARBERT v1 and MARBERT v2. We then combined MARBERT embeddings with a CNN classifier and, finally, tested the Quasi-Recurrent Neural Network (QRNN) model. The results show that version 2 of MARBERT outperforms all of the previously mentioned models on Subtask 1.
Citations: 4