Word Representation Models for Arabic Dialect Identification

Workshop on Arabic Natural Language Processing, Pub Date: 1900-01-01, DOI: 10.18653/v1/2022.wanlp-1.52

M. Sobhy, Ahmed H. Abu El-Atta, A. El-sawy, Hamada Nayel

Citations: 2

Abstract

This paper describes the systems submitted by the BFCAI team to the Nuanced Arabic Dialect Identification (NADI) shared task 2022. Dialect identification aims to automatically detect the source variety of a given text or speech segment. NADI 2022 comprises two subtasks: the first addresses country-level dialect identification and the second addresses sentiment analysis. Our team participated in the first subtask. The proposed systems use Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings as vectorization models, with different machine learning algorithms as classifiers. The systems were evaluated on two test sets, Test-A and Test-B, and achieved macro-F1 scores of 21.25% and 9.71%, respectively. By comparison, the best-performing submitted system achieved macro-F1 scores of 36.48% and 18.95% on Test-A and Test-B, respectively.
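The TF-IDF path of the pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the BFCAI submission. Whitespace tokenization, a smoothed IDF, and a nearest-centroid classifier stand in for the unspecified preprocessing and classifiers. The function names (`tfidf_fit`, `tfidf_transform`, `fit_centroids`, `predict`, `macro_f1`) and the toy tokens and country labels are hypothetical. The `macro_f1` function shows how the reported metric averages per-class F1 scores with equal weight, so every country counts the same regardless of how many examples it has.

```python
import math
from collections import Counter, defaultdict


def tfidf_fit(docs):
    """Learn smoothed IDF weights from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant): log((1 + n) / (1 + df)) + 1
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_transform(doc, idf):
    """Map one tokenized document to an L2-normalised sparse TF-IDF vector."""
    tf = Counter(t for t in doc if t in idf)  # out-of-vocabulary terms are dropped
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def fit_centroids(vectors, labels):
    """Hypothetical stand-in classifier: the mean TF-IDF vector of each label."""
    sums, counts = defaultdict(Counter), Counter(labels)
    for vec, y in zip(vectors, labels):
        sums[y].update(vec)
    return {y: {t: w / counts[y] for t, w in s.items()} for y, s in sums.items()}


def predict(vec, centroids):
    """Return the label whose centroid has the highest dot-product score."""
    return max(sorted(centroids), key=lambda y: dot(vec, centroids[y]))


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: placeholder tokens and country codes, not NADI data.
    train_docs = [["w1", "w2"], ["w1", "w3"], ["w4", "w5"], ["w4", "w6"]]
    train_y = ["EG", "EG", "MA", "MA"]
    idf = tfidf_fit(train_docs)
    centroids = fit_centroids([tfidf_transform(d, idf) for d in train_docs], train_y)
    test_docs = [["w1"], ["w2", "w4"], ["w5"], ["w6", "w4"]]
    test_y = ["EG", "EG", "MA", "MA"]
    preds = [predict(tfidf_transform(d, idf), centroids) for d in test_docs]
    print(preds, macro_f1(test_y, preds))
```

Swapping the nearest-centroid step for another classifier, or the TF-IDF vectors for averaged word embeddings, leaves the `macro_f1` evaluation unchanged.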
