TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task

Workshop on Arabic Natural Language Processing (WANLP), Pub Date: 2022, DOI: 10.18653/v1/2022.wanlp-1.42

Fouad Shammary, Yiyi Chen, Z. T. Kardkovács, Mehwish Alam, Haithem Afli

Citations: 2

Abstract

This study addresses the Nuanced Arabic Dialect Identification (NADI) shared task, organized as part of the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1, the identification of Arabic dialects at the country level. Specifically, it first studies the impact of a traditional approach based on TF-IDF and then turns to deep learning methods: full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the TF-IDF approach scores best in terms of accuracy on the TEST-A dataset, while MARBERT fine-tuned with adapters on augmented data ranks second in Macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second in the shared task on average.
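To make the TF-IDF baseline mentioned in the abstract concrete, here is a minimal, dependency-free sketch of the general idea. It is not the authors' system: the character trigram features, the smoothed IDF formula (the same form scikit-learn uses by default), the nearest-neighbour decision rule, and the three toy sentences with their country labels are all assumptions chosen for illustration. A real submission would more likely train a linear classifier on such features using the NADI training data.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))  # count each n-gram once per document
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    counts = Counter(g for g in char_ngrams(doc) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {g: w / norm for g, w in vec.items()}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


# Hypothetical toy data (NOT from NADI); labels are illustrative only.
TRAIN = [
    ("ازيك عامل ايه", "Egypt"),
    ("شلونك شخبارك", "Iraq"),
    ("كيفك شو اخبارك", "Lebanon"),
]

IDF = fit_idf([text for text, _ in TRAIN])
TRAIN_VECS = [(tfidf_vector(text, IDF), label) for text, label in TRAIN]


def predict(text):
    """Return the label of the most similar training sentence (1-nearest neighbour)."""
    query = tfidf_vector(text, IDF)
    return max(TRAIN_VECS, key=lambda item: cosine(query, item[0]))[1]


print(predict("عامل ايه"))  # prints "Egypt"
```

The query shares character trigrams only with the first training sentence, so its cosine similarity to the other two is zero. Features of this kind are cheap to compute and need no pretrained model, which is the contrast with MARBERT fine-tuning that the abstract draws.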

