Arabic dialect identification using machine learning and transformer-based models: Submission to the NADI 2022 Shared Task

Workshop on Arabic Natural Language Processing. DOI: 10.18653/v1/2022.wanlp-1.50

Nouf AlShenaifi, Aqil M. Azmi

Citations: 3

Abstract

Arabic has a wide range of dialects, where a dialect is the language variety used by a specific community. In this paper, we present the models we built for Subtask 1 of the third Nuanced Arabic Dialect Identification (NADI) shared task, which involves developing a system that classifies a tweet into a country-level dialect. We used several classical machine learning techniques as well as deep learning and transformer-based models. For the machine learning approach, we built an ensemble classifier from various machine learning models. For the deep learning approach, we considered a bidirectional LSTM model and the pretrained AraBERT model. The results show that the deep learning approach performs noticeably better than the machine learning approaches, reaching 68.7% accuracy on the development set.
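The abstract does not say how the ensemble combines its member models. As a minimal sketch only, assuming hard majority voting (one common choice, not confirmed by the source), the idea can be illustrated as follows. The function `majority_vote`, the three toy keyword rules, and the label names are all hypothetical stand-ins and are not the authors' models or features.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumption: a "model" is any callable that maps a tweet to a
# country-level dialect label. Real members would be trained classifiers.
Classifier = Callable[[str], str]


def majority_vote(models: Sequence[Classifier], tweet: str) -> str:
    """Return the label most models predict.

    Ties are broken in favour of the earliest model in the sequence,
    so the result is deterministic.
    """
    if not models:
        raise ValueError("at least one model is required")
    votes = [model(tweet) for model in models]
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in model order to break ties deterministically.
    for label in votes:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: votes is non-empty")


# Hypothetical toy members: keyword rules on transliterated text,
# used here only so the example runs without training data.
def model_a(tweet: str) -> str:
    return "Egypt" if "ezayak" in tweet else "Saudi_Arabia"


def model_b(tweet: str) -> str:
    return "Egypt" if "awi" in tweet else "Iraq"


def model_c(tweet: str) -> str:
    return "Iraq" if "shlonak" in tweet else "Saudi_Arabia"


ensemble = [model_a, model_b, model_c]
prediction = majority_vote(ensemble, "ezayak, helw awi")
```

Here two of the three toy members vote for the same label, so that label is returned. In practice each member would be a trained classifier over features extracted from the tweet, and soft voting or stacking could replace the hard vote.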
