SQU-CS @ NADI 2022: Dialectal Arabic Identification using One-vs-One Classification with TF-IDF Weights Computed on Character n-grams

Workshop on Arabic Natural Language Processing, DOI: 10.18653/v1/2022.wanlp-1.45

A. AAlAbdulsalam


Abstract

In this paper, I present an approach that uses a one-vs-one classification scheme with TF-IDF term weighting on character n-grams to identify Arabic dialects used in social media. The scheme was evaluated in the context of the third Nuanced Arabic Dialect Identification (NADI 2022) shared task, which targets the identification of Arabic dialects in Twitter messages. The approach was implemented with a logistic regression loss and trained using the stochastic gradient descent (SGD) algorithm. This simple method achieved macro F1 scores of 22.89% and 10.83% on TEST A and TEST B, respectively, compared with 30.01% and 14.84%, respectively, for an approach based on the AraBERT pretrained transformer model. My submission based on AraBERT scored an average macro F1 of 22.42% (the mean of its TEST A and TEST B scores) and was ranked 10th out of the 19 participating teams.
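The abstract names three ingredients: TF-IDF weights on character n-grams, binary classifiers with a logistic regression loss trained by SGD, and a one-vs-one scheme that combines them into a multi-class dialect classifier. The following is a minimal, dependency-free Python sketch of how these pieces can fit together, not the author's implementation. The n-gram range (2 to 4), the smoothed IDF formula, the learning rate, epoch count, L2 strength, and the voting tie-break are illustrative assumptions rather than values reported in the paper, and every function and class name here is hypothetical. The paper does not say which library was used; scikit-learn, for instance, offers `TfidfVectorizer(analyzer="char")`, `SGDClassifier` with a logistic loss, and `OneVsOneClassifier` for the same kind of pipeline.

```python
import math
import random
from collections import Counter
from itertools import combinations


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams of lengths n_min..n_max over the space-padded text."""
    padded = f" {text} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class CharTfidf:
    """TF-IDF weights on character n-grams, L2-normalized per document."""
    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(char_ngrams(doc)))
        n = len(docs)
        # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
        self.idf = {g: math.log((1 + n) / (1 + c)) + 1 for g, c in df.items()}
        return self

    def transform(self, doc):
        # Raw n-gram counts times IDF; n-grams unseen during fit are dropped.
        tf = Counter(g for g in char_ngrams(doc) if g in self.idf)
        vec = {g: c * self.idf[g] for g, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {g: v / norm for g, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    return 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))


def train_logreg_sgd(xs, ys, epochs=20, lr=0.5, l2=1e-4, seed=0):
    """Binary logistic regression (labels 0/1) fitted by per-example SGD."""
    w, b = {}, 0.0
    order = list(range(len(xs)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            z = b + sum(w.get(g, 0.0) * v for g, v in x.items())
            grad = sigmoid(z) - y  # derivative of the log loss w.r.t. z
            for g, v in x.items():
                # L2 decay is applied only to the features active in x.
                w[g] = w.get(g, 0.0) - lr * (grad * v + l2 * w.get(g, 0.0))
            b -= lr * grad
    return w, b


class OneVsOne:
    """One binary classifier per pair of classes; prediction by majority vote."""
    def fit(self, texts, labels):
        self.vec = CharTfidf().fit(texts)
        xs = [self.vec.transform(t) for t in texts]
        self.classes = sorted(set(labels))
        self.models = {}
        for a, c in combinations(self.classes, 2):
            # Train on examples of classes a and c only; a is the positive class.
            sub_x = [x for x, y in zip(xs, labels) if y in (a, c)]
            sub_y = [int(y == a) for y in labels if y in (a, c)]
            self.models[(a, c)] = train_logreg_sgd(sub_x, sub_y)
        return self

    def predict(self, text):
        x = self.vec.transform(text)
        votes = Counter()
        for (a, c), (w, b) in self.models.items():
            z = b + sum(w.get(g, 0.0) * v for g, v in x.items())
            votes[a if z > 0 else c] += 1
        # Ties go to the alphabetically first class.
        return max(self.classes, key=lambda k: votes[k])


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in either list."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        scores.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy strings with placeholder labels; not NADI data.
    texts = ["xyz xzy", "zyx xyz", "klm kml", "mlk klm", "pqr prq", "rqp pqr"]
    labels = ["A", "A", "B", "B", "C", "C"]
    clf = OneVsOne().fit(texts, labels)
    print([clf.predict(t) for t in ["xyz", "klm", "pqr"]])
```

With K labels, one-vs-one trains K(K-1)/2 binary models, each on the examples of only two classes, so every individual model is small while the number of models grows quadratically with K. Macro F1, the metric reported above, weights every class equally regardless of its frequency, so rare dialects count as much as common ones.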
