{"title":"Domain-Adapted BERT-based Models for Nuanced Arabic Dialect Identification and Tweet Sentiment Analysis","authors":"Giyaseddin Bayrak, Abdul Majeed Issifu","doi":"10.18653/v1/2022.wanlp-1.43","DOIUrl":null,"url":null,"abstract":"This paper summarizes the solution of the Nuanced Arabic Dialect Identification (NADI) 2022 shared task. It consists of two subtasks: a country-level Arabic Dialect Identification (ADID) and an Arabic Sentiment Analysis (ASA). Our work shows the importance of using domain-adapted models and language-specific pre-processing in NLP task solutions. We implement a simple but strong baseline technique to increase the stability of fine-tuning settings to obtain a good generalization of models. Our best model for the Dialect Identification subtask achieves a Macro F-1 score of 25.54% as an average of both Test-A (33.89%) and Test-B (19.19%) F-1 scores. We also obtained a Macro F-1 score of 74.29% of positive and negative sentiments only, in the Sentiment Analysis task.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"246 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.43","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
This paper summarizes our solution to the Nuanced Arabic Dialect Identification (NADI) 2022 shared task, which consists of two subtasks: country-level Arabic Dialect Identification (ADID) and Arabic Sentiment Analysis (ASA). Our work shows the importance of domain-adapted models and language-specific pre-processing in solving NLP tasks. We implement a simple but strong baseline technique that stabilizes fine-tuning and yields models that generalize well. Our best model for the Dialect Identification subtask achieves a Macro F-1 score of 25.54%, the average of its Test-A (33.89%) and Test-B (19.19%) F-1 scores. In the Sentiment Analysis subtask, we obtain a Macro F-1 score of 74.29%, computed over the positive and negative sentiment classes only.
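The abstract names the core ingredients: language-specific pre-processing, a domain-adapted BERT model, and Macro F-1 evaluation. The sketch below is a minimal illustration of how such a pipeline is commonly assembled with HuggingFace transformers, not the authors' reported configuration: the model name (UBC-NLP/MARBERT), the normalization rules, the hyperparameters, and the toy data are all assumptions made for the example.

```python
# Minimal sketch (not the authors' exact pipeline): language-specific
# pre-processing plus fine-tuning of a domain-adapted Arabic BERT,
# evaluated with Macro F-1. Model name, hyperparameters, and toy data
# are illustrative assumptions.
import re

import numpy as np
import torch
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)


def normalize_arabic(text: str) -> str:
    """Common Arabic normalization steps (one plausible pre-processing)."""
    text = re.sub(r"[\u064B-\u0652]", "", text)  # strip diacritics (tashkeel)
    text = re.sub(r"[إأآ]", "ا", text)           # unify alef variants
    text = re.sub(r"ى", "ي", text)               # alef maqsura -> ya
    return text


MODEL_NAME = "UBC-NLP/MARBERT"  # assumed tweet-domain Arabic BERT
NUM_LABELS = 2                  # toy setup; the real task has more classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)


class TweetDataset(torch.utils.data.Dataset):
    """Tokenized tweets with integer labels, for the HuggingFace Trainer."""

    def __init__(self, texts, labels):
        cleaned = [normalize_arabic(t) for t in texts]
        self.enc = tokenizer(cleaned, truncation=True, padding=True,
                             max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


def compute_metrics(eval_pred):
    # Macro F-1 averages per-class F-1 scores, so rare classes weigh as
    # much as frequent ones -- the metric both subtasks report.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"macro_f1": f1_score(labels, preds, average="macro")}


# Toy placeholder data; a real run would load the NADI 2022 training sets.
train_ds = TweetDataset(["تغريدة أولى", "تغريدة ثانية"], [0, 1])
dev_ds = TweetDataset(["تغريدة ثالثة"], [1])

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=dev_ds, compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())  # the returned dict includes eval_macro_f1
```

Swapping the toy lists for the shared-task data and setting NUM_LABELS to the task's label count turns this skeleton into either subtask: dialect identification or sentiment analysis, since both reduce to tweet classification scored by Macro F-1.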