TF-IDF或变压器用于阿拉伯语方言识别?itflow参与NADI 2022共享任务

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI:10.18653/v1/2022.wanlp-1.42

Fouad Shammary, Yiyi Chen, Z. T. Kardkovács, Mehwish Alam, Haithem Afli

{"title":"TF-IDF或变压器用于阿拉伯语方言识别?itflow参与NADI 2022共享任务","authors":"Fouad Shammary, Yiyi Chen, Z. T. Kardkovács, Mehwish Alam, Haithem Afli","doi":"10.18653/v1/2022.wanlp-1.42","DOIUrl":null,"url":null,"abstract":"This study targets the shared task of Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It further focuses on Subtask 1 on the identification of the Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach such as TF-IDF and then moves on to study the impact of advanced deep learning based methods. These methods include fully fine-tuning MARBERT as well as adapter based fine-tuning of MARBERT with and without performing data augmentation. The evaluation shows that the traditional approach based on TF-IDF scores the best in terms of accuracy on TEST-A dataset, while, the fine-tuned MARBERT with adapter on augmented data scores the second on Macro F1-score on the TEST-B dataset. This led to the proposed system being ranked second on the shared task on average.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task\",\"authors\":\"Fouad Shammary, Yiyi Chen, Z. T. Kardkovács, Mehwish Alam, Haithem Afli\",\"doi\":\"10.18653/v1/2022.wanlp-1.42\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study targets the shared task of Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It further focuses on Subtask 1 on the identification of the Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach such as TF-IDF and then moves on to study the impact of advanced deep learning based methods. These methods include fully fine-tuning MARBERT as well as adapter based fine-tuning of MARBERT with and without performing data augmentation. The evaluation shows that the traditional approach based on TF-IDF scores the best in terms of accuracy on TEST-A dataset, while, the fine-tuned MARBERT with adapter on augmented data scores the second on Macro F1-score on the TEST-B dataset. This led to the proposed system being ranked second on the shared task on average.\",\"PeriodicalId\":355149,\"journal\":{\"name\":\"Workshop on Arabic Natural Language Processing\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Arabic Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.wanlp-1.42\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

本研究的目标是阿拉伯语自然语言处理研讨会(WANLP)组织的细微差别阿拉伯语方言识别(NADI)的共享任务。它进一步侧重于分任务1，即在国家一级查明阿拉伯语方言。更具体地说，它研究了传统方法(如TF-IDF)的影响，然后继续研究基于高级深度学习方法的影响。这些方法包括完全微调MARBERT，以及基于适配器的微调MARBERT，有或没有执行数据增强。评估结果表明，基于TF-IDF的传统方法在TEST-A数据集上的准确率最高，而在TEST-B数据集上，带有适配器的微调MARBERT在Macro - f1得分上排名第二。这导致所提议的系统在共享任务中平均排名第二。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task

This study targets the shared task of Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It further focuses on Subtask 1 on the identification of the Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach such as TF-IDF and then moves on to study the impact of advanced deep learning based methods. These methods include fully fine-tuning MARBERT as well as adapter based fine-tuning of MARBERT with and without performing data augmentation. The evaluation shows that the traditional approach based on TF-IDF scores the best in terms of accuracy on TEST-A dataset, while, the fine-tuned MARBERT with adapter on augmented data scores the second on Macro F1-score on the TEST-B dataset. This led to the proposed system being ranked second on the shared task on average.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Workshop on Arabic Natural Language Processing

自引率

0.00%

发文量