Arabic dialect identification using machine learning and transformer-based models: Submission to the NADI 2022 Shared Task

Nouf AlShenaifi, Aqil M. Azmi
{"title":"Arabic dialect identification using machine learning and transformer-based models: Submission to the NADI 2022 Shared Task","authors":"Nouf AlShenaifi, Aqil M. Azmi","doi":"10.18653/v1/2022.wanlp-1.50","DOIUrl":null,"url":null,"abstract":"Arabic has a wide range of dialects. Dialect is the language variation of a specific community. In this paper, we show the models we created to participate in the third Nuanced Arabic Dialect Identification (NADI) shared task (Subtask 1) that involves developing a system to classify a tweet into a country-level dialect. We utilized a number of machine learning techniques as well as deep learning transformer-based models. For the machine learning approach, we build an ensemble classifier of various machine learning models. In our deep learning approach, we consider bidirectional LSTM model and AraBERT pretrained model. The results demonstrate that the deep learning approach performs noticeably better than the other machine learning approaches with 68.7% accuracy on the development set.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.50","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Arabic has a wide range of dialects. Dialect is the language variation of a specific community. In this paper, we show the models we created to participate in the third Nuanced Arabic Dialect Identification (NADI) shared task (Subtask 1) that involves developing a system to classify a tweet into a country-level dialect. We utilized a number of machine learning techniques as well as deep learning transformer-based models. For the machine learning approach, we build an ensemble classifier of various machine learning models. In our deep learning approach, we consider bidirectional LSTM model and AraBERT pretrained model. The results demonstrate that the deep learning approach performs noticeably better than the other machine learning approaches with 68.7% accuracy on the development set.
使用机器学习和基于转换器的模型识别阿拉伯语方言:提交给NADI 2022共享任务
阿拉伯语有各种各样的方言。方言是某一特定群体的语言变体。在本文中,我们展示了我们为参与第三个细致入微的阿拉伯方言识别(NADI)共享任务(子任务1)而创建的模型,该任务涉及开发一个将tweet分类为国家级方言的系统。我们使用了许多机器学习技术以及基于深度学习转换器的模型。对于机器学习方法,我们建立了各种机器学习模型的集成分类器。在我们的深度学习方法中,我们考虑了双向LSTM模型和AraBERT预训练模型。结果表明,深度学习方法在开发集上的表现明显优于其他机器学习方法,准确率为68.7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信