Arabic Dialect Identification for Travel and Twitter Text

WANLP@ACL 2019. Pub Date: 2019-08-01. DOI: 10.18653/v1/W19-4628

Pruthwik Mishra, Vandan Mujadia

Citations: 12

Abstract

This paper presents the results of experiments conducted as part of the MADAR Shared Task at WANLP 2019 on Arabic fine-grained dialect identification. Dialect identification is a prominent task in natural language processing because subsequent language modules can be improved based on its output. We explored features such as character n-grams, word n-grams, and language model probabilities with different classifiers. Results show that these features help improve dialect classification accuracy. Results also show that traditional machine learning classifiers tend to perform better than neural network models on this task in a low-resource setting.
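As a rough illustration of the character and word n-gram features mentioned in the abstract, the following minimal sketch trains a small multinomial Naive Bayes classifier on them. The abstract does not name the classifiers or n-gram ranges actually used, so the Naive Bayes model, the n-gram ranges, the function and class names, the `EGY`/`LEV` labels, and the four toy training sentences are all assumptions made for illustration. Language model probability features are not covered here.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, char_n=(1, 3), word_n=(1, 2)):
    """Count character and word n-grams in one sentence.

    The n-gram ranges are illustrative defaults, not values from the paper.
    """
    feats = Counter()
    padded = f" {text.strip()} "  # pad so word boundaries show up in char n-grams
    for n in range(char_n[0], char_n[1] + 1):
        for i in range(len(padded) - n + 1):
            feats["c:" + padded[i:i + n]] += 1
    words = text.split()
    for n in range(word_n[0], word_n[1] + 1):
        for i in range(len(words) - n + 1):
            feats["w:" + " ".join(words[i:i + n])] += 1
    return feats


class NaiveBayesDialectClassifier:
    """Multinomial Naive Bayes with add-alpha smoothing (a stand-in classifier)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {lab: sum(c.values()) for lab, c in self.feat_counts.items()}
        return self

    def predict(self, text):
        feats = extract_features(text)
        n_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log prior plus smoothed log likelihood of every observed n-gram
            score = math.log(count / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for feat, freq in feats.items():
                score += freq * math.log(
                    (self.feat_counts[label][feat] + self.alpha) / denom
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data for illustration only; not taken from the MADAR corpus.
train_texts = ["ازيك عامل ايه", "انا عايز اروح", "كيفك شو اخبارك", "انا بدي روح"]
train_labels = ["EGY", "EGY", "LEV", "LEV"]

clf = NaiveBayesDialectClassifier().fit(train_texts, train_labels)
print(clf.predict("انا عايز"))
print(clf.predict("انا بدي"))
```

Character n-grams are useful in this kind of setup because they capture sub-word cues such as dialect-specific affixes and spellings even when a whole word was never seen in training, which matters when labeled data is scarce.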



