An Arabic Dialects Dictionary Using Word Embeddings

Int. J. Rough Sets Data Anal. Pub Date : 2019-07-01 DOI:10.4018/ijrsda.2019070102

Chaimae Azroumahli, Yacine El Younoussi, Otman Moussaoui, Youssra Zahidi

Citations: 21

Abstract

Dialectal Arabic and Modern Standard Arabic lack sufficient standardized language resources to support Arabic language processing tasks, even though the field is an active research area. This work addresses the issue by first highlighting the steps and the problems involved in building a multi-dialect Arabic corpus from web data collected from blogs and social media platforms (e.g., Facebook and Twitter). The corpus is then used to create a vectorized dictionary for the crawled data using word embeddings. In other words, the goal of this article is to build an updated multi-dialect data set and then to extract an annotated corpus from it.
