Syntax-Ignorant N-gram Embeddings for Sentiment Analysis of Arabic Dialects

WANLP@ACL 2019. Pub Date: 2019-08-01. DOI: 10.18653/v1/W19-4604

Hala Mulki, Hatem Haddad, Mourad Gridach, Ismail Babaoglu

Citations: 7

Abstract

Arabic sentiment analysis models have employed compositional embedding features to represent Arabic dialectal content. These embeddings are usually composed via ordered, syntax-aware composition functions and learned within deep neural frameworks. Given the free word order and the varying syntax across Arabic dialects, a sentiment analysis system developed for one dialect might not be effective for the others. Here we present syntax-ignorant n-gram embeddings to be used in sentiment analysis of several Arabic dialects. The proposed embeddings were composed and learned using an unordered composition function and a shallow neural model. Five datasets of different dialects were used to evaluate the produced embeddings in the sentiment analysis task. The obtained results revealed that our syntax-ignorant embeddings could outperform the word2vec model and both doc2vec variants, as well as hand-crafted system baselines, while performing competitively against baseline systems that adopted more complicated neural architectures.
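
The idea of an unordered composition function can be illustrated with a minimal sketch. The abstract does not specify the exact function or how the embeddings are trained, so this example assumes plain vector summation over word n-grams; the names `word_ngrams` and `compose_unordered` and the random toy embedding table are hypothetical and only serve the illustration.

```python
import numpy as np

def word_ngrams(tokens, n_max=2):
    """Return all contiguous word n-grams of length 1..n_max as strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def compose_unordered(ngrams, table, dim):
    """Sum the vectors of known n-grams (assumed composition function).

    Addition is commutative, so the result does not depend on the
    order in which the n-grams are listed.
    """
    vec = np.zeros(dim)
    for gram in ngrams:
        if gram in table:  # n-grams missing from the table are skipped
            vec += table[gram]
    return vec

# Hypothetical toy embedding table: random vectors, not learned ones.
rng = np.random.default_rng(0)
dim = 4
tokens = ["very", "good", "film"]
grams = word_ngrams(tokens, n_max=2)
table = {g: rng.normal(size=dim) for g in grams}

forward = compose_unordered(grams, table, dim)
backward = compose_unordered(list(reversed(grams)), table, dim)
```

Here `forward` and `backward` agree up to floating-point rounding, which is what makes the composition order-independent over the extracted n-grams. In a full system, the composed vector would be passed to a shallow classifier, and the embedding table would be learned rather than sampled at random.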
