ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model

WANLP@ACL 2019 | Pub Date: 2019-07-28 | DOI: 10.18653/v1/W19-4605

Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, D. Schwab

Citations: 11

Abstract

Word embeddings (WE) are increasingly popular and widely used in Natural Language Processing (NLP) because they capture the semantic properties of words effectively; Machine Translation (MT), Information Retrieval (IR), and Information Extraction (IE) are among the areas that use them. In this paper, we present ArbEngVec, an open-source resource that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset of more than 93 million Arabic-English parallel sentence pairs. We also perform both extrinsic and intrinsic evaluations of the different word embedding model variants. The extrinsic evaluation assesses model performance on cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.
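To make the two evaluation tasks concrete, the following is a minimal sketch of how word translation and cross-language sentence similarity can be computed once Arabic and English words share one embedding space. It is not the authors' evaluation code. The three-dimensional vectors, the tiny gold dictionary, and the function names (`translate`, `precision_at_1`, `sentence_similarity`) are illustrative assumptions, and scoring sentences by averaging word vectors is a common baseline that the abstract does not confirm.

```python
import math

# Toy shared embedding space. These vectors are hypothetical and are NOT
# taken from ArbEngVec; a real model would supply learned vectors.
ARABIC = {
    "كتاب": [0.9, 0.1, 0.0],  # "book"
    "قطة": [0.1, 0.9, 0.1],   # "cat"
    "ماء": [0.0, 0.2, 0.9],   # "water"
}
ENGLISH = {
    "book": [0.8, 0.2, 0.1],
    "cat": [0.2, 0.8, 0.0],
    "water": [0.1, 0.1, 0.9],
}
# Hypothetical gold dictionary used to score word translation.
GOLD = {"كتاب": "book", "قطة": "cat", "ماء": "water"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(word, source, target):
    """Return the target-language word nearest to `word` by cosine similarity."""
    query = source[word]
    return max(target, key=lambda w: cosine(query, target[w]))


def precision_at_1(source, target, gold):
    """Fraction of source words whose nearest target word is the gold translation."""
    hits = sum(translate(w, source, target) == t for w, t in gold.items())
    return hits / len(gold)


def average(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sentence_similarity(ar_tokens, en_tokens):
    """Assumed baseline: cosine between the averaged word vectors of each sentence."""
    ar_vec = average([ARABIC[t] for t in ar_tokens])
    en_vec = average([ENGLISH[t] for t in en_tokens])
    return cosine(ar_vec, en_vec)


if __name__ == "__main__":
    print(translate("كتاب", ARABIC, ENGLISH))
    print(precision_at_1(ARABIC, ENGLISH, GOLD))
    print(sentence_similarity(["قطة", "ماء"], ["cat", "water"]))
```

In this sketch, word translation is scored intrinsically as precision at rank 1 over the gold dictionary, while the sentence-level score would typically be compared against human similarity judgments in an STS benchmark.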


