ArbEngVec: Arabic-English Cross-Lingual Word Embedding Model

WANLP@ACL 2019. Pub Date: 2019-07-28. DOI: 10.18653/v1/W19-4605

Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, D. Schwab


Citations: 11

Abstract

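Word embeddings (WE) are increasingly popular and widely applied in Natural Language Processing (NLP) because they capture the semantic properties of words effectively; Machine Translation (MT), Information Retrieval (IR), and Information Extraction (IE) are among the areas that use them. In this paper, we present ArbEngVec, an open-source resource that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset of more than 93 million Arabic-English parallel sentence pairs. In addition, we perform both extrinsic and intrinsic evaluations of the different word embedding model variants. The extrinsic evaluation assesses model performance on cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.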
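The two evaluations named in the abstract can be illustrated with a minimal sketch. It assumes that Arabic and English words share one vector space, that word translation is scored by cosine nearest neighbour, and that a sentence is represented by the average of its word vectors. The abstract does not specify the paper's actual procedures, so these choices, the toy 3-dimensional vectors, and all function names below are hypothetical and for illustration only.

```python
import math

# Toy vectors in a shared space (hypothetical values, not from ArbEngVec).
ARABIC_VECS = {
    "كتاب": [0.9, 0.1, 0.0],  # "book"
    "قطة": [0.1, 0.8, 0.1],   # "cat"
    "ماء": [0.0, 0.2, 0.9],   # "water"
}
ENGLISH_VECS = {
    "book": [1.0, 0.0, 0.1],
    "cat": [0.0, 1.0, 0.0],
    "water": [0.1, 0.1, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def translate(word, source_vecs, target_vecs):
    """Return the target-language word whose vector is closest to the source word."""
    query = source_vecs[word]
    return max(target_vecs, key=lambda t: cosine(query, target_vecs[t]))


def precision_at_1(gold_pairs, source_vecs, target_vecs):
    """Fraction of gold (source, target) pairs recovered by nearest-neighbour lookup."""
    hits = sum(
        1 for src, tgt in gold_pairs
        if translate(src, source_vecs, target_vecs) == tgt
    )
    return hits / len(gold_pairs)


def sentence_vector(tokens, vecs):
    """Average the word vectors of the tokens (a simple bag-of-words sentence vector)."""
    dims = len(next(iter(vecs.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, value in enumerate(vecs[token]):
            total[i] += value
    return [value / len(tokens) for value in total]


def cross_lingual_similarity(arabic_tokens, english_tokens):
    """Cosine similarity between an Arabic and an English sentence vector."""
    return cosine(
        sentence_vector(arabic_tokens, ARABIC_VECS),
        sentence_vector(english_tokens, ENGLISH_VECS),
    )


if __name__ == "__main__":
    gold = [("كتاب", "book"), ("قطة", "cat"), ("ماء", "water")]
    print(translate("كتاب", ARABIC_VECS, ENGLISH_VECS))
    print(precision_at_1(gold, ARABIC_VECS, ENGLISH_VECS))
    print(cross_lingual_similarity(["قطة", "ماء"], ["cat", "water"]))
```

With real models, the toy dictionaries would be replaced by the trained Arabic and English vectors, and the gold pairs by a bilingual dictionary; a similarity score would typically be compared against human STS ratings.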


