Building a global dictionary for semantic technologies

IberSPEECH Conference. Pub Date: 2018-11-21. DOI: 10.21437/IBERSPEECH.2018-60

Eszter Iklódi, Gábor Recski, Gábor Borbély, María José Castro Bleda


Citations: 1

Abstract

This paper proposes a novel method for finding linear mappings among word vectors for various languages. Unlike previous approaches, this method does not learn translation matrices between two specific languages, but between a given language and a shared, universal space. The system was trained in two different modes: first between two languages, and then with three languages at the same time. In the first case, two different training sets were used: Dinu's English-Italian benchmark data [1], and English-Italian translation pairs extracted from the PanLex database [2]. In the second case, only the PanLex database was used. With its best setting, the system performs significantly better on the English-Italian pair than the baseline system of Mikolov et al. [3], and its performance is comparable to that of the more sophisticated systems of Faruqui and Dyer [4] and Dinu et al. [1]. By exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
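The idea of mapping each language into one shared space, rather than learning a separate matrix per language pair, can be illustrated with a minimal sketch. The abstract does not state the training objective, so the sketch below swaps in alternating orthogonal Procrustes (generalized Procrustes analysis) as an assumed stand-in, and it runs on synthetic vectors rather than real embeddings or PanLex pairs. The function names, the `"third"` language label, and all sizes are hypothetical.

```python
import numpy as np


def procrustes(source, target):
    """Orthogonal matrix T that minimizes ||source @ T - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def fit_universal_maps(embeddings, n_iter=10):
    """Learn one orthogonal map per language into a shared space.

    embeddings: dict lang -> (n_pairs, dim) array, where row i of every
    array is the vector of the same concept in that language.
    """
    dim = next(iter(embeddings.values())).shape[1]
    maps = {lang: np.eye(dim) for lang in embeddings}
    for _ in range(n_iter):
        # Current estimate of the shared space: mean of all mapped languages.
        shared = np.mean([embeddings[l] @ maps[l] for l in embeddings], axis=0)
        for lang, vectors in embeddings.items():
            maps[lang] = procrustes(vectors, shared)
    return maps


def mean_cosine(a, b):
    """Average row-wise cosine similarity between two matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))


# Synthetic data (assumption): every language is a rotated, slightly noisy
# copy of the same underlying concept vectors.
rng = np.random.default_rng(0)
dim, n_pairs, n_train = 5, 200, 150
concepts = rng.normal(size=(n_pairs, dim))
embeddings = {}
for lang in ("en", "it", "third"):
    rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
    noise = 0.01 * rng.normal(size=(n_pairs, dim))
    embeddings[lang] = concepts @ rotation + noise

maps = fit_universal_maps({l: v[:n_train] for l, v in embeddings.items()})

# Translate held-out English vectors to Italian through the shared space.
# The maps are orthogonal, so the inverse of maps["it"] is its transpose.
en_to_it = embeddings["en"][n_train:] @ maps["en"] @ maps["it"].T
score = mean_cosine(en_to_it, embeddings["it"][n_train:])
print(f"held-out mean cosine similarity: {score:.4f}")
```

With this layout, adding a language means learning one more matrix into the shared space instead of one matrix per language pair, which is the property the abstract highlights.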
