Eszter Iklódi, Gábor Recski, Gábor Borbély, María José Castro Bleda
{"title":"构建语义技术的全局字典","authors":"Eszter Iklódi, Gábor Recski, Gábor Borbély, María José Castro Bleda","doi":"10.21437/IBERSPEECH.2018-60","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel method for finding linear mappings among word vectors for various languages. Compared to previous approaches, this method does not learn translation matrices between two specific languages, but between a given language and a shared, universal space. The system was trained in two different modes, first between two languages, and after that applying three languages at the same time. In the first case two different training data were applied; Dinu’s English-Italian benchmark data [1], and English-Italian translation pairs extracted from the PanLex database [2]. In the second case only the PanLex database was used. The system performs on English-Italian languages with the best setting significantly better than the baseline system of Mikolov et al. [3], and it provides a comparable performance with the more sophisticated systems of Faruqui and Dyer [4] and Dinu et al. [1]. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number languages.","PeriodicalId":115963,"journal":{"name":"IberSPEECH Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Building a global dictionary for semantic technologies\",\"authors\":\"Eszter Iklódi, Gábor Recski, Gábor Borbély, María José Castro Bleda\",\"doi\":\"10.21437/IBERSPEECH.2018-60\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a novel method for finding linear mappings among word vectors for various languages. Compared to previous approaches, this method does not learn translation matrices between two specific languages, but between a given language and a shared, universal space. The system was trained in two different modes, first between two languages, and after that applying three languages at the same time. In the first case two different training data were applied; Dinu’s English-Italian benchmark data [1], and English-Italian translation pairs extracted from the PanLex database [2]. In the second case only the PanLex database was used. The system performs on English-Italian languages with the best setting significantly better than the baseline system of Mikolov et al. [3], and it provides a comparable performance with the more sophisticated systems of Faruqui and Dyer [4] and Dinu et al. [1]. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number languages.\",\"PeriodicalId\":115963,\"journal\":{\"name\":\"IberSPEECH Conference\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IberSPEECH Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/IBERSPEECH.2018-60\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IberSPEECH Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/IBERSPEECH.2018-60","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Building a global dictionary for semantic technologies
This paper proposes a novel method for finding linear mappings among word vectors for various languages. Compared to previous approaches, this method does not learn translation matrices between two specific languages, but between a given language and a shared, universal space. The system was trained in two different modes, first between two languages, and after that applying three languages at the same time. In the first case two different training data were applied; Dinu’s English-Italian benchmark data [1], and English-Italian translation pairs extracted from the PanLex database [2]. In the second case only the PanLex database was used. The system performs on English-Italian languages with the best setting significantly better than the baseline system of Mikolov et al. [3], and it provides a comparable performance with the more sophisticated systems of Faruqui and Dyer [4] and Dinu et al. [1]. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number languages.