{"title":"WEWD: A Combined Approach for Measuring Cross-lingual Semantic Word Similarity Based on Word Embeddings and Word Definitions","authors":"Van-Tan Bui, Phuong-Thai Nguyen","doi":"10.1109/RIVF51545.2021.9642084","DOIUrl":null,"url":null,"abstract":"Cross-lingual semantic word similarity (CLSW) ad- dresses the task of estimating the semantic distance between two words across languages. This task is an important component in many natural language processing applications. Recent studies have proposed several effective CLSW models for resource- rich language pairs such as English-German, English-French. However, This task has not been effectively addressed for language pairs consisting of Vietnamese and another one. In this paper, we propose a neural network model that exploits cross- lingual lexical resources to learn high-quality cross-lingual word embedding models. Since our neural network model is language- independent, it can learn a truly multilingual space. Furthermore, we introduce a novel cross-lingual semantic word similarity measurement method based on Word Embeddings and Word Definitions (WEWD). Last but not least, we introduce a standard Vietnamese-English dataset for the cross-lingual semantic word similarity measurement task (VESim-1000). The experimental results show that our proposed method is more robust and outperforms current state-of-the-art methods that are only based on word embeddings or lexical resources.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"94 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RIVF51545.2021.9642084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Cross-lingual semantic word similarity (CLSW) ad- dresses the task of estimating the semantic distance between two words across languages. This task is an important component in many natural language processing applications. Recent studies have proposed several effective CLSW models for resource- rich language pairs such as English-German, English-French. However, This task has not been effectively addressed for language pairs consisting of Vietnamese and another one. In this paper, we propose a neural network model that exploits cross- lingual lexical resources to learn high-quality cross-lingual word embedding models. Since our neural network model is language- independent, it can learn a truly multilingual space. Furthermore, we introduce a novel cross-lingual semantic word similarity measurement method based on Word Embeddings and Word Definitions (WEWD). Last but not least, we introduce a standard Vietnamese-English dataset for the cross-lingual semantic word similarity measurement task (VESim-1000). The experimental results show that our proposed method is more robust and outperforms current state-of-the-art methods that are only based on word embeddings or lexical resources.