利用深度跨语言词嵌入来推断准确的系统发育树

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD Pub Date : 2020-01-05 DOI:10.1145/3371158.3371210

Yashasvi Mantha, Diptesh Kanojia, Abhijeet Dubey, P. Bhattacharyya, Malhar A. Kulkarni

{"title":"利用深度跨语言词嵌入来推断准确的系统发育树","authors":"Yashasvi Mantha, Diptesh Kanojia, Abhijeet Dubey, P. Bhattacharyya, Malhar A. Kulkarni","doi":"10.1145/3371158.3371210","DOIUrl":null,"url":null,"abstract":"Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in the area of diachronic linguistics. However, existing methods face meaning conflation deficiency due to the usage of lexical similarity-based measures. In this paper, we utilize fourteen linked Indian Wordnets to create inter-language distances using our novel approach to compute 'language distances'. Our pilot study uses deep cross-lingual word embeddings to compute inter-language distances and provide an effective distance matrix to infer phylogenetic trees. We also develop a baseline method using lexical similarity-based metrics for comparison and identify that our approach produces better phylogenetic trees which club related languages closer when compared to the baseline approach.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Harnessing Deep Cross-lingual Word Embeddings to Infer Accurate Phylogenetic Trees\",\"authors\":\"Yashasvi Mantha, Diptesh Kanojia, Abhijeet Dubey, P. Bhattacharyya, Malhar A. Kulkarni\",\"doi\":\"10.1145/3371158.3371210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in the area of diachronic linguistics. However, existing methods face meaning conflation deficiency due to the usage of lexical similarity-based measures. In this paper, we utilize fourteen linked Indian Wordnets to create inter-language distances using our novel approach to compute 'language distances'. Our pilot study uses deep cross-lingual word embeddings to compute inter-language distances and provide an effective distance matrix to infer phylogenetic trees. We also develop a baseline method using lexical similarity-based metrics for comparison and identify that our approach produces better phylogenetic trees which club related languages closer when compared to the baseline approach.\",\"PeriodicalId\":360747,\"journal\":{\"name\":\"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3371158.3371210\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3371158.3371210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

通过推断系统发育树来建立语言的亲缘关系一直是历时语言学领域的研究热点。然而，由于使用了基于词汇相似度的度量，现有的方法存在词义合并不足的问题。在本文中，我们使用我们的新方法来计算“语言距离”，利用14个链接的印度词网来创建语言间距离。我们的初步研究使用深度跨语言词嵌入来计算语言间距离，并提供有效的距离矩阵来推断系统发育树。我们还开发了一种基线方法，使用基于词汇相似性的度量进行比较，并确定我们的方法产生了更好的系统发育树，与基线方法相比，这些树使相关语言更接近。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Harnessing Deep Cross-lingual Word Embeddings to Infer Accurate Phylogenetic Trees

Establishing language relatedness by inferring phylogenetic trees has been a topic of interest in the area of diachronic linguistics. However, existing methods face meaning conflation deficiency due to the usage of lexical similarity-based measures. In this paper, we utilize fourteen linked Indian Wordnets to create inter-language distances using our novel approach to compute 'language distances'. Our pilot study uses deep cross-lingual word embeddings to compute inter-language distances and provide an effective distance matrix to infer phylogenetic trees. We also develop a baseline method using lexical similarity-based metrics for comparison and identify that our approach produces better phylogenetic trees which club related languages closer when compared to the baseline approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD

自引率

0.00%

发文量