Using ancestral state reconstruction methods for onomasiological reconstruction in multilingual word lists

Impact factor: 1.2 · Category: LANGUAGE & LINGUISTICS

Language Dynamics and Change · Pub Date: 2018-06-22 · DOI: 10.1163/22105832-00801002

Gerhard Jäger, Johann-Mattis List

Citations: 13

Abstract

Current efforts in computational historical linguistics are predominantly concerned with phylogenetic inference. Methods for ancestral state reconstruction have only been applied sporadically. In contrast to phylogenetic algorithms, automatic reconstruction methods presuppose phylogenetic information in order to explain what has evolved when and where. Here we report a pilot study exploring how well automatic methods for ancestral state reconstruction perform in the task of onomasiological reconstruction in multilingual word lists, where algorithms are used to infer how the words evolved along a given phylogeny, and reconstruct which cognate classes were used to express a given meaning in the ancestral languages. Comparing three different methods, Maximum Parsimony, Minimal Lateral Networks, and Maximum Likelihood on three different test sets (Indo-European, Austronesian, Chinese) using binary and multi-state coding of the data as well as single and sampled phylogenies, we find that Maximum Likelihood largely outperforms the other methods. At the same time, however, the general performance was disappointingly low, ranging between 0.66 (Chinese) and 0.79 (Austronesian) for the F-Scores. A closer linguistic evaluation of the reconstructions proposed by the best method and the reconstructions given in the gold standards revealed that the majority of the cases where the algorithms failed can be attributed to problems of independent semantic shift (homoplasy), to morphological processes in lexical change, and to wrong reconstructions in the independently created test sets that we employed.
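To make the task concrete, the following is a minimal sketch of onomasiological reconstruction with Maximum Parsimony under multi-state coding, followed by the F-score used for evaluation. It is not the authors' implementation. It assumes a strictly binary tree, exactly one cognate class per language for the meaning in question, and the standard bottom-up Fitch pass; the language names (`A` to `E`), the cognate class labels (`c1` to `c3`), and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (binary internal node).
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical phylogeny over five languages: (((A, B), C), (D, E)).
TREE: Tree = ((("A", "B"), "C"), ("D", "E"))

# Hypothetical multi-state coding: the cognate class each language
# uses to express one meaning.
TIPS: Dict[str, str] = {"A": "c1", "B": "c1", "C": "c2", "D": "c2", "E": "c3"}


def fitch_root_states(tree: Tree, tips: Dict[str, str]) -> Set[str]:
    """Bottom-up Fitch pass: return the candidate cognate classes at the root.

    At each internal node the state set is the intersection of the two
    child sets if that intersection is non-empty, otherwise their union.
    """
    if isinstance(tree, str):
        return {tips[tree]}
    left, right = tree
    left_set = fitch_root_states(left, tips)
    right_set = fitch_root_states(right, tips)
    common = left_set & right_set
    return common if common else left_set | right_set


def f_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over sets of cognate classes."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    root = fitch_root_states(TREE, TIPS)
    print("reconstructed root classes:", sorted(root))
    print("F-score against gold {'c2'}:", f_score(root, {"c2"}))
```

In this toy case the subtree `((A, B), C)` yields the ambiguous set `{c1, c2}`, the subtree `(D, E)` yields `{c2, c3}`, and their intersection at the root leaves `c2` as the only candidate. Maximum Likelihood, the best-performing method in the study, would instead weight such candidates by branch lengths and a model of change, which this sketch does not attempt.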

Source journal

Language Dynamics and Change (category: LANGUAGE & LINGUISTICS)

CiteScore: 2.30

Self-citation rate: 0.00%

About the journal: Language Dynamics and Change (LDC) is an international peer-reviewed journal that covers both new and traditional aspects of the study of language change. Work on any language or language family is welcomed, as long as it bears on topics that are also of theoretical interest. A particular focus is on new developments in the field arising from the accumulation of extensive databases of dialect variation and typological distributions, spoken corpora, parallel texts, and comparative lexicons, which allow for the application of new types of quantitative approaches to diachronic linguistics. Moreover, the journal will serve as an outlet for increasingly important interdisciplinary work on such topics as the evolution of language, archaeology and linguistics (‘archaeolinguistics’), human genetic and linguistic prehistory, and the computational modeling of language dynamics.