计算历史语言学中的序列比较

IF 2.1 0 LANGUAGE & LINGUISTICS

Journal of Language Evolution Pub Date : 2018-07-01 DOI:10.1093/JOLE/LZY006

Johann-Mattis List, M. Walworth, Simon J. Greenhill, Tiago Tresoldi, Robert Forkel

{"title":"计算历史语言学中的序列比较","authors":"Johann-Mattis List, M. Walworth, Simon J. Greenhill, Tiago Tresoldi, Robert Forkel","doi":"10.1093/JOLE/LZY006","DOIUrl":null,"url":null,"abstract":"With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed-up the process of cognate detection. Furthermore, it allows us to get a quick overview on data which have not yet been intensively studied by experts. LingPy is a Python library which provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but they can also align the automatically identified words, and output them in various forms, which aim at facilitating manual inspection. In this tutorial, we will briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists. The goal is to provide the readers with all information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with basic knowledge of computing can follow through all steps as well.","PeriodicalId":37118,"journal":{"name":"Journal of Language Evolution","volume":" ","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1093/JOLE/LZY006","citationCount":"40","resultStr":"{\"title\":\"Sequence comparison in computational historical linguistics\",\"authors\":\"Johann-Mattis List, M. Walworth, Simon J. Greenhill, Tiago Tresoldi, Robert Forkel\",\"doi\":\"10.1093/JOLE/LZY006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed-up the process of cognate detection. Furthermore, it allows us to get a quick overview on data which have not yet been intensively studied by experts. LingPy is a Python library which provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but they can also align the automatically identified words, and output them in various forms, which aim at facilitating manual inspection. In this tutorial, we will briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists. The goal is to provide the readers with all information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with basic knowledge of computing can follow through all steps as well.\",\"PeriodicalId\":37118,\"journal\":{\"name\":\"Journal of Language Evolution\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2018-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1093/JOLE/LZY006\",\"citationCount\":\"40\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Language Evolution\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/JOLE/LZY006\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"LANGUAGE & LINGUISTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Language Evolution","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/JOLE/LZY006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}

引用次数: 40

摘要

随着来自世界各地的数字化数据的不断增加，历史语言学中多语言词表中同源词的手工标注变得越来越耗时。在人工分析之前，使用可用的软件包对数据进行预处理可以大大加快同源检测的过程。此外，它使我们能够对尚未被专家深入研究的数据进行快速概述。LingPy是一个Python库，它为历史语言学中的序列比较提供了大量例程。使用LingPy，语言学家不仅可以在词汇数据中自动搜索同源词，还可以对自动识别的词进行对齐，并以各种形式输出，以方便人工检查。在本教程中，我们将简要介绍LingPy使用的算法背后的基本概念，然后在具体的工作流中说明如何将自动序列比较应用于多语言单词列表。目标是为读者提供他们所需的所有信息(1)在LingPy中执行同源检测和对齐分析，(2)为适当的任务选择适当的算法，(3)评估自动同源检测算法与专家相比的表现如何，以及(4)将其数据导出为各种格式，用于其他分析或数据共享。虽然Python语言的基本知识对所有分析都很有用，但我们的教程的结构使具有基本计算知识的学者也可以遵循所有步骤。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sequence comparison in computational historical linguistics

With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed-up the process of cognate detection. Furthermore, it allows us to get a quick overview on data which have not yet been intensively studied by experts. LingPy is a Python library which provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but they can also align the automatically identified words, and output them in various forms, which aim at facilitating manual inspection. In this tutorial, we will briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists. The goal is to provide the readers with all information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with basic knowledge of computing can follow through all steps as well.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Language Evolution Social Sciences-Linguistics and Language

CiteScore

4.50

自引率

7.70%

发文量