Lexical diversity as a lens into the classification of Slavic languages: A quantitative typology perspective

IF 0.7 3区 文学 0 HUMANITIES, MULTIDISCIPLINARY
Chenliang Zhou, Haitao Liu
{"title":"Lexical diversity as a lens into the classification of Slavic languages: A quantitative typology perspective","authors":"Chenliang Zhou, Haitao Liu","doi":"10.1093/llc/fqad042","DOIUrl":null,"url":null,"abstract":"\n This study proposes a linguistic classification method based on quantitative typology, which leverages a large-scale multilingual parallel corpus to obtain valid language classification result by excluding the influence of covariates such as text genre and semantic content in cross-language comparison. To achieve this, we model the type–token relationships of each Slavic parallel text and calculate the lexical diversity to approximate the morphological complexity of the language. We perform automatic clustering of languages based on these lexical diversity metrics. Our findings show that (1) the lexical diversity metrics can well reflect that the language is located somewhere on the continuum of ‘analytism-synthetism’; (2) the automatic clustering based on these metrics effectively reflects the genealogical classification of Slavic languages; and (3) the geographical distribution of lexical diversity in the region where Slavic languages are spoken shows a monotonic increasing trend from southwest to northeast, which is consistent with the pattern found by previous authors on a global scale. The methodological approach taken in this study is data-driven, with the benefit of being independent of theoretical assumptions and easy for computer processing. This approach can offer a better insight into corpus-based typology and may shed light on the understanding of language as a human-driven complex adaptive system.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":" ","pages":""},"PeriodicalIF":0.7000,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Scholarship in the Humanities","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1093/llc/fqad042","RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"HUMANITIES, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

This study proposes a linguistic classification method based on quantitative typology, which leverages a large-scale multilingual parallel corpus to obtain valid language classification result by excluding the influence of covariates such as text genre and semantic content in cross-language comparison. To achieve this, we model the type–token relationships of each Slavic parallel text and calculate the lexical diversity to approximate the morphological complexity of the language. We perform automatic clustering of languages based on these lexical diversity metrics. Our findings show that (1) the lexical diversity metrics can well reflect that the language is located somewhere on the continuum of ‘analytism-synthetism’; (2) the automatic clustering based on these metrics effectively reflects the genealogical classification of Slavic languages; and (3) the geographical distribution of lexical diversity in the region where Slavic languages are spoken shows a monotonic increasing trend from southwest to northeast, which is consistent with the pattern found by previous authors on a global scale. The methodological approach taken in this study is data-driven, with the benefit of being independent of theoretical assumptions and easy for computer processing. This approach can offer a better insight into corpus-based typology and may shed light on the understanding of language as a human-driven complex adaptive system.
从数量类型学角度看斯拉夫语言分类中的词汇多样性
本研究提出了一种基于定量类型学的语言分类方法,该方法利用大规模的多语言并行语料库,在跨语言比较中排除文本类型和语义内容等协变量的影响,获得有效的语言分类结果。为了实现这一点,我们对每个斯拉夫语平行文本的类型-标记关系进行建模,并计算词汇多样性,以近似语言的形态复杂性。我们基于这些词汇多样性度量来执行语言的自动聚类。我们的研究结果表明:(1)词汇多样性度量可以很好地反映语言位于“分析-综合”连续体上的某个位置;(2) 基于这些度量的自动聚类有效地反映了斯拉夫语言的谱系分类;(3)斯拉夫语地区词汇多样性的地理分布呈现出从西南向东北单调增加的趋势,这与前人在全球范围内发现的模式一致。本研究采用的方法是数据驱动的,其优点是独立于理论假设,易于计算机处理。这种方法可以更好地了解基于语料库的类型学,并有助于理解语言作为一个人类驱动的复杂自适应系统。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
1.80
自引率
25.00%
发文量
78
期刊介绍: DSH or Digital Scholarship in the Humanities is an international, peer reviewed journal which publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field of what is currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信