Unifying Models for Word Length Distributions Based on Types and Tokens

IF 0.7 2区 文学 0 LANGUAGE & LINGUISTICS
Peter Zörnig, T. Berg
{"title":"Unifying Models for Word Length Distributions Based on Types and Tokens","authors":"Peter Zörnig, T. Berg","doi":"10.1080/09296174.2023.2202061","DOIUrl":null,"url":null,"abstract":"ABSTRACT Word length studies have been one of the central issues in Quantitative Linguistics for a long time. Most models were constructed for very specific purposes, i.e. the individual models apply only to a specific language, only to token counts or only to type counts. The present paper takes up the challenge of developing unifying models which account for both type and token frequencies of a moderately large sample of languages (eight Indo-European and two non-Indo-European languages). We introduce three models which can be well fitted to all our data: the exponentiated Hyper-Poisson distribution, the generalized gamma and the Sichel distribution. We also discuss the possibility of interpreting the model parameters linguistically.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"30 1","pages":"167 - 182"},"PeriodicalIF":0.7000,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Quantitative Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/09296174.2023.2202061","RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
引用次数: 1

Abstract

ABSTRACT Word length studies have been one of the central issues in Quantitative Linguistics for a long time. Most models were constructed for very specific purposes, i.e. the individual models apply only to a specific language, only to token counts or only to type counts. The present paper takes up the challenge of developing unifying models which account for both type and token frequencies of a moderately large sample of languages (eight Indo-European and two non-Indo-European languages). We introduce three models which can be well fitted to all our data: the exponentiated Hyper-Poisson distribution, the generalized gamma and the Sichel distribution. We also discuss the possibility of interpreting the model parameters linguistically.
基于类型和标记的字长分布统一模型
长期以来,字长研究一直是数量语言学的核心问题之一。大多数模型都是为非常特定的目的构建的,即单个模型仅适用于特定的语言,仅适用于令牌计数或仅适用于类型计数。本文提出了开发统一模型的挑战,该模型考虑了中等规模的语言样本(八种印欧语言和两种非印欧语言)的类型和表征频率。我们介绍了三个可以很好地拟合我们所有数据的模型:指数超泊松分布、广义伽玛和Sichel分布。我们还讨论了用语言解释模型参数的可能性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
2.90
自引率
7.10%
发文量
7
期刊介绍: The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信