Why Do Parameter Values in the Zipf-Mandelbrot Distribution Sometimes Explode?
Ján Mačutek
Journal of Quantitative Linguistics, vol. 29, pp. 413-424. Published 2021-02-23.
DOI: https://doi.org/10.1080/09296174.2021.1887613
Citations: 1
Abstract
The Zipf-Mandelbrot distribution serves as a mathematical model for ranked frequencies in many areas of scientific research, including linguistics. Many linguistic units, such as words or word n-grams, follow this distribution. However, in some cases, such as graphemes in linguistics or species abundance and diversity data in biology, the parameters of the Zipf-Mandelbrot distribution are virtually uninterpretable, because their values depend strongly on the precision of the numerical methods used to estimate them (values from several tens to several hundreds are not uncommon). The paper shows that these values can be explained by convergence to the geometric distribution, which forces both parameters of the Zipf-Mandelbrot distribution to increase to infinity while their ratio converges to a constant. Some examples illustrating this limit behaviour are presented.
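The limit behaviour described in the abstract can be checked numerically. The sketch below is not the paper's own computation. It assumes the common parametrisation P(x) proportional to (x + b)^(-a) on a finite set of ranks x = 1, ..., n, and it assumes that the ratio held constant is c = a/b. Under those assumptions, (x + b)^(-a) = b^(-a) (1 + x/b)^(-a), and the second factor tends to exp(-c x) as b grows, which is a geometric distribution with ratio exp(-c) once normalised. The function names are hypothetical and chosen only for this illustration.

```python
import math


def zipf_mandelbrot_pmf(a, b, n):
    """Zipf-Mandelbrot probabilities on ranks 1..n, P(x) proportional to (x + b)**(-a).

    Weights are computed in log space so that large a and b do not underflow.
    """
    log_w = [-a * math.log(x + b) for x in range(1, n + 1)]
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def truncated_geometric_pmf(c, n):
    """Geometric probabilities on ranks 1..n, P(x) proportional to exp(-c * x)."""
    w = [math.exp(-c * x) for x in range(1, n + 1)]
    total = sum(w)
    return [v / total for v in w]


def max_gap(c, b, n):
    """Largest pointwise difference between the two distributions when a = c * b."""
    zm = zipf_mandelbrot_pmf(c * b, b, n)
    geo = truncated_geometric_pmf(c, n)
    return max(abs(p - q) for p, q in zip(zm, geo))


# Assumed illustration values: ratio c = 0.5 and n = 20 ranks.
c, n = 0.5, 20
for b in (10, 100, 1000, 10000):
    gap = max_gap(c, b, n)
    print(f"a = {c * b:8.1f}, b = {b:6d}, max |ZM - geometric| = {gap:.2e}")
```

As b increases with a/b fixed, the printed gap shrinks, so very different (a, b) pairs describe almost the same distribution. This is consistent with the abstract's remark that the estimated values depend strongly on numerical precision.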
About the journal:
The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.