The logarithmic Zipf law in a general urn problem
Aristides V. Doumas, V. Papanicolaou
ESAIM: Probability and Statistics, vol. 57(1), pp. 275–293 (2020). DOI: 10.1051/ps/2020011 (https://doi.org/10.1051/ps/2020011)
Citations: 2
Abstract
The origin of power-law behavior (also known variously as Zipf’s law) has been a topic of debate in the scientific community for more than a century. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. In a highly cited article, Mark Newman [Contemp. Phys. 46 (2005) 323–351] reviewed some of the empirical evidence for the existence of power-law forms, but underscored that, even though many distributions do not follow a power law, many of the quantities that scientists measure are often close to a Zipf law and are therefore of importance. In this paper we combine a variant of Zipf’s law with a general urn problem. A collector wishes to collect $m$ complete sets of $N$ distinct coupons. The draws from the population are independent and identically distributed (sampling with replacement), and the probability that a type-$j$ coupon is drawn is denoted by $p_j$, $j = 1, 2, \ldots, N$. Let $T_m(N)$ denote the number of trials needed for this problem. We present the asymptotics of the expectation (five terms plus an error), the second rising moment (six terms plus an error), and the variance of $T_m(N)$ (leading term) as $N \to \infty$, when
\begin{equation*} p_{j}=\frac{a_{j}}{\sum_{k=2}^{N+1} a_{k}}, \quad\text{where}\quad a_{j}=\left(\ln j\right)^{-p}, \quad p>0. \end{equation*}
Moreover, we prove that $T_m(N)$, appropriately normalized, converges in distribution to a Gumbel random variable. These “log-Zipf” classes of coupon probabilities are not covered by the existing literature, and the present paper fills this gap. In the spirit of a recent paper of ours [ESAIM: PS 20 (2016) 367–399], we enlarge the classes for which the Dixie cup problem is solved with respect to its moments, variance and distribution.
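The quantities in the abstract can be explored numerically for small $N$. The Python sketch below is only an illustration and is not the authors' method (their results are analytical asymptotic expansions). It samples $T_m(N)$ directly, and it evaluates $E[T_m(N)]$ exactly for finite $N$ through the standard Poissonization identity $E[T_m(N)] = \int_0^\infty \bigl[1 - \prod_j \bigl(1 - e^{-p_j t} S_m(p_j t)\bigr)\bigr]\,dt$, where $S_m(y) = \sum_{l=0}^{m-1} y^l / l!$. Several choices here are assumptions. Coupon types are indexed $j = 2, \ldots, N+1$ so that $\ln j > 0$, which mirrors the normalizing sum. The function names, the Simpson-rule quadrature and the truncation point `t_max` are hypothetical illustrative choices.

```python
import bisect
import itertools
import math
import random


def log_zipf_probs(n: int, p: float) -> list[float]:
    """p_j = a_j / sum_k a_k with a_j = (ln j)^(-p), for types j = 2, ..., n+1.

    Indexing from j = 2 (an assumption mirroring the normalizing sum over
    j = 2..N+1) keeps ln j > 0, so every a_j is finite.
    """
    a = [math.log(j) ** (-p) for j in range(2, n + 2)]
    total = sum(a)
    return [x / total for x in a]


def simulate_t_m(probs: list[float], m: int, rng: random.Random) -> int:
    """Draw i.i.d. with replacement until every type has been drawn m times."""
    cum = list(itertools.accumulate(probs))
    counts = [0] * len(probs)
    incomplete = len(probs)  # types still seen fewer than m times
    draws = 0
    while incomplete > 0:
        # Inverse-CDF sampling; scaling by cum[-1] guards against round-off.
        j = bisect.bisect_right(cum, rng.random() * cum[-1])
        draws += 1
        counts[j] += 1
        if counts[j] == m:
            incomplete -= 1
    return draws


def expected_t_m(probs: list[float], m: int, intervals: int = 20000) -> float:
    """E[T_m(N)] via the Poissonization integral, using composite Simpson's rule.

    P(tau > t) = 1 - prod_j (1 - e^{-p_j t} S_m(p_j t)), S_m(y) = sum_{l<m} y^l / l!.
    """
    def survival(t: float) -> float:
        prod = 1.0
        for q in probs:
            y = q * t
            s_m = sum(y ** l / math.factorial(l) for l in range(m))
            prod *= 1.0 - math.exp(-y) * s_m  # P(type has >= m copies by time t)
        return 1.0 - prod

    # Truncation (assumption): once the rarest type reaches y = 40 + 10m,
    # the remaining tail is negligible for moderate m.
    t_max = (40.0 + 10.0 * m) / min(probs)
    if intervals % 2:
        intervals += 1  # Simpson's rule needs an even number of subintervals
    h = t_max / intervals
    total = survival(0.0) + survival(t_max)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * survival(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    rng = random.Random(2020)
    probs = log_zipf_probs(10, 1.0)
    sims = [simulate_t_m(probs, 2, rng) for _ in range(2000)]
    print("exact E[T_2(10)]:", expected_t_m(probs, 2))
    print("simulated mean:  ", sum(sims) / len(sims))
```

For uniform probabilities with $m = 1$, `expected_t_m` reproduces the classical value $N H_N$, which offers a quick check of the quadrature. Because the exact integral costs $O(N \cdot m)$ work per quadrature node, it becomes expensive as $N$ grows. That cost is one practical motivation for asymptotic expansions such as those described above.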
About the journal:
The journal publishes original research and survey papers in the area of Probability and Statistics. It covers theoretical and practical aspects, in any field of these domains.
Of particular interest are methodological developments with application in other scientific areas, for example Biology and Genetics, Information Theory, Finance, Bioinformatics, Random structures and Random graphs, Econometrics, Physics.
Long papers are very welcome.
Indeed, we intend to develop the journal in the direction of applications and to open it to various fields where random mathematical modelling is important. In particular, we will solicit (survey) papers in these areas in order to make the probability community aware of important problems of both theoretical and practical interest. We all know that many recent fascinating developments in Probability and Statistics come from "the outside", and we think that ESAIM: P&S should be a good entry point for such exchanges. Of course, this does not mean that the journal will be devoted only to practical aspects.