The logarithmic Zipf law in a general urn problem

IF 0.6 | Zone 4, Mathematics | JCR Q4, Statistics & Probability

Esaim-Probability and Statistics Pub Date : 2020-01-01 DOI:10.1051/ps/2020011

Aristides V. Doumas, V. Papanicolaou

Citations: 2

Abstract

The origin of power-law behavior (also known variously as Zipf’s law) has been a topic of debate in the scientific community for more than a century. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. In a highly cited article, Mark Newman [Contemp. Phys. 46 (2005) 323–351] reviewed some of the empirical evidence for the existence of power-law forms, but underscored that even though many distributions do not follow a power law, many of the quantities that scientists measure are quite often close to a Zipf law, and hence are of importance. In this paper we connect a variant of Zipf’s law with a general urn problem. A collector wishes to collect $m$ complete sets of $N$ distinct coupons. The draws from the population are assumed to be independent and identically distributed, with replacement, and the probability that a type-$j$ coupon is drawn is denoted by $p_j$, $j = 1, 2, \dots, N$. Let $T_m(N)$ be the number of trials needed for this problem. We present the asymptotics for the expectation (five terms plus an error), the second rising moment (six terms plus an error), and the variance of $T_m(N)$ (leading term) as $N \to \infty$, when
\begin{equation*} p_{j}=\frac{a_{j}}{\sum_{j=2}^{N+1} a_{j}}, \,\,\,\text{where}\,\,\, a_{j}=\left(\ln j\right)^{-p}, \,\,p>0.\end{equation*}
Moreover, we prove that $T_m(N)$ (appropriately normalized) converges in distribution to a Gumbel random variable. These “log-Zipf” classes of coupon probabilities are not covered by the existing literature, and the present paper fills this gap. In the spirit of a recent paper of ours [ESAIM: PS 20 (2016) 367–399], we enlarge the classes for which the Dixie cup problem is solved with respect to its moments, variance, and distribution.
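The setup in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' method: it draws coupons until every type has appeared at least m times and averages the number of draws. It assumes the coupon index runs over j = 2, ..., N+1, as in the normalizing sum, and all function names (`log_zipf_probabilities`, `draws_until_m_sets`, `estimate_mean`) are hypothetical, introduced only for illustration.

```python
import itertools
import math
import random


def log_zipf_probabilities(n, p):
    """Return p_j proportional to (ln j)^(-p) for j = 2, ..., n+1 (assumed indexing)."""
    weights = [math.log(j) ** (-p) for j in range(2, n + 2)]
    total = sum(weights)
    return [w / total for w in weights]


def draws_until_m_sets(probs, m, rng):
    """Simulate one value of T_m(N): draws until each type has been seen m times."""
    counts = [0] * len(probs)
    incomplete = len(probs)  # number of types seen fewer than m times
    cum_weights = list(itertools.accumulate(probs))
    types = range(len(probs))
    draws = 0
    while incomplete:
        j = rng.choices(types, cum_weights=cum_weights)[0]
        draws += 1
        counts[j] += 1
        if counts[j] == m:
            incomplete -= 1
    return draws


def estimate_mean(n, m, p, trials, seed=0):
    """Monte Carlo estimate of E[T_m(N)] from `trials` independent runs."""
    rng = random.Random(seed)
    probs = log_zipf_probabilities(n, p)
    total = sum(draws_until_m_sets(probs, m, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    print(estimate_mean(n=50, m=2, p=1.0, trials=200))
```

Because the weights $(\ln j)^{-p}$ decay very slowly, the rarest coupon is only moderately less likely than the most common one, so simulated values should sit fairly close to those of the equal-probability case. A simulation like this gives only a rough sanity check for small N and says nothing about the asymptotic expansions themselves.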

Source journal: Esaim-Probability and Statistics (Statistics & Probability)

CiteScore: 1.00

Self-citation rate: 0.00%

Review time: >12 weeks

About the journal: The journal publishes original research and survey papers in the area of Probability and Statistics. It covers theoretical and practical aspects in any field of these domains. Of particular interest are methodological developments with applications in other scientific areas, for example Biology and Genetics, Information Theory, Finance, Bioinformatics, Random structures and Random graphs, Econometrics, and Physics. Long papers are very welcome. Indeed, we intend to develop the journal in the direction of applications and to open it to various fields where random mathematical modelling is important. In particular, we will call for (survey) papers in these areas, in order to make the random community aware of important problems of both theoretical and practical interest. We all know that many recent fascinating developments in Probability and Statistics are coming from "the outside", and we think that ESAIM: P&S should be a good entry point for such exchanges. Of course, this does not mean that the journal will be devoted only to practical aspects.