On consistent and rate optimal estimation of the missing mass
Fadhel Ayed, M. Battiston, F. Camerlenghi, S. Favaro
Annales de l'Institut Henri Poincaré. DOI: 10.1214/20-AIHP1126. Published 2021-07-22.
Abstract. Given n samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the (n + 1)-th draw? This is a classical problem in statistics, commonly referred to as the missing mass estimation problem. Recent results have shown: i) the impossibility of estimating the missing mass without imposing further assumptions on the types' proportions; ii) the consistency of the Good-Turing estimator of the missing mass under the assumption that the tail of the types' proportions decays to zero as a regularly varying function with parameter α ∈ (0, 1); iii) the rate of convergence n^{-α/2} for the Good-Turing estimator under the class of α ∈ (0, 1) regularly varying distributions P. In this paper we introduce an alternative, and remarkably shorter, proof of the impossibility of distribution-free estimation of the missing mass. Besides being of independent interest, our alternative proof suggests a natural approach to strengthen, and expand, the recent results on the rate of convergence of the Good-Turing estimator under α ∈ (0, 1) regularly varying types' proportions. In particular, we show that the convergence rate n^{-α/2} is the best rate that any estimator can achieve, up to a slowly varying function. Furthermore, we prove that a lower bound on the minimax estimation risk must scale at least as n^{-α/2}, which leads us to conjecture that the Good-Turing estimator is a rate-optimal minimax estimator under regularly varying types' proportions.
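To make the quantities in the abstract concrete, the sketch below compares the Good-Turing estimator of the missing mass, Ĝ_n = N_1/n with N_1 the number of types observed exactly once, against the true missing mass in a simulated heavy-tailed population. The truncated Zipf-type proportions p_j ∝ j^{-1/α} and all parameter choices (α = 0.5, K = 200,000 types) are illustrative assumptions and not the construction used in the paper; the last column simply reports the n^{-α/2} benchmark rate discussed above.

import numpy as np

def good_turing_missing_mass(sample):
    """Good-Turing estimator: (number of types seen exactly once) / n."""
    _, counts = np.unique(sample, return_counts=True)
    n1 = np.sum(counts == 1)  # number of singletons
    return n1 / len(sample)

def true_missing_mass(sample, probs):
    """Exact missing mass: total probability of the types absent from the sample."""
    seen = np.unique(sample)
    mask = np.ones(len(probs), dtype=bool)
    mask[seen] = False
    return probs[mask].sum()

rng = np.random.default_rng(0)

# Illustrative heavy-tailed population: p_j proportional to j^{-1/alpha}, alpha in (0, 1),
# truncated at K types so it can be sampled. This is only a stand-in for the regularly
# varying class considered in the paper.
alpha, K = 0.5, 200_000
weights = np.arange(1, K + 1, dtype=float) ** (-1.0 / alpha)
probs = weights / weights.sum()

for n in [1_000, 10_000, 100_000]:
    sample = rng.choice(K, size=n, p=probs)
    gt = good_turing_missing_mass(sample)
    mm = true_missing_mass(sample, probs)
    print(f"n={n:>7d}  Good-Turing={gt:.4f}  true={mm:.4f}  "
          f"|error|={abs(gt - mm):.4f}  n^(-alpha/2)={n ** (-alpha / 2):.4f}")

In such a toy run the absolute estimation error tends to shrink roughly in line with the n^{-α/2} benchmark; a single simulation is of course only an illustration of the rate statement, not evidence for the minimax claims of the paper.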