{"title":"Good-Turing估计器的改进收敛速率","authors":"Amichai Painsky","doi":"10.1109/ITW48936.2021.9611389","DOIUrl":null,"url":null,"abstract":"The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator convergences to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate to the entire collection of occupancy probabilities.","PeriodicalId":325229,"journal":{"name":"2021 IEEE Information Theory Workshop (ITW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Refined Convergence Rates of the Good-Turing Estimator\",\"authors\":\"Amichai Painsky\",\"doi\":\"10.1109/ITW48936.2021.9611389\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator convergences to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate to the entire collection of occupancy probabilities.\",\"PeriodicalId\":325229,\"journal\":{\"name\":\"2021 IEEE Information Theory Workshop (ITW)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Information Theory Workshop (ITW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITW48936.2021.9611389\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Information Theory Workshop (ITW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITW48936.2021.9611389","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Refined Convergence Rates of the Good-Turing Estimator
The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator converges to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate for the entire collection of occupancy probabilities.
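For concreteness, below is a minimal Python sketch of the classical Good-Turing estimator the abstract refers to: the occupancy probability M_k is the total mass of symbols seen exactly k times, and its GT estimate is (k+1) * N_{k+1} / n, where N_{k+1} counts the distinct symbols seen exactly k+1 times in a sample of size n. The function name and toy data are illustrative, not from the paper, which concerns convergence guarantees for this estimator rather than its implementation.

```python
from collections import Counter

def good_turing_occupancy(sample, k):
    """Classical Good-Turing estimate of the occupancy probability M_k,
    i.e. the total probability of symbols appearing exactly k times in
    the sample, via (k + 1) * N_{k+1} / n."""
    n = len(sample)
    counts = Counter(sample)  # symbol -> observed frequency
    # N_{k+1}: number of distinct symbols observed exactly k + 1 times
    n_k_plus_1 = sum(1 for c in counts.values() if c == k + 1)
    return (k + 1) * n_k_plus_1 / n

# Example: estimate the missing mass (k = 0) and the mass of
# symbols seen exactly once (k = 1) from a small sample.
sample = list("abracadabra")
print(good_turing_occupancy(sample, 0))  # estimated unseen mass
print(good_turing_occupancy(sample, 1))  # estimated mass of singletons
```

Note that k = 0 recovers the well-known Good-Turing missing-mass estimate N_1 / n; the paper's contribution is a worst-case MSE analysis giving rates that hold simultaneously over all k.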