{"title":"Good-Turing估计器的改进收敛速率","authors":"Amichai Painsky","doi":"10.1109/ITW48936.2021.9611389","DOIUrl":null,"url":null,"abstract":"The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator convergences to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate to the entire collection of occupancy probabilities.","PeriodicalId":325229,"journal":{"name":"2021 IEEE Information Theory Workshop (ITW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Refined Convergence Rates of the Good-Turing Estimator\",\"authors\":\"Amichai Painsky\",\"doi\":\"10.1109/ITW48936.2021.9611389\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator convergences to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate to the entire collection of occupancy probabilities.\",\"PeriodicalId\":325229,\"journal\":{\"name\":\"2021 IEEE Information Theory Workshop (ITW)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Information Theory Workshop (ITW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITW48936.2021.9611389\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Information Theory Workshop (ITW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITW48936.2021.9611389","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Refined Convergence Rates of the Good-Turing Estimator
The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator converges to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate for the entire collection of occupancy probabilities.
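For concreteness, below is a minimal Python sketch of the classical Good-Turing estimator the abstract refers to: the occupancy probability M_k is the total mass of symbols seen exactly k times, and its GT estimate is (k+1) * N_{k+1} / n, where N_{k+1} counts the distinct symbols seen exactly k+1 times in a sample of size n. The function name and toy data are illustrative, not from the paper, which concerns convergence guarantees for this estimator rather than its implementation.

```python
from collections import Counter

def good_turing_occupancy(sample, k):
    """Classical Good-Turing estimate of the occupancy probability M_k,
    i.e. the total probability of symbols appearing exactly k times in
    the sample, via (k + 1) * N_{k+1} / n."""
    n = len(sample)
    counts = Counter(sample)  # symbol -> observed frequency
    # N_{k+1}: number of distinct symbols observed exactly k + 1 times
    n_k_plus_1 = sum(1 for c in counts.values() if c == k + 1)
    return (k + 1) * n_k_plus_1 / n

# Example: estimate the missing mass (k = 0) and the mass of
# symbols seen exactly once (k = 1) from a small sample.
sample = list("abracadabra")
print(good_turing_occupancy(sample, 0))  # estimated unseen mass
print(good_turing_occupancy(sample, 1))  # estimated mass of singletons
```

Note that k = 0 recovers the well-known Good-Turing missing-mass estimate N_1 / n; the paper's contribution is a worst-case MSE analysis giving rates that hold simultaneously over all k.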