A. Orlitsky, Sajama, N. Santhanam, K. Viswanathan, Junan Zhang
{"title":"大型字母表分布建模算法","authors":"A. Orlitsky, Sajama, N. Santhanam, K. Viswanathan, Junan Zhang","doi":"10.1109/ISIT.2004.1365341","DOIUrl":null,"url":null,"abstract":"We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks","PeriodicalId":269907,"journal":{"name":"International Symposium onInformation Theory, 2004. ISIT 2004. Proceedings.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2004-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Algorithms for modeling distributions over large alphabets\",\"authors\":\"A. Orlitsky, Sajama, N. Santhanam, K. Viswanathan, Junan Zhang\",\"doi\":\"10.1109/ISIT.2004.1365341\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks\",\"PeriodicalId\":269907,\"journal\":{\"name\":\"International Symposium onInformation Theory, 2004. ISIT 2004. Proceedings.\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-10-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Symposium onInformation Theory, 2004. ISIT 2004. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISIT.2004.1365341\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium onInformation Theory, 2004. ISIT 2004. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISIT.2004.1365341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Algorithms for modeling distributions over large alphabets
We consider the problem of modeling a distribution whose alphabet size is large relative to the amount of observed data. It is well known that conventional maximum-likelihood estimates do not perform well in that regime. Instead, we find the distribution maximizing the probability of the data's pattern. We derive an efficient algorithm for approximating this distribution. Simulations show that the computed distribution models the data well and yields general estimators that evaluate various data attributes as well as specific estimators designed especially for these tasks