Authors: Xiao Yang, A. Barron
Venue: 2014 IEEE International Symposium on Information Theory
Publication date: 2014-08-11
DOI: 10.1109/ISIT.2014.6875285 (https://doi.org/10.1109/ISIT.2014.6875285)
Compression and predictive distributions for large alphabet i.i.d and Markov models
This paper considers coding and predicting sequences of random variables generated from a large alphabet. We start from the i.i.d. model and propose a simple coding distribution, formulated as a product of tilted Poisson distributions, which achieves close to optimal performance. We then extend to Markov models and, in particular, tree sources. A context-tree-based algorithm is designed according to the frequency of various contexts in the data. It is a greedy algorithm that seeks the greatest savings in codelength when constructing the tree. Compression and prediction of the individual counts associated with the contexts again use a product of tilted Poisson distributions. Applied to a Chinese novel, the method achieves about 20.56% savings in codelength compared to the i.i.d. model.
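To make the notion of "savings in codelength compared to the i.i.d. model" concrete, the following is a minimal Python sketch under simplifying assumptions. It is not the paper's method: it uses plug-in (empirical-frequency) codelengths with a fixed context length rather than tilted Poisson coding distributions or a greedily grown context tree, and it ignores the cost of describing the counts themselves, which can be substantial for a large alphabet. The function names `iid_codelength_bits`, `context_codelength_bits`, and `relative_savings` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


def iid_codelength_bits(seq):
    """Plug-in codelength in bits: each symbol costs -log2 of its empirical frequency."""
    n = len(seq)
    counts = Counter(seq)
    return -sum(c * math.log2(c / n) for c in counts.values())


def context_codelength_bits(seq, order=1):
    """Plug-in codelength in bits when each symbol is coded given the
    preceding `order` symbols (a fixed-depth context, not a pruned tree)."""
    by_context = defaultdict(Counter)
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        by_context[context][seq[i]] += 1
    total = 0.0
    for counts in by_context.values():
        m = sum(counts.values())
        total -= sum(c * math.log2(c / m) for c in counts.values())
    return total


def relative_savings(seq, order=1):
    """Fraction of bits saved by the context model relative to the i.i.d.
    plug-in code, both measured on the symbols that have a full context."""
    base = iid_codelength_bits(seq[order:])
    if base == 0.0:
        return 0.0  # a constant sequence leaves nothing to save
    return (base - context_codelength_bits(seq, order)) / base


if __name__ == "__main__":
    text = "the theory of the thing is that the other thing is there"
    print(f"relative savings with one symbol of context: {relative_savings(text):.3f}")
```

Because plug-in codelengths are computed from the same data they encode, the savings reported by this sketch are optimistic; a real code must also pay for the model, which is the part of the problem the coding distribution in the abstract addresses.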