{"title":"上下文模型的贝叶斯状态组合","authors":"S. Bunton","doi":"10.1109/DCC.1998.672161","DOIUrl":null,"url":null,"abstract":"The best-performing on-line methods for estimating probabilities of symbols in a sequence (required for computing minimal codes) use context trees with either information-theoretic state selection or context-tree weighting. This paper derives de novo from Bayes' theorem, a novel technique for modeling sequences on-line with context trees, which we call \"Bayesian state combining\" or BSC. BSC is comparable in function to both information-theoretic state selection and context-tree weighting. However, it is a truly distinct alternative to either of these techniques, which like BSC, can be viewed as \"dispatchers\" of probability estimates from the set of competing, memoryless models represented by the context tree. The resulting technique handles sequences over m-ary input alphabets for arbitrary m and may employ any probability estimator applicable to context models (e.g., Laplace, Krichevsky-Trofimov, blending, and more generally, mixtures). In experiments that control other (256-ary) context-tree model features such as Markov order and probability estimators, we compare the performance of BSC and information-theoretic state selection. The background notation and concepts are reviewed, as required to understand the modeling problem and application of our result. The leading notion of the paper is derived, which dynamically maps certain states in context-models to a set of mutually exclusive hypotheses and their prior and posterior probabilities. The efficient sequential computation of the posterior probabilities of the hypotheses, which was made possible via a non-obvious application of the percolating description-length update mechanism introduced by Bunton (see Proceedings Data Compression Conference, IEEE Computer Society Press, 1997) is described. The preliminary empirical performance of the technique on the Calgary Corpus is presented, the relationship of BSC to information-theoretic state selection and context-tree weighting is discussed.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Bayesian state combining for context models\",\"authors\":\"S. Bunton\",\"doi\":\"10.1109/DCC.1998.672161\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The best-performing on-line methods for estimating probabilities of symbols in a sequence (required for computing minimal codes) use context trees with either information-theoretic state selection or context-tree weighting. This paper derives de novo from Bayes' theorem, a novel technique for modeling sequences on-line with context trees, which we call \\\"Bayesian state combining\\\" or BSC. BSC is comparable in function to both information-theoretic state selection and context-tree weighting. However, it is a truly distinct alternative to either of these techniques, which like BSC, can be viewed as \\\"dispatchers\\\" of probability estimates from the set of competing, memoryless models represented by the context tree. The resulting technique handles sequences over m-ary input alphabets for arbitrary m and may employ any probability estimator applicable to context models (e.g., Laplace, Krichevsky-Trofimov, blending, and more generally, mixtures). 
In experiments that control other (256-ary) context-tree model features such as Markov order and probability estimators, we compare the performance of BSC and information-theoretic state selection. The background notation and concepts are reviewed, as required to understand the modeling problem and application of our result. The leading notion of the paper is derived, which dynamically maps certain states in context-models to a set of mutually exclusive hypotheses and their prior and posterior probabilities. The efficient sequential computation of the posterior probabilities of the hypotheses, which was made possible via a non-obvious application of the percolating description-length update mechanism introduced by Bunton (see Proceedings Data Compression Conference, IEEE Computer Society Press, 1997) is described. The preliminary empirical performance of the technique on the Calgary Corpus is presented, the relationship of BSC to information-theoretic state selection and context-tree weighting is discussed.\",\"PeriodicalId\":191890,\"journal\":{\"name\":\"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.1998.672161\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1998.672161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The best-performing on-line methods for estimating probabilities of symbols in a sequence (required for computing minimal codes) use context trees with either information-theoretic state selection or context-tree weighting. This paper derives, de novo from Bayes' theorem, a novel technique for modeling sequences on-line with context trees, which we call "Bayesian state combining" (BSC). BSC is comparable in function to both information-theoretic state selection and context-tree weighting. However, it is a truly distinct alternative to both of these techniques, which, like BSC, can be viewed as "dispatchers" of probability estimates from the set of competing, memoryless models represented by the context tree. The resulting technique handles sequences over m-ary input alphabets for arbitrary m and may employ any probability estimator applicable to context models (e.g., Laplace, Krichevsky-Trofimov, blending, and more generally, mixtures). In experiments that control other (256-ary) context-tree model features such as Markov order and probability estimators, we compare the performance of BSC and information-theoretic state selection. The background notation and concepts are reviewed, as required to understand the modeling problem and the application of our result. The leading notion of the paper is then derived: a dynamic mapping of certain states in context models to a set of mutually exclusive hypotheses and their prior and posterior probabilities. The efficient sequential computation of the posterior probabilities of the hypotheses, made possible by a non-obvious application of the percolating description-length update mechanism introduced by Bunton (see Proceedings Data Compression Conference, IEEE Computer Society Press, 1997), is described. The preliminary empirical performance of the technique on the Calgary Corpus is presented, and the relationship of BSC to information-theoretic state selection and context-tree weighting is discussed.
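
To make the "dispatcher" framing concrete, the sketch below mixes several fixed-order context models, each using the Krichevsky-Trofimov estimator, and re-weights them by Bayes' theorem after every symbol. This is only a minimal illustration of Bayesian mixing over competing memoryless estimators under assumptions of our own; it is not Bunton's BSC algorithm (which combines hypotheses about context-tree states via the percolating description-length update), and the names `ContextModel` and `BayesMixture` are hypothetical.

```python
# Illustrative sketch only: a Bayesian mixture of fixed-order context models,
# each with per-context Krichevsky-Trofimov (KT) estimation. NOT Bunton's BSC.
from collections import defaultdict
import math


class ContextModel:
    """Fixed-order context model over an m-ary alphabet with KT estimation."""

    def __init__(self, order, alphabet_size):
        self.order = order
        self.m = alphabet_size
        self.counts = defaultdict(lambda: [0] * alphabet_size)

    def _ctx(self, history):
        # Order-0 models condition on the empty context.
        return tuple(history[-self.order:]) if self.order else ()

    def predict(self, history, symbol):
        """KT estimate: (n_a + 1/2) / (N + m/2) for the current context."""
        c = self.counts[self._ctx(history)]
        return (c[symbol] + 0.5) / (sum(c) + 0.5 * self.m)

    def update(self, history, symbol):
        self.counts[self._ctx(history)][symbol] += 1


class BayesMixture:
    """Dispatches probability mass among competing models via Bayes' theorem."""

    def __init__(self, models):
        self.models = models
        self.weights = [1.0 / len(models)] * len(models)  # uniform prior

    def predict(self, history, symbol):
        return sum(w * mdl.predict(history, symbol)
                   for w, mdl in zip(self.weights, self.models))

    def observe(self, history, symbol):
        # Posterior weight_i is proportional to prior_i * P_i(symbol | history).
        likes = [mdl.predict(history, symbol) for mdl in self.models]
        self.weights = [w * p for w, p in zip(self.weights, likes)]
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        for mdl in self.models:
            mdl.update(history, symbol)


if __name__ == "__main__":
    seq = [0, 1, 0, 1, 0, 1, 0, 1]                       # toy binary sequence
    mix = BayesMixture([ContextModel(k, 2) for k in (0, 1, 2)])
    history, code_len = [], 0.0
    for sym in seq:
        code_len += -math.log2(mix.predict(history, sym))  # ideal code length
        mix.observe(history, sym)
        history.append(sym)
    print(f"total code length: {code_len:.2f} bits")
```

In this toy setting the posterior weights simply concentrate on whichever model order predicts the sequence best; per the abstract, BSC's contribution is an analogous Bayesian combination over mutually exclusive hypotheses about context-tree states, computed efficiently with the percolating description-length update mechanism.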