{"title":"Markov random fields on graphs for natural languages","authors":"J. O’Sullivan, K. Mark, M. Miller","doi":"10.1109/WITS.1994.513880","DOIUrl":null,"url":null,"abstract":"The use of model-based methods for data compression for English dates back at least to Shannon's Markov chain (n-gram) models, where the probability of the next word given all previous words equals the probability of the next word given the previous n-1 words. A second approach seeks to model the hierarchical nature of language via tree graph structures arising from a context-free language (CFL). Neither the n-gram nor the CFL models approach the data compression predicted by the entropy of English as estimated by Shannon and Cover and King. This paper presents two models that incorporate the benefits of both the n-gram model and the tree-based models. In either case the neighborhood structure on the syntactic variables is determined by the tree while the neighborhood structure of the words is determined by the n-gram and the parent syntactic variable (preterminal) in the tree, Having both types of neighbors for the words should yield decreased entropy of the model and hence fewer bits per word in data compression. To motivate estimation of model parameters, some results in estimating parameters for random branching processes is reviewed.","PeriodicalId":423518,"journal":{"name":"Proceedings of 1994 Workshop on Information Theory and Statistics","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 Workshop on Information Theory and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WITS.1994.513880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The use of model-based methods for data compression of English dates back at least to Shannon's Markov chain (n-gram) models, in which the probability of the next word given all previous words equals the probability of the next word given the previous n-1 words. A second approach seeks to model the hierarchical nature of language via tree graph structures arising from a context-free language (CFL). Neither the n-gram nor the CFL models approach the data compression predicted by the entropy of English as estimated by Shannon and by Cover and King. This paper presents two models that incorporate the benefits of both the n-gram model and the tree-based models. In either case the neighborhood structure on the syntactic variables is determined by the tree, while the neighborhood structure of the words is determined by the n-gram and the parent syntactic variable (preterminal) in the tree. Having both types of neighbors for the words should yield decreased entropy of the model and hence fewer bits per word in data compression. To motivate estimation of model parameters, some results on estimating parameters for random branching processes are reviewed.
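To make the bits-per-word argument concrete, the following is a minimal sketch (not the paper's MRF construction) comparing the empirical bits per word of a unigram model against a bigram model on a toy corpus; the corpus and the add-one smoothing are assumptions for illustration only. It shows the general effect the abstract appeals to: conditioning a word on more informative neighbors lowers the average code length.

```python
# Minimal illustration: richer conditioning contexts tend to lower
# the entropy of the model, and hence bits per word.
# Toy corpus and add-one smoothing are illustrative assumptions.
from collections import Counter
from math import log2

corpus = ("the dog chased the cat . the cat chased the mouse . "
          "the mouse ran .").split()

vocab = set(corpus)
V = len(vocab)
N = len(corpus)

unigram = Counter(corpus)
bigram = Counter(zip(corpus, corpus[1:]))

def unigram_bits(words):
    # -(1/N) * sum_i log2 P(w_i), with add-one smoothing over the vocabulary
    return -sum(log2((unigram[w] + 1) / (N + V)) for w in words) / len(words)

def bigram_bits(words):
    # -(1/(N-1)) * sum_i log2 P(w_i | w_{i-1}), with add-one smoothing
    bits = 0.0
    for prev, w in zip(words, words[1:]):
        bits -= log2((bigram[(prev, w)] + 1) / (unigram[prev] + V))
    return bits / (len(words) - 1)

print(f"unigram: {unigram_bits(corpus):.2f} bits/word")
print(f"bigram : {bigram_bits(corpus):.2f} bits/word")
```

The paper's models go further by also conditioning each word on its parent preterminal in the syntactic tree, so a word's neighborhood includes both its n-gram history and a tree neighbor; the sketch above only varies the size of the sequential context.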