Codes for unordered sets of words
Y. Reznik
2011 IEEE International Symposium on Information Theory Proceedings, published 2011-10-03
DOI: 10.1109/ISIT.2011.6033752 (https://doi.org/10.1109/ISIT.2011.6033752)
Citations: 5
Abstract
We study the problem of coding unordered sets of words, which appears in language processing, retrieval, machine learning, computer vision, and other fields. We review known results on this problem and offer a code construction technique suitable for solving it. We show that, in a memoryless model, the expected length of our codes approaches Ht − log m! + O(m), where m is the number of words in the set, t is the combined length of all words, and H is the entropy of the source. We also offer a design of a universal code for sets of words and analyze its redundancy.
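To make the leading terms of the stated bound concrete, the following minimal Python sketch evaluates Ht − log m! for a given set of words and a given symbol distribution. It does not implement the paper's code construction. The function names (`source_entropy`, `log2_factorial`, `leading_terms`) are hypothetical, logarithms are assumed to be base 2 (lengths in bits), and the O(m) term is omitted because the abstract does not give its constant.

```python
import math
from typing import Dict, Iterable


def source_entropy(probs: Dict[str, float]) -> float:
    """Entropy H, in bits per symbol, of a memoryless source."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def log2_factorial(m: int) -> float:
    """log2(m!), computed via lgamma so large m does not overflow."""
    return math.lgamma(m + 1) / math.log(2)


def leading_terms(words: Iterable[str], probs: Dict[str, float]) -> float:
    """Evaluate H*t - log2(m!) for a set of m distinct words.

    t is the combined length of all words in the set. The O(m) term
    from the abstract is deliberately left out (assumption: its
    constant is not known here).
    """
    word_set = set(words)               # order and duplicates are discarded
    m = len(word_set)
    t = sum(len(w) for w in word_set)
    return source_entropy(probs) * t - log2_factorial(m)


if __name__ == "__main__":
    fair_bits = {"0": 0.5, "1": 0.5}    # H = 1 bit per symbol
    example = {"0", "10", "110"}        # m = 3, t = 6
    print(f"{leading_terms(example, fair_bits):.3f} bits")
```

For the example set, H = 1, t = 6, and m = 3, so the expression evaluates to 6 − log2 6 ≈ 3.415 bits. The log m! term reflects that the m! orderings of a set of m distinct words need not be distinguished.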