Codes for unordered sets of words

2011 IEEE International Symposium on Information Theory Proceedings. Pub Date: 2011-10-03. DOI: 10.1109/ISIT.2011.6033752

Y. Reznik

Citations: 5

Abstract

We study the problem of coding unordered sets of words, which arises in language processing, retrieval, machine learning, computer vision, and other fields. We review known results on this problem and offer a code construction technique suitable for solving it. We show that, in a memoryless model, the expected length of our codes approaches Ht − log m! + O(m), where m is the number of words in the set, t is the combined length of all words, and H is the entropy of the source. We also offer a design of a universal code for sets of words and analyze its redundancy.
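To give a feel for the size of the log m! term in the expected-length expression, the following is a minimal sketch that evaluates only the leading terms Ht − log m!. It is not the paper's code construction. The function names are hypothetical, the logarithm is assumed to be base 2 (lengths in bits), and the O(m) term is left out because the abstract does not state its constant.

```python
import math


def log2_factorial(m: int) -> float:
    """Return log2(m!), computed via the log-gamma function."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return math.lgamma(m + 1) / math.log(2)


def leading_terms_bits(entropy_bits: float, total_length: int, num_words: int) -> float:
    """Evaluate H*t - log2(m!), ignoring the O(m) term (assumption: base-2 logs).

    entropy_bits: H, source entropy in bits per symbol
    total_length: t, combined length of all words in symbols
    num_words:    m, number of words in the set
    """
    return entropy_bits * total_length - log2_factorial(num_words)


if __name__ == "__main__":
    # Illustrative values only: 1000 words of 20 symbols each, H = 1 bit/symbol.
    H, m, word_len = 1.0, 1000, 20
    t = m * word_len
    print(f"H*t            = {H * t:.1f} bits")
    print(f"log2(m!)       = {log2_factorial(m):.1f} bits")
    print(f"H*t - log2(m!) = {leading_terms_bits(H, t, m):.1f} bits")
```

The log m! term reflects that a set of m distinct words can be listed in m! orders, none of which needs to be distinguished when the order carries no information.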

