Reversing the error-correction scheme for a fault-tolerant indexing

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date: 1998-03-30 DOI: 10.1109/DCC.1998.672237

S. Berkovich, E. El-Qawasmeh


Citations: 17

Abstract

Summary form only given. The article presents an innovative approach to approximate matching of multi-attribute objects based on reversing the conventional scheme of error-correction coding. The approximate matching problem arises primarily in information retrieval systems, which may store fuzzily described items and operate with imprecise search criteria. To establish an approximate equivalence relation on a set of multi-attribute objects, it has been suggested to apply a decoding procedure to the binary vectors corresponding to these objects and to use the resulting message words as hash codes. With this hashing technique it is possible to construct "fault-tolerant" indices that allow certain mismatches between binary vectors in terms of the Hamming metric. The simplest practical realization of this technique is based on the perfect Golay code, whose decoding maps 23-bit vectors into 12-bit message words. In this case, two different 23-bit vectors at a Hamming distance of 2 would have some common 12-bit indices. This allows direct retrieval of the neighborhood of 23-bit vectors with up to two mismatches from a given key. The proposed technique employs reasonable redundancy and can trade extra memory for the speed and range of searching. Besides its direct application to information retrieval, the technique is also beneficial for complex computational procedures that incorporate near-matching operations. A typical procedure of this kind is the recovery of close matches from vector-quantization tables.
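The abstract does not spell out how the common indices are produced, so the following Python sketch shows only one plausible reading, not the authors' implementation. It assumes a systematic cyclic (23,12) Golay code with generator polynomial x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1, and it assumes that each 23-bit vector is indexed under the message words decoded from the vector itself and from its 23 single-bit neighbors. All names (`syndrome`, `decode_to_message`, `fault_tolerant_indices`) are hypothetical. Under these assumptions, two vectors at Hamming distance 2 always share an index, because the vector lying between them is a single-bit neighbor of both.

```python
from itertools import combinations

N = 23                 # length of a Golay codeword in bits
PARITY_BITS = 11       # N minus the 12 message bits
GOLAY_POLY = 0xC75     # assumed generator: x^11+x^10+x^6+x^5+x^4+x^2+1


def syndrome(v: int) -> int:
    """Remainder of v(x) divided by the generator polynomial over GF(2)."""
    for shift in range(N - 1, PARITY_BITS - 1, -1):
        if (v >> shift) & 1:
            v ^= GOLAY_POLY << (shift - PARITY_BITS)
    return v


# The code is perfect: each error pattern of weight <= 3 has its own syndrome,
# and together they use up all 2^11 possible syndromes.
SYNDROME_TO_ERROR = {}
for weight in range(4):
    for positions in combinations(range(N), weight):
        error = sum(1 << p for p in positions)
        SYNDROME_TO_ERROR[syndrome(error)] = error


def decode_to_message(v: int) -> int:
    """Correct v to the nearest codeword and return its 12 message bits."""
    corrected = v ^ SYNDROME_TO_ERROR[syndrome(v)]
    return corrected >> PARITY_BITS  # systematic code: message is the top 12 bits


def fault_tolerant_indices(v: int) -> set[int]:
    """Hash codes for v: decode v and each of its 23 single-bit neighbors."""
    neighbors = [v] + [v ^ (1 << i) for i in range(N)]
    return {decode_to_message(w) for w in neighbors}
```

In such a scheme, a record would be stored under every index returned by `fault_tolerant_indices(record_vector)`, and a query would probe the indices of the key; this is where the extra memory is traded for search speed and range.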

