Reversing the error-correction scheme for a fault-tolerant indexing
S. Berkovich, E. El-Qawasmeh
Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225), 1998-03-30
DOI: 10.1109/DCC.1998.672237
Citation count: 17
Abstract
Summary form only given. The article presents a new approach to approximate matching of multi-attribute objects, based on reversing the conventional scheme of error-correction coding. Approximate matching arises primarily in information retrieval systems that store fuzzily described items and operate with imprecise search criteria. To establish an approximate equivalence relation on a set of multi-attribute objects, it is proposed to apply a decoding procedure to the binary vectors representing these objects and to use the resulting message words as hash codes. This hashing technique makes it possible to construct "fault-tolerant" indices that tolerate a limited number of mismatches between binary vectors, measured by Hamming distance. The simplest practical realization uses the perfect (23,12) Golay code, which maps 23-bit vectors to 12-bit message words. In this case, two different 23-bit vectors at a Hamming distance of 2 share at least one common 12-bit index. This enables direct retrieval of the neighborhood of 23-bit vectors that differ from a given key in at most two positions. The technique requires moderate redundancy and can trade extra memory for search speed and range. Beyond direct application to information retrieval, the technique also benefits complex computational procedures that involve near-matching operations. A typical example is recovering close matches from vector-quantization tables.
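A minimal Python sketch of the reversed scheme, for illustration only. It assumes a systematic cyclic construction of the (23,12) Golay code with generator polynomial g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1; the abstract does not say which construction the authors use. Decoding a 23-bit key (correcting up to 3 bit errors) yields a 12-bit message word that serves as the hash. To obtain several indices per vector, the sketch uses a hypothetical rule: hash the key and each of its 23 one-bit neighbours. This is not necessarily the authors' exact procedure. The names `decode`, `fault_tolerant_indices`, and `FaultTolerantIndex` are illustrative.

```python
from collections import defaultdict
from itertools import combinations

N, R = 23, 11  # code length and number of check bits (message = 12 bits)
# Assumed generator polynomial of the binary (23,12,7) Golay code:
# g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1  (bit i = coefficient of x^i)
G = 0b110001110101


def poly_mod(value: int, modulus: int = G) -> int:
    """Remainder of polynomial division over GF(2)."""
    mlen = modulus.bit_length()
    while value.bit_length() >= mlen:
        value ^= modulus << (value.bit_length() - mlen)
    return value


def encode(msg: int) -> int:
    """Systematic encoding: 12 message bits on top, 11 check bits below."""
    shifted = msg << R
    return shifted | poly_mod(shifted)


def _build_syndrome_table() -> dict:
    """Map each syndrome to its unique error pattern of weight <= 3."""
    table = {}
    for weight in range(4):
        for positions in combinations(range(N), weight):
            error = sum(1 << p for p in positions)
            table[poly_mod(error)] = error
    # Perfect code: 1 + 23 + 253 + 1771 = 2048 patterns fill all 2^11 syndromes.
    assert len(table) == 1 << R
    return table


SYNDROME_TO_ERROR = _build_syndrome_table()


def decode(word: int) -> int:
    """Correct up to 3 bit errors and return the 12-bit message word (the hash)."""
    corrected = word ^ SYNDROME_TO_ERROR[poly_mod(word)]
    return corrected >> R


def fault_tolerant_indices(word: int) -> set:
    """Hypothetical multi-index rule: hash the key and each of its 23 one-bit neighbours."""
    return {decode(word)} | {decode(word ^ (1 << i)) for i in range(N)}


class FaultTolerantIndex:
    """Hash table keyed by 12-bit message words; each vector is stored under all its indices."""

    def __init__(self) -> None:
        self.buckets = defaultdict(set)

    def insert(self, word: int) -> None:
        for key in fault_tolerant_indices(word):
            self.buckets[key].add(word)

    def query(self, word: int) -> set:
        """Return every stored vector within Hamming distance 2 of `word`."""
        candidates = set()
        for key in fault_tolerant_indices(word):
            candidates |= self.buckets.get(key, set())
        # Drop candidates that share a bucket but differ in more than 2 positions.
        return {c for c in candidates if bin(c ^ word).count("1") <= 2}
```

Under this rule, a key close to a codeword (distance 2 or less) gets a single index, and a key at distance 3 gets six. Two vectors within Hamming distance 2 always share an index. If they differ in one bit, each is a one-bit neighbour of the other. If they differ in two bits, some vector lies one bit away from both. The final distance filter removes candidates that only share a bucket. For example, after `idx.insert(v)`, calling `idx.query(v ^ 0b101)` returns `{v}`, because flipping bits 0 and 2 leaves the query two mismatches from `v`.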