Metric Index: An Efficient and Scalable Solution for Similarity Search

David Novak, Michal Batko
2009 Second International Workshop on Similarity Search and Applications, published 2009-08-29
DOI: 10.1109/SISAP.2009.26 (https://doi.org/10.1109/SISAP.2009.26)
Citations: 52

Abstract

Metric space, as a universal and versatile model of similarity, can be applied in various areas of non-text information retrieval. However, a general, efficient, and scalable solution for metric data management remains an open research challenge. We introduce a novel indexing and searching mechanism called the Metric Index (M-Index), which employs practically all known principles of metric space partitioning, pruning, and filtering. The heart of the M-Index is a general mapping mechanism that makes it possible to store the data in well-established structures such as the B+-tree, or even in distributed storage. We have implemented the M-Index with a B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands of digital images. The experiments test several M-Index variants and compare them with two orthogonal approaches: the PM-Tree and the iDistance. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close together in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving very high recall as the dataset grows.
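The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea of mapping metric objects to scalar keys that an ordered structure can store. It uses an iDistance-style key (index of the nearest pivot plus the normalized distance to it), which is an assumption made here for illustration and not the M-Index definition. The class name `PivotKeyIndex`, the bound `d_max`, and the sorted Python list standing in for a B+-tree are all hypothetical.

```python
import bisect
import math


def euclidean(a, b):
    """Example metric; any function satisfying the metric axioms would do."""
    return math.dist(a, b)


class PivotKeyIndex:
    """Illustrative pivot-based key mapping (assumed, iDistance-style).

    A sorted list of keys stands in for the B+-tree mentioned in the abstract.
    """

    def __init__(self, objects, pivots, d_max, dist=euclidean):
        self.pivots = list(pivots)
        self.d_max = d_max  # assumed strict upper bound on object-to-pivot distances
        self.dist = dist
        entries = sorted(((self.key(o), o) for o in objects), key=lambda e: e[0])
        self.keys = [k for k, _ in entries]
        self.objects = [o for _, o in entries]

    def key(self, obj):
        """Key = index of the nearest pivot + distance to it scaled into [0, 1)."""
        distances = [self.dist(obj, p) for p in self.pivots]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        if distances[nearest] >= self.d_max:
            raise ValueError("d_max must exceed every object-to-pivot distance")
        return nearest + distances[nearest] / self.d_max

    def range_query(self, query, radius):
        """Return all indexed objects within `radius` of `query`.

        For pivot i, the triangle inequality confines matching objects to
        distances in [d(q, p_i) - radius, d(q, p_i) + radius] from that pivot,
        which is a contiguous key interval. Floating-point rounding at the
        exact interval boundary is ignored in this sketch.
        """
        results = []
        for i, pivot in enumerate(self.pivots):
            dq = self.dist(query, pivot)
            lo_key = i + max(dq - radius, 0.0) / self.d_max
            hi_key = i + (dq + radius) / self.d_max
            lo = bisect.bisect_left(self.keys, lo_key)
            # Never scan past the end of pivot i's own key range [i, i + 1).
            hi = min(bisect.bisect_right(self.keys, hi_key),
                     bisect.bisect_left(self.keys, i + 1.0))
            for obj in self.objects[lo:hi]:
                # Candidates from the key interval are verified with the real metric.
                if self.dist(query, obj) <= radius:
                    results.append(obj)
        return results
```

Because every key interval is a contiguous range in sorted order, the same lookups could in principle be served by range scans on a B+-tree or any other ordered store; the pruning comes from skipping keys outside the intervals, and the final distance check filters out false candidates.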