Similarity search without tears: the OMNI-family of all-purpose access methods

R. S. Filho, A. Traina, C. Traina, C. Faloutsos
Proceedings 17th International Conference on Data Engineering (ICDE 2001), published 2001-04-02. DOI: 10.1109/ICDE.2001.914877
Citations: 154

Abstract

Designing a new access method inside a commercial DBMS is cumbersome and expensive. We propose a family of metric access methods that are fast and easy to implement on top of existing access methods, such as sequential scan, R-trees and Slim-trees. The idea is to elect a set of objects as foci and to gauge every other object by its distances to this set. We show how to define the cardinality of the foci set, how to choose appropriate foci, and how to use them to perform range and nearest-neighbor queries without false dismissals. The foci increase the pruning of distance calculations during query processing. Furthermore, we index the distances from each object to the foci to reduce even the number of triangle-inequality comparisons. Experiments on real and synthetic datasets show that our methods match or outperform existing methods: they are up to 10 times faster and perform up to 10 times fewer distance calculations and disk accesses. In addition, they scale up well, exhibiting sub-linear performance as the database size grows.
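The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python, uses the Euclidean distance (`math.dist`) as a stand-in for an arbitrary metric, and picks foci with a greedy farthest-point heuristic chosen here for illustration. The class `OmniScan` and its method names are hypothetical. Each object is stored with its distances to the foci; since |d(q,f) - d(o,f)| <= d(q,o) for every focus f, the largest such difference is a lower bound on the true distance, so an object whose bound exceeds the query radius can be skipped without computing d(q,o) and without false dismissals.

```python
import heapq
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class OmniScan:
    """Hypothetical sketch of an Omni-style sequential scan (not the authors' code)."""

    def __init__(self, data: Sequence[Point], metric: Metric, num_foci: int, seed: int = 0):
        self.data = list(data)
        self.metric = metric
        self.foci = self._choose_foci(num_foci, random.Random(seed))
        # Omni-coordinates: distance from every object to every focus, computed once.
        self.coords = [[metric(o, f) for f in self.foci] for o in self.data]
        self.distance_calls = 0  # counts metric evaluations made at query time only

    def _choose_foci(self, k: int, rng: random.Random) -> List[Point]:
        # Assumption: greedy farthest-point selection as a stand-in foci heuristic.
        foci = [rng.choice(self.data)]
        while len(foci) < k:
            foci.append(max(self.data, key=lambda o: min(self.metric(o, f) for f in foci)))
        return foci

    def _dist(self, a: Point, b: Point) -> float:
        self.distance_calls += 1
        return self.metric(a, b)

    @staticmethod
    def _lower_bound(qc: List[float], oc: List[float]) -> float:
        # Triangle inequality: |d(q,f) - d(o,f)| <= d(q,o) for every focus f.
        return max(abs(a - b) for a, b in zip(qc, oc))

    def range_query(self, q: Point, r: float) -> List[int]:
        qc = [self._dist(q, f) for f in self.foci]
        hits = []
        for i, oc in enumerate(self.coords):
            if self._lower_bound(qc, oc) > r:
                continue  # safe to prune: d(q, o) > r is guaranteed
            if self._dist(q, self.data[i]) <= r:
                hits.append(i)
        return hits

    def knn(self, q: Point, k: int) -> List[Tuple[float, int]]:
        qc = [self._dist(q, f) for f in self.foci]
        # Visit candidates in increasing lower-bound order.
        bounds = sorted((self._lower_bound(qc, oc), i) for i, oc in enumerate(self.coords))
        best: List[Tuple[float, int]] = []  # max-heap of (-distance, index)
        for lb, i in bounds:
            if len(best) == k and lb > -best[0][0]:
                break  # every remaining bound exceeds the current k-th distance
            d = self._dist(q, self.data[i])
            if len(best) < k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
        return sorted((-nd, i) for nd, i in best)


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(500)]
    index = OmniScan(pts, math.dist, num_foci=3)
    q = (0.5, 0.5)
    print(index.range_query(q, 0.05), "metric calls:", index.distance_calls)
```

The sketch layers the foci on a sequential scan, the simplest of the host structures the abstract names. Indexing the Omni-coordinates in an R-tree, as the abstract also describes, would replace the linear loop over `self.coords` with a box query of half-width r centered on the query's own coordinates, since the lower bound is at most r exactly when every coordinate differs by at most r.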