无撕裂的相似搜索:omni家族的通用访问方法

Proceedings 17th International Conference on Data Engineering Pub Date : 2001-04-02 DOI:10.1109/ICDE.2001.914877

R. S. Filho, A. Traina, C. Traina, C. Faloutsos

{"title":"无撕裂的相似搜索:omni家族的通用访问方法","authors":"R. S. Filho, A. Traina, C. Traina, C. Faloutsos","doi":"10.1109/ICDE.2001.914877","DOIUrl":null,"url":null,"abstract":"Designing a new access method inside a commercial DBMS is cumbersome and expensive. We propose a family of metric access methods that are fast and easy to implement on top of existing access methods, such as sequential scan, R-trees and Slim-trees. The idea is to elect a set of objects as foci, and gauge all other objects with their distances from this set. We show how to define the foci set cardinality, how to choose appropriate foci, and how to perform range and nearest-neighbor queries using them, without false dismissals. The foci increase the pruning of distance calculations during the query processing. Furthermore we index the distances from each object to the foci to reduce even triangular inequality comparisons. Experiments on real and synthetic datasets show that our methods match or outperform existing methods. They are up to 10 times faster, and perform up to 10 times fewer distance calculations and disk accesses. In addition, it scales up well, exhibiting sub-linear performance with growing database size.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"154","resultStr":"{\"title\":\"Similarity search without tears: the OMNI-family of all-purpose access methods\",\"authors\":\"R. S. Filho, A. Traina, C. Traina, C. Faloutsos\",\"doi\":\"10.1109/ICDE.2001.914877\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Designing a new access method inside a commercial DBMS is cumbersome and expensive. We propose a family of metric access methods that are fast and easy to implement on top of existing access methods, such as sequential scan, R-trees and Slim-trees. The idea is to elect a set of objects as foci, and gauge all other objects with their distances from this set. We show how to define the foci set cardinality, how to choose appropriate foci, and how to perform range and nearest-neighbor queries using them, without false dismissals. The foci increase the pruning of distance calculations during the query processing. Furthermore we index the distances from each object to the foci to reduce even triangular inequality comparisons. Experiments on real and synthetic datasets show that our methods match or outperform existing methods. They are up to 10 times faster, and perform up to 10 times fewer distance calculations and disk accesses. In addition, it scales up well, exhibiting sub-linear performance with growing database size.\",\"PeriodicalId\":431818,\"journal\":{\"name\":\"Proceedings 17th International Conference on Data Engineering\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"154\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 17th International Conference on Data Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2001.914877\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 17th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2001.914877","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 154

摘要

在商业DBMS中设计一种新的访问方法既麻烦又昂贵。我们提出了一系列快速且易于实现的度量访问方法，如顺序扫描、r树和slim -tree。这个想法是选择一组物体作为焦点，并测量所有其他物体与这组物体的距离。我们将展示如何定义焦点集基数，如何选择合适的焦点，以及如何使用它们执行范围查询和最近邻查询，而不会出现错误的忽略。焦点增加了查询处理过程中距离计算的修剪。此外，我们索引从每个对象到焦点的距离，以减少三角不平等的比较。在真实和合成数据集上的实验表明，我们的方法与现有方法相匹配或优于现有方法。它们的速度最高可达10倍，执行的距离计算和磁盘访问最多可减少10倍。此外，它可以很好地扩展，随着数据库大小的增长表现出亚线性的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Similarity search without tears: the OMNI-family of all-purpose access methods

Designing a new access method inside a commercial DBMS is cumbersome and expensive. We propose a family of metric access methods that are fast and easy to implement on top of existing access methods, such as sequential scan, R-trees and Slim-trees. The idea is to elect a set of objects as foci, and gauge all other objects with their distances from this set. We show how to define the foci set cardinality, how to choose appropriate foci, and how to perform range and nearest-neighbor queries using them, without false dismissals. The foci increase the pruning of distance calculations during the query processing. Furthermore we index the distances from each object to the foci to reduce even triangular inequality comparisons. Experiments on real and synthetic datasets show that our methods match or outperform existing methods. They are up to 10 times faster, and perform up to 10 times fewer distance calculations and disk accesses. In addition, it scales up well, exhibiting sub-linear performance with growing database size.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings 17th International Conference on Data Engineering

自引率

0.00%

发文量