Similarity search through one-dimensional embeddings

Proceedings of the Symposium on Applied Computing. Pub Date: 2017-04-03. DOI: 10.1145/3019612.3019674

H. Razente, Rafael L. Bernardes Lima, M. Barioni

Citations: 5

Abstract

Similarity queries are often optimized with specialized data structures known as metric access methods. The use of B+trees to index high-dimensional data for range and nearest-neighbor search in metric spaces has recently been proposed. This work introduces a new access method called GroupSim, together with query algorithms for indexing and retrieving complex data by similarity. It employs a single B+tree to dynamically index data elements with regard to a set of one-dimensional embeddings. Our strategy uses a new scheme to store distance information, which makes it possible to determine directly whether each element lies in the intersection of the embeddings. We compare GroupSim with two related methods, iDistance and OmniB-Forest, and show empirically that the new access method outperforms them with regard to the time required to run similarity queries.
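The abstract does not describe GroupSim's storage scheme, so the sketch below is not the authors' method. It only illustrates the general idea of a one-dimensional embedding for metric range search: each element is mapped to its distance from a pivot, those distances are kept in sorted order, and the triangle inequality restricts a range query to an interval on each embedding. Elements that fall inside the interval for every pivot form the candidate set. The class name `PivotIndex`, the pivot choice, the Euclidean metric, and the use of sorted Python lists in place of a B+tree are all assumptions made for illustration.

```python
import bisect
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical index: one sorted list of (distance-to-pivot, id) per pivot.

    Each sorted list stands in for the ordered keys a B+tree would hold.
    """

    def __init__(self, data, pivots, dist=euclidean):
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # One-dimensional embedding of every element with respect to each pivot.
        self.keys = [
            sorted((dist(p, x), i) for i, x in enumerate(data)) for p in pivots
        ]

    def range_query(self, q, radius, eps=1e-9):
        """Return ids of all elements within `radius` of `q`."""
        candidates = set(range(len(self.data)))
        for p, keys in zip(self.pivots, self.keys):
            dq = self.dist(q, p)
            # Triangle inequality: d(q, x) <= r implies |d(q, p) - d(x, p)| <= r,
            # so only keys in [dq - r, dq + r] can qualify (eps absorbs rounding).
            lo = bisect.bisect_left(keys, (dq - radius - eps, -1))
            hi = bisect.bisect_right(keys, (dq + radius + eps, len(self.data)))
            # Keep only elements that lie in the interval of every embedding.
            candidates &= {i for _, i in keys[lo:hi]}
        # Verify the surviving candidates with the real distance function.
        return sorted(i for i in candidates if self.dist(q, self.data[i]) <= radius)


# Usage: 200 random 2-D points, two pivots at corners of the unit square.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
idx = PivotIndex(pts, pivots=[(0.0, 0.0), (1.0, 0.0)])
hits = idx.range_query((0.5, 0.5), 0.2)
print(len(hits), "elements within radius 0.2")
```

The filter never discards a true answer, because the interval test is a necessary condition for being within the radius; it only reduces how many real distance computations are needed. Adding pivots tightens the candidate set at the cost of more stored keys.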
