Similarity search through one-dimensional embeddings
H. Razente, Rafael L. Bernardes Lima, M. Barioni
Proceedings of the Symposium on Applied Computing, 2017-04-03
DOI: 10.1145/3019612.3019674 (https://doi.org/10.1145/3019612.3019674)
Citations: 5
Abstract
The optimization of similarity queries is often done with specialized data structures known as metric access methods. The use of B+trees to index high-dimensional data for range and nearest-neighbor search in metric spaces has recently been proposed. This work introduces a new access method called GroupSim, together with query algorithms for indexing and retrieving complex data by similarity. It employs a single B+tree to dynamically index data elements with regard to a set of one-dimensional embeddings. Our strategy uses a new scheme to store distance information, which makes it possible to determine directly whether each element lies in the intersection of the embeddings. We compare GroupSim with two related methods, iDistance and OmniB-Forest, and show empirically that the new access method outperforms them with regard to the time required to run similarity queries.
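To make the idea of one-dimensional embeddings concrete, the sketch below shows a generic pivot-based range query, not GroupSim's actual storage scheme, which the abstract does not detail. It assumes each embedding maps an element to its distance from a reference point (pivot); the class name `PivotIndex`, the Euclidean distance, and the use of one sorted list per pivot as a stand-in for a B+tree are all illustrative assumptions. By the triangle inequality, any element within radius r of the query q must satisfy |d(x, p) - d(q, p)| <= r for every pivot p, so the candidates are the elements that lie in the intersection of these one-dimensional intervals.

```python
import bisect
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.dist(a, b)


class PivotIndex:
    """Toy index: one sorted list of (distance-to-pivot, element id) per pivot.

    Each sorted list stands in for a B+tree over a one-dimensional embedding.
    This is a hypothetical illustration, not the GroupSim structure.
    """

    def __init__(self, data, pivots, dist=euclidean):
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        # One-dimensional embedding per pivot: x -> dist(x, pivot).
        self.keys = [
            sorted((dist(x, p), i) for i, x in enumerate(self.data))
            for p in self.pivots
        ]

    def range_query(self, q, r, eps=1e-9):
        """Return the ids of all elements within distance r of q."""
        candidates = set(range(len(self.data)))
        for p, keys in zip(self.pivots, self.keys):
            dq = self.dist(q, p)
            # Triangle inequality: a match x has |d(x, p) - d(q, p)| <= r.
            # eps widens the window slightly to absorb rounding error.
            lo = bisect.bisect_left(keys, (dq - r - eps, -1))
            hi = bisect.bisect_right(keys, (dq + r + eps, len(self.data)))
            # Keep only elements that fall inside every pivot's interval.
            candidates &= {i for _, i in keys[lo:hi]}
        # The filter admits false positives, so verify with the real distance.
        return sorted(i for i in candidates if self.dist(q, self.data[i]) <= r)
```

For example, `PivotIndex(points, [(0.0, 0.0), (1.0, 0.0)]).range_query((0.5, 0.5), 0.2)` returns the same ids as a linear scan, while computing the full distance only for elements that survive every interval filter. Adding pivots shrinks the candidate set at the cost of more index lookups.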