$\boldsymbol{Steiner}$-Hardness: A Query Hardness Measure for Graph-Based ANN Indexes
Authors: Zeyu Wang, Qitong Wang, Xiaoxing Cheng, Peng Wang, Themis Palpanas, Wei Wang
arXiv: 2408.13899 (arXiv - CS - Databases), published 2024-08-25
Graph-based indexes have been widely employed to accelerate approximate
similarity search of high-dimensional vectors. However, the performance of
graph indexes varies widely across queries, leading to unstable
quality of service for downstream applications. This calls for an effective
measure of query hardness on graph indexes. However, popular
distance-based hardness measures such as local intrinsic dimensionality (LID)
lose effectiveness because they ignore the graph structure. In this paper, we propose $Steiner$-hardness,
a novel connection-based graph-native query hardness measure. Specifically, we
first propose a theoretical framework to analyze the minimum query effort on
graph indexes and then define $Steiner$-hardness as the minimum effort on a
representative graph. Moreover, we prove that $Steiner$-hardness is closely
related to the classical Directed $Steiner$ Tree (DST) problem. Accordingly,
we design a novel algorithm that reduces our problem to DST problems and then
leverages their solvers to compute $Steiner$-hardness efficiently.
Compared with LID and other similar measures, $Steiner$-hardness shows a
significantly better correlation with the actual query effort on various
datasets. Additionally, an unbiased evaluation built on
$Steiner$-hardness reveals new ranking results, indicating a meaningful
direction for enhancing the robustness of graph indexes. This paper has been
accepted to PVLDB 2025.
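
For readers unfamiliar with the Directed Steiner Tree problem mentioned above, the following is a minimal sketch of the classical problem only: given a directed graph with non-negative edge weights, a root, and a set of terminal nodes, find the cheapest set of edges through which the root reaches every terminal. It is not the paper's reduction, solver, or hardness definition. The function names, the brute-force strategy, and the toy graph are all assumptions made for illustration, and the search is exponential in the number of edges, so it is only suitable for very small graphs.

```python
from itertools import combinations


def reaches_all(root, terminals, edges):
    """Return True if every terminal is reachable from root using only `edges`."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {root}, [root]
    while stack:  # iterative depth-first search from the root
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return all(t in seen for t in terminals)


def min_directed_steiner_cost(root, terminals, edges):
    """Brute-force cost of a minimum directed Steiner tree (hypothetical helper).

    Tries every subset of edges and keeps the cheapest one in which the root
    reaches all terminals. Assumes non-negative weights, so the cheapest such
    subset has the same cost as an optimal tree. Returns None if infeasible.
    """
    best = None
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if reaches_all(root, terminals, subset):
                cost = sum(w for _, _, w in subset)
                if best is None or cost < best:
                    best = cost
    return best


# Hypothetical toy graph as (source, target, weight) triples.
TOY_EDGES = [
    ("r", "a", 1.0),
    ("r", "b", 4.0),
    ("a", "b", 1.0),
    ("a", "c", 5.0),
    ("b", "c", 1.0),
    ("r", "c", 6.0),
]

if __name__ == "__main__":
    # Cheapest way for root "r" to reach both "b" and "c" is r -> a -> b -> c.
    print(min_directed_steiner_cost("r", {"b", "c"}, TOY_EDGES))
```

In this toy graph the optimal tree passes through the non-terminal node "a", which is what distinguishes a Steiner tree from simply taking shortest direct edges to each terminal. Practical DST solvers use approximation or dynamic-programming techniques rather than enumeration.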