Title: Improving Nearest Neighbor Indexing by Multitask Learning
Authors: Amorntip Prayoonwong, Ke Zeng, Chih-Yi Chiu
DOI: 10.1145/3549555.3549579
Published: 2022-09-14, Proceedings of the 19th International Conference on Content-based Multimedia Indexing
Citations: 0
Abstract
In approximate nearest neighbor search, conventional lookup-table indexing computes the distances (or similarities) between the query and the codewords, and then re-ranks the data points associated with the nearest (or most similar) codewords. To address the codeword quantization loss exhibited by this conventional approach, probability-based indexing leverages the data distribution among codewords, learned by neural networks, to locate the nearest neighbor [8]. In this paper, we present a multitask model that improves probability-based indexing. The model is formulated with two objectives: NN distribution probabilities and candidate retrieval quantity. The NN distribution probabilities estimate which codewords the nearest neighbor is likely to be associated with. The candidate retrieval quantity predicts the smallest number of codewords that must be re-ranked to capture the nearest neighbor. The model is trained by jointly minimizing a triplet loss, a probability loss, and a quantity loss. By learning these tasks in parallel, we find that the predictions of both the data distribution probabilities and the retrieval quantity become more accurate, so search accuracy and computational efficiency improve together. We evaluate the proposed method on two billion-scale benchmark datasets, comparing it with several approximate nearest neighbor search methods; the results show that the proposed method outperforms the alternatives.
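To make the training objective concrete, the following is a minimal sketch of how the three losses named in the abstract (triplet, probability, and quantity) could be combined into one multitask objective. The function names, the cross-entropy/squared-error formulations, and the per-task weights are assumptions for illustration; the abstract does not specify the paper's exact loss formulations.

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge loss on squared Euclidean distances between embedding vectors:
    # pushes the anchor closer to the positive than to the negative by `margin`.
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)

def probability_loss(pred_probs, true_codeword):
    # Cross-entropy against the codeword that actually holds the query's NN
    # (an assumed formulation of the "probability loss").
    return -math.log(pred_probs[true_codeword] + 1e-12)

def quantity_loss(pred_quantity, true_quantity):
    # Squared error on the predicted number of codewords that must be
    # re-ranked to capture the NN (an assumed formulation of "quantity loss").
    return (pred_quantity - true_quantity) ** 2

def multitask_loss(anchor, positive, negative,
                   pred_probs, true_codeword,
                   pred_quantity, true_quantity,
                   weights=(1.0, 1.0, 1.0)):
    # Weighted sum of the three task losses; the weights are hypothetical.
    w1, w2, w3 = weights
    return (w1 * triplet_loss(anchor, positive, negative)
            + w2 * probability_loss(pred_probs, true_codeword)
            + w3 * quantity_loss(pred_quantity, true_quantity))
```

In practice each term would be averaged over a mini-batch and minimized by gradient descent; learning the probability and quantity heads in parallel is what lets one backbone serve both predictions.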