Instability results for cosine-dissimilarity-based nearest neighbor search on high dimensional Gaussian data
Chris R. Giannella
Information Processing Letters, volume 189, Article 106542. Published 2024-11-22.
DOI: 10.1016/j.ipl.2024.106542
https://www.sciencedirect.com/science/article/pii/S0020019024000723
Abstract
Because many dissimilarity functions behave differently in low versus high-dimensional spaces, the behavior of high-dimensional nearest neighbor search has been studied extensively. One line of research involves the characterization of nearest neighbor queries as unstable if their query points have nearly identical dissimilarity with most points in the dataset. This research has shown that, for various data distributions and dissimilarity functions, the probability of query instability approaches one. Previous work in Information Processing Letters by C. Giannella in 2021 explicated this phenomenon for centered Gaussian data and Euclidean distance. This paper addresses the problem of characterizing query instability behavior over centered Gaussian data and a fundamentally different dissimilarity function, cosine dissimilarity. Conditions are provided on the covariance matrices and dataset size function guaranteeing that the probability of query instability goes to one. Furthermore, conditions are provided under which the instability probability is bounded away from one.
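The instability notion described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's construction: it assumes identity covariance (the paper states conditions on general covariance matrices), and it uses a simplified proxy for instability, namely the fraction of data points whose cosine dissimilarity to the query is within a factor (1 + eps) of the nearest neighbor's. The function names, `eps`, the dataset size, and the trial count are all hypothetical choices made for illustration.

```python
import math
import random


def cosine_dissimilarity(x, y):
    """Return 1 - cos(angle between x and y); both vectors must be nonzero."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def gaussian_point(dim, rng):
    """Draw a point from a centered Gaussian with identity covariance (assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def fraction_near_nearest(query, data, eps):
    """Fraction of data points whose dissimilarity to the query is at most
    (1 + eps) times the nearest neighbor's dissimilarity (illustrative proxy)."""
    dissims = [cosine_dissimilarity(query, p) for p in data]
    nearest = min(dissims)
    close = sum(1 for d in dissims if d <= (1.0 + eps) * nearest)
    return close / len(dissims)


def estimate(dim, n_points=100, eps=0.5, trials=10, seed=0):
    """Average the proxy over several independent datasets and queries."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        data = [gaussian_point(dim, rng) for _ in range(n_points)]
        query = gaussian_point(dim, rng)
        total += fraction_near_nearest(query, data, eps)
    return total / trials


if __name__ == "__main__":
    # Compare a low-dimensional and a high-dimensional setting.
    for dim in (2, 1000):
        print(dim, estimate(dim))
```

Under these assumptions, the averaged fraction should be small in low dimension and close to one in high dimension, which is the qualitative behavior the abstract describes. The sketch does not reproduce the paper's conditions on covariance matrices or dataset size.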
About the journal:
Information Processing Letters invites submission of original research articles that focus on fundamental aspects of information processing and computing. This naturally includes work in the broadly understood field of theoretical computer science, although papers in all areas of scientific inquiry will be given consideration, provided that they describe research contributions credibly motivated by applications to computing and involve rigorous methodology. High-quality experimental papers that address topics of sufficiently broad interest may also be considered.
Since its inception in 1971, Information Processing Letters has served as a forum for timely dissemination of short, concise and focused research contributions. Continuing with this tradition, and to expedite the reviewing process, manuscripts are generally limited in length to nine pages when they appear in print.