kM++kNN: A fast algorithm for the exact search of k-nearest neighbors

2023 18th Iberian Conference on Information Systems and Technologies (CISTI). Pub Date: 2023-06-20. DOI: 10.23919/CISTI58278.2023.10211848

Raphael Lopes de Souza, Osvaldo Luiz De Oliveira

Citations: 0

Abstract

kM++kNN: A fast algorithm for the exact search of k-nearest neighbors

The k-nearest neighbor (k-NN) algorithm is widely used in machine learning and statistics for classification and regression tasks. Given an instance x, a set of instances T, and an integer $k \geq 1$, k-NN performs an exhaustive search of T for the k instances most similar to x (its k nearest neighbors). In applications with many instances and/or high-dimensional instances, this search is time-consuming because it requires many evaluations of the similarity function between instances. Several proposals have been made to reduce the k-NN search time: some aim at an exact search for the k instances in T most similar to x, while others reduce the search time by computing the most similar instances approximately. This work proposes an algorithm called kM++kNN for the exact search of the k instances in T most similar to x, which uses the triangle inequality to reduce the k-NN search time. In experiments, kM++kNN is compared with kMkNN, an algorithm currently considered fast, measuring the savings in the number of similarity-function evaluations between instances and in search time.
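
To make the idea of triangle-inequality pruning concrete, here is a minimal Python sketch. It is not the authors' kM++kNN or the kMkNN algorithm, whose details the abstract does not give; it only illustrates the general principle. It assumes Euclidean distance as the (dis)similarity function and simply samples reference points ("centers") from T, and the names `brute_force_knn`, `build_index`, and `pruned_knn` are hypothetical. For a center c, the triangle inequality gives d(x, t) >= |d(x, c) - d(t, c)|, so an instance t whose lower bound is already no better than the current k-th best distance can be skipped without computing d(x, t).

```python
import heapq
import math
import random


def euclidean(a, b):
    # Assumed distance function; any metric obeying the triangle inequality works.
    return math.dist(a, b)


def brute_force_knn(x, T, k):
    """Exhaustive search: one distance computation per instance in T."""
    dists = sorted((euclidean(x, t), i) for i, t in enumerate(T))
    return dists[:k], len(T)


def build_index(T, centers):
    """Assign each instance to its nearest center and store d(t, c) with it."""
    clusters = [[] for _ in centers]
    for i, t in enumerate(T):
        d, j = min((euclidean(t, c), j) for j, c in enumerate(centers))
        clusters[j].append((d, i))
    return clusters


def pruned_knn(x, T, k, centers, clusters):
    """Exact k-NN that skips instances ruled out by the triangle inequality."""
    d_xc = [euclidean(x, c) for c in centers]
    n_dist = len(centers)  # distances to the centers are counted too
    heap = []  # max-heap of the k best so far, stored as (-distance, index)
    # Visit the clusters whose centers are closest to x first, so that the
    # k-th best distance shrinks early and more instances can be skipped.
    for j in sorted(range(len(centers)), key=d_xc.__getitem__):
        for d_tc, i in clusters[j]:
            lower_bound = abs(d_xc[j] - d_tc)  # d(x, t) >= |d(x, c) - d(t, c)|
            if len(heap) == k and lower_bound >= -heap[0][0]:
                continue  # t cannot beat the current k-th neighbor
            d = euclidean(x, T[i])
            n_dist += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap), n_dist


if __name__ == "__main__":
    rng = random.Random(0)
    T = [(rng.random(), rng.random()) for _ in range(2000)]
    centers = rng.sample(T, 20)  # assumption: centers sampled from T
    clusters = build_index(T, centers)
    x = (0.5, 0.5)
    exact, n_brute = brute_force_knn(x, T, 5)
    pruned, n_pruned = pruned_knn(x, T, 5, centers, clusters)
    print("same neighbor distances:", [d for d, _ in exact] == [d for d, _ in pruned])
    print("distance computations:", n_brute, "vs", n_pruned)
```

Both functions return the same k nearest distances; the second value they return is the number of distance computations, which is the quantity the abstract says the experiments measure alongside search time.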
