kM++kNN: A fast algorithm for the exact search of k-nearest neighbors

Raphael Lopes de Souza, Osvaldo Luiz De Oliveira
Published in: 2023 18th Iberian Conference on Information Systems and Technologies (CISTI)
Publication date: 2023-06-20
DOI: 10.23919/CISTI58278.2023.10211848 (https://doi.org/10.23919/CISTI58278.2023.10211848)
Citations: 0

Abstract

The k-nearest neighbor (k-NN) algorithm is widely used in machine learning and statistics for classification and regression tasks. Given an instance x, a set of instances T, and an integer $k \geq 1$, k-NN performs an exhaustive search of T for the k instances most similar to x (the k nearest neighbors). In applications with many instances and/or high-dimensional instances, the search is time-consuming because it requires many evaluations of the similarity function between instances. Several proposals to reduce the k-NN search time have been made: some aim at the exact search of the k instances in T most similar to x, while others reduce the search time by computing the most similar instances approximately. This work proposes an algorithm called kM++kNN for the exact search of the k instances in T most similar to x, which uses the triangle inequality to reduce the k-NN search time. In experiments measuring the savings in the number of similarity-function evaluations and in search time, kM++kNN is compared with kMkNN, an algorithm currently considered fast.
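To make the idea of triangle-inequality pruning concrete, the following is a minimal Python sketch. It is not the authors' kM++kNN, whose details the abstract does not give. It assumes the similarity function is a metric (Euclidean distance here) and that each instance is assigned to its nearest of a few reference centers. The function names (`knn_exhaustive`, `build_index`, `knn_pruned`) and the random choice of centers are hypothetical, introduced only for illustration.

```python
import heapq
import math
import random


def knn_exhaustive(x, T, k):
    """Baseline k-NN: one distance computation per instance in T."""
    return sorted(math.dist(x, t) for t in T)[:k], len(T)


def build_index(T, centers):
    """Group instances by nearest center, remembering d(t, center)."""
    clusters = [[] for _ in centers]
    for t in T:
        d_tc = [math.dist(t, c) for c in centers]
        j = d_tc.index(min(d_tc))
        clusters[j].append((d_tc[j], t))
    return clusters


def knn_pruned(x, centers, clusters, k):
    """Exact k-NN that skips instances ruled out by a lower bound.

    Triangle inequality: d(x, t) >= |d(x, c) - d(t, c)| for any center c.
    """
    d_xc = [math.dist(x, c) for c in centers]
    n_dist = len(centers)  # distances to the centers are counted too
    best = []              # heap of negated distances: -best[0] is the k-th best
    # Visit clusters from the nearest center outward so the bound tightens early.
    for j in sorted(range(len(centers)), key=d_xc.__getitem__):
        for d_tc, t in clusters[j]:
            lower = abs(d_xc[j] - d_tc)
            if len(best) == k and lower >= -best[0]:
                continue   # t cannot beat the current k-th neighbor
            d = math.dist(x, t)
            n_dist += 1
            if len(best) < k:
                heapq.heappush(best, -d)
            elif d < -best[0]:
                heapq.heapreplace(best, -d)
    return sorted(-v for v in best), n_dist


if __name__ == "__main__":
    random.seed(0)
    T = [tuple(random.random() for _ in range(4)) for _ in range(500)]
    centers = random.sample(T, 10)  # assumption: centers picked at random
    clusters = build_index(T, centers)
    x = tuple(random.random() for _ in range(4))
    exact, n_all = knn_exhaustive(x, T, 5)
    pruned, n_few = knn_pruned(x, centers, clusters, 5)
    print("distance computations:", n_all, "vs", n_few)
```

Both functions return the sorted distances of the k nearest neighbors together with the number of distance computations performed, which is the quantity the abstract says the experiments measure. An instance is skipped only when its lower bound already reaches the current k-th best distance, so the result stays exact. How much work is saved depends on the data and on how the centers are chosen.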