High performance GPU implementation of KNN algorithm: A review

IF 1.9 Q2 MULTIDISCIPLINARY SCIENCES

MethodsX Pub Date : 2025-09-17 DOI:10.1016/j.mex.2025.103633

Pooja Bidye, Pradnya Borkar, Nitin Rakesh

MethodsX, volume 15, Article 103633. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2215016125004777

Citations: 0

Abstract


With large volumes of complex data generated by different applications, Machine Learning (ML) algorithms alone may not yield significant performance benefits on a single or multi-core CPU. Applying optimization techniques to these ML algorithms in a High-Performance Computing (HPC) environment can give considerable speedups for high-dimensional datasets. One of the most widely used classification algorithms, with applications in various domains, is K-Nearest Neighbor (KNN). Despite its simplicity, KNN poses several challenges when handling high-dimensional data. However, the algorithm’s inherent nature presents an opportunity for parallelization. This paper reviews the optimization techniques employed by several researchers to accelerate the KNN algorithm on a GPU platform. The study reveals that techniques such as coalesced-memory access, tiling with shared memory, chunking, data segmentation, and pivot-based partitioning contribute significantly to speeding up the KNN algorithm by leveraging GPU capabilities. The algorithms reviewed have performed exceptionally well on high-dimensional data, with speedups of up to 750x for a dual-GPU platform and up to 1840x for a multi-GPU platform. This study serves as a valuable resource for researchers examining KNN acceleration in high-performance computing environments and its applications in various fields.
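To make the parallelization opportunity concrete, here is a minimal CPU-only sketch of brute-force KNN classification with query chunking. It is an illustration under stated assumptions, not code from the reviewed paper: the function names (`squared_distance`, `knn_classify_chunk`, `knn_classify`), the `chunk_size` parameter, and the toy data are all hypothetical. The point it shows is that each query is answered independently of the others, so chunks of queries could in principle be handed to separate GPU threads, blocks, or devices.

```python
import heapq
from collections import Counter


def squared_distance(a, b):
    # Squared Euclidean distance; the square root is skipped because it
    # does not change the ordering of neighbors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify_chunk(train_points, train_labels, query_chunk, k):
    # Classify one chunk of queries. No query depends on another, which is
    # the property a GPU implementation would exploit (assumption: one
    # thread or block per query).
    predictions = []
    for q in query_chunk:
        # Indices of the k training points closest to q.
        nearest = heapq.nsmallest(
            k,
            range(len(train_points)),
            key=lambda i: squared_distance(train_points[i], q),
        )
        # Majority vote over the labels of those neighbors.
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def knn_classify(train_points, train_labels, queries, k=3, chunk_size=2):
    # Split the queries into fixed-size chunks and process them one by one.
    # Here the chunks run sequentially; the chunk size is a hypothetical
    # tuning knob standing in for what fits in device memory.
    predictions = []
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        predictions.extend(
            knn_classify_chunk(train_points, train_labels, chunk, k)
        )
    return predictions


# Toy data (made up for illustration): two well-separated clusters.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
queries = [(0.2, 0.1), (5.5, 5.2), (1, 1)]
result = knn_classify(train_points, train_labels, queries, k=3, chunk_size=2)
```

Because the chunks do not interact, changing `chunk_size` changes only how the work is grouped, not the predictions. The GPU-specific techniques named in the abstract (coalesced memory access, shared-memory tiling, pivot-based partitioning) concern how the distance computations are laid out in memory and are not modeled in this sketch.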
