Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator

Alexandre Luís Magalhães Levada, Frank Nielsen, Michel Ferreira Cardia Haddad
{"title":"Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator","authors":"Alexandre Luís Magalhães Levada, Frank Nielsen, Michel Ferreira Cardia Haddad","doi":"arxiv-2409.05084","DOIUrl":null,"url":null,"abstract":"The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular\nmethods for nonparametric classification. However, a relevant limitation\nconcerns the definition of the number of neighbors $k$. This parameter exerts a\ndirect impact on several properties of the classifier, such as the\nbias-variance tradeoff, smoothness of decision boundaries, robustness to noise,\nand class imbalance handling. In the present paper, we introduce a new adaptive\n$k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at\na sample to adaptively defining the neighborhood size. The rationale is that\npoints with low curvature could have larger neighborhoods (locally, the tangent\nspace approximates well the underlying data shape), whereas points with high\ncurvature could have smaller neighborhoods (locally, the tangent space is a\nloose approximation). We estimate the local Gaussian curvature by computing an\napproximation to the local shape operator in terms of the local covariance\nmatrix as well as the local Hessian matrix. Results on many real-world datasets\nindicate that the new $kK$-NN algorithm yields superior balanced accuracy\ncompared to the established $k$-NN method and also another adaptive $k$-NN\nalgorithm. This is particularly evident when the number of samples in the\ntraining data is limited, suggesting that the $kK$-NN is capable of learning\nmore discriminant functions with less data considering many relevant cases.","PeriodicalId":501082,"journal":{"name":"arXiv - MATH - Information Theory","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular methods for nonparametric classification. However, a notable limitation is the choice of the number of neighbors $k$. This parameter directly affects several properties of the classifier, such as the bias-variance tradeoff, the smoothness of decision boundaries, robustness to noise, and the handling of class imbalance. In the present paper, we introduce a new adaptive $k$-nearest neighbor ($kK$-NN) algorithm that uses the local curvature at a sample to adaptively define the neighborhood size. The rationale is that points with low curvature can have larger neighborhoods (locally, the tangent space approximates the underlying data shape well), whereas points with high curvature should have smaller neighborhoods (locally, the tangent space is only a loose approximation). We estimate the local Gaussian curvature by computing an approximation to the local shape operator in terms of the local covariance matrix and the local Hessian matrix. Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the standard $k$-NN method and another adaptive $k$-NN algorithm. This is particularly evident when the number of training samples is limited, suggesting that $kK$-NN is capable of learning more discriminative decision functions from less data in many relevant cases.
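To make the idea concrete, below is a minimal Python sketch of a curvature-adaptive $k$-NN in the spirit of the abstract, not the authors' implementation. It assumes the local shape operator can be approximated as $S \approx -\Sigma^{-1} H$, where $\Sigma$ is the local covariance matrix of a point's neighborhood; the Hessian surrogate `H` (second moments of the normalized offsets), the use of $|\det S|$ as a Gaussian-curvature proxy, and the rank-based mapping from curvature score to neighborhood size are all illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from collections import Counter

def local_curvature_scores(X, m=15):
    """Per-sample curvature score from a local shape-operator proxy
    S ~= -Sigma^{-1} H. Sigma is the local covariance; H is a
    hypothetical Hessian surrogate (not the paper's exact estimator).
    Returns |det(S)| as a Gaussian-curvature proxy."""
    n, d = X.shape
    D = cdist(X, X)
    idx = np.argsort(D, axis=1)[:, 1:m + 1]  # m nearest neighbors (skip self)
    scores = np.empty(n)
    for i in range(n):
        P = X[idx[i]] - X[i]                       # centered local patch
        Sigma = P.T @ P / m + 1e-9 * np.eye(d)     # regularized covariance
        Q = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
        H = Q.T @ Q / m                            # assumed Hessian surrogate
        S = -np.linalg.solve(Sigma, H)             # shape-operator proxy
        scores[i] = abs(np.linalg.det(S))          # curvature proxy
    return scores

def kk_nn_predict(X_train, y_train, X_test, k_min=3, k_max=25, m=15):
    """Adaptive k-NN: low-curvature points get large neighborhoods,
    high-curvature points get small ones. Each test point inherits the
    k assigned to its nearest training point (an assumption here)."""
    curv = local_curvature_scores(X_train, m=m)
    ranks = np.argsort(np.argsort(curv)) / (len(curv) - 1)   # in [0, 1]
    k_per_point = np.round(k_max - ranks * (k_max - k_min)).astype(int)
    y_pred = []
    for dists in cdist(X_test, X_train):
        order = np.argsort(dists)
        k = k_per_point[order[0]]
        votes = Counter(y_train[order[:k]])
        y_pred.append(votes.most_common(1)[0][0])
    return np.array(y_pred)
```

Under this sketch, the only moving part relative to plain $k$-NN is that $k$ varies per point: ranking the curvature scores (rather than using their raw values) keeps the mapping to $[k_{\min}, k_{\max}]$ robust to the scale of the determinant, which can vary widely across datasets.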