Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator

Alexandre Luís Magalhães Levada, Frank Nielsen, Michel Ferreira Cardia Haddad
{"title":"Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator","authors":"Alexandre Luís Magalhães Levada, Frank Nielsen, Michel Ferreira Cardia Haddad","doi":"arxiv-2409.05084","DOIUrl":null,"url":null,"abstract":"The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular\nmethods for nonparametric classification. However, a relevant limitation\nconcerns the definition of the number of neighbors $k$. This parameter exerts a\ndirect impact on several properties of the classifier, such as the\nbias-variance tradeoff, smoothness of decision boundaries, robustness to noise,\nand class imbalance handling. In the present paper, we introduce a new adaptive\n$k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at\na sample to adaptively defining the neighborhood size. The rationale is that\npoints with low curvature could have larger neighborhoods (locally, the tangent\nspace approximates well the underlying data shape), whereas points with high\ncurvature could have smaller neighborhoods (locally, the tangent space is a\nloose approximation). We estimate the local Gaussian curvature by computing an\napproximation to the local shape operator in terms of the local covariance\nmatrix as well as the local Hessian matrix. Results on many real-world datasets\nindicate that the new $kK$-NN algorithm yields superior balanced accuracy\ncompared to the established $k$-NN method and also another adaptive $k$-NN\nalgorithm. This is particularly evident when the number of samples in the\ntraining data is limited, suggesting that the $kK$-NN is capable of learning\nmore discriminant functions with less data considering many relevant cases.","PeriodicalId":501082,"journal":{"name":"arXiv - MATH - Information Theory","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular methods for nonparametric classification. However, a notable limitation is the choice of the number of neighbors $k$. This parameter directly affects several properties of the classifier, such as the bias-variance tradeoff, the smoothness of decision boundaries, robustness to noise, and the handling of class imbalance. In the present paper, we introduce a new adaptive $k$-nearest neighbor ($kK$-NN) algorithm that uses the local curvature at a sample to adaptively define the neighborhood size. The rationale is that points with low curvature can have larger neighborhoods (locally, the tangent space approximates the underlying data shape well), whereas points with high curvature should have smaller neighborhoods (locally, the tangent space is only a loose approximation). We estimate the local Gaussian curvature by computing an approximation to the local shape operator in terms of the local covariance matrix and the local Hessian matrix. Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the standard $k$-NN method and another adaptive $k$-NN algorithm. This is particularly evident when the number of training samples is limited, suggesting that $kK$-NN is capable of learning more discriminative decision functions from less data in many relevant cases.
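To make the idea concrete, below is a minimal Python sketch of a curvature-adaptive $k$-NN in the spirit of the abstract, not the authors' implementation. It assumes the local shape operator can be approximated as $S \approx -\Sigma^{-1} H$, where $\Sigma$ is the local covariance matrix of a point's neighborhood; the Hessian surrogate `H` (second moments of the normalized offsets), the use of $|\det S|$ as a Gaussian-curvature proxy, and the rank-based mapping from curvature score to neighborhood size are all illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from collections import Counter

def local_curvature_scores(X, m=15):
    """Per-sample curvature score from a local shape-operator proxy
    S ~= -Sigma^{-1} H. Sigma is the local covariance; H is a
    hypothetical Hessian surrogate (not the paper's exact estimator).
    Returns |det(S)| as a Gaussian-curvature proxy."""
    n, d = X.shape
    D = cdist(X, X)
    idx = np.argsort(D, axis=1)[:, 1:m + 1]  # m nearest neighbors (skip self)
    scores = np.empty(n)
    for i in range(n):
        P = X[idx[i]] - X[i]                       # centered local patch
        Sigma = P.T @ P / m + 1e-9 * np.eye(d)     # regularized covariance
        Q = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
        H = Q.T @ Q / m                            # assumed Hessian surrogate
        S = -np.linalg.solve(Sigma, H)             # shape-operator proxy
        scores[i] = abs(np.linalg.det(S))          # curvature proxy
    return scores

def kk_nn_predict(X_train, y_train, X_test, k_min=3, k_max=25, m=15):
    """Adaptive k-NN: low-curvature points get large neighborhoods,
    high-curvature points get small ones. Each test point inherits the
    k assigned to its nearest training point (an assumption here)."""
    curv = local_curvature_scores(X_train, m=m)
    ranks = np.argsort(np.argsort(curv)) / (len(curv) - 1)   # in [0, 1]
    k_per_point = np.round(k_max - ranks * (k_max - k_min)).astype(int)
    y_pred = []
    for dists in cdist(X_test, X_train):
        order = np.argsort(dists)
        k = k_per_point[order[0]]
        votes = Counter(y_train[order[:k]])
        y_pred.append(votes.most_common(1)[0][0])
    return np.array(y_pred)
```

Under this sketch, the only moving part relative to plain $k$-NN is that $k$ varies per point: ranking the curvature scores (rather than using their raw values) keeps the mapping to $[k_{\min}, k_{\max}]$ robust to the scale of the determinant, which can vary widely across datasets.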