Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator

Alexandre Luís Magalhães Levada, Frank Nielsen, Michel Ferreira Cardia Haddad
The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular methods for nonparametric classification. However, a relevant limitation concerns the choice of the number of neighbors $k$. This parameter exerts a direct impact on several properties of the classifier, such as the bias-variance tradeoff, the smoothness of decision boundaries, robustness to noise, and the handling of class imbalance. In the present paper, we introduce a new adaptive $k$-nearest neighbors ($kK$-NN) algorithm that uses the local curvature at a sample to adaptively define the neighborhood size. The rationale is that points with low curvature can be assigned larger neighborhoods (locally, the tangent space approximates the underlying data shape well), whereas points with high curvature should be assigned smaller neighborhoods (locally, the tangent space is only a loose approximation).
We estimate the local Gaussian curvature by computing an approximation to the local shape operator in terms of the local covariance matrix and the local Hessian matrix.
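To make the curvature-estimation step concrete, the following sketch approximates a local shape operator for a point cloud near a smooth surface in $\mathbb{R}^3$ using a standard recipe, not necessarily the authors' exact construction: local PCA on the covariance matrix yields a tangent plane and a normal direction, the Hessian of a least-squares quadratic fit of the height function approximates the shape operator, and the Gaussian curvature is its determinant.

```python
# A minimal sketch (assumptions: data lie near a smooth surface in R^3; this
# is a common local-quadric estimator, not necessarily the paper's exact
# formulation). numpy only; brute-force neighbor search for clarity.
import numpy as np

def local_shape_operator(points, idx, k=20):
    """Approximate the 2x2 shape operator at points[idx] from k neighbors."""
    x = points[idx]
    dists = np.linalg.norm(points - x, axis=1)
    nbrs = points[np.argsort(dists)[1:k + 1]]     # exclude the point itself

    # Local covariance matrix: its two leading eigenvectors span the tangent
    # plane; the eigenvector of smallest eigenvalue approximates the normal.
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered / len(nbrs)
    _, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    normal, tangent = eigvecs[:, 0], eigvecs[:, 1:]

    # Local coordinates: (u, v) in the tangent plane, h along the normal.
    uv = centered @ tangent
    h = centered @ normal

    # Least-squares fit h ~ c0 + c1*u + c2*v + a*u^2/2 + b*u*v + c*v^2/2.
    # The Hessian [[a, b], [b, c]] of the height function in an orthonormal
    # tangent basis approximates the shape operator at the point.
    A = np.column_stack([np.ones(len(uv)), uv[:, 0], uv[:, 1],
                         uv[:, 0]**2 / 2, uv[:, 0] * uv[:, 1], uv[:, 1]**2 / 2])
    coeffs, *_ = np.linalg.lstsq(A, h, rcond=None)
    a, b, c = coeffs[3:]
    return np.array([[a, b], [b, c]])

# Sanity check on the paraboloid z = (u^2 + v^2) / 2, whose Gaussian curvature
# at the apex is exactly 1 (the determinant of a 2x2 shape operator is
# invariant to the arbitrary sign of the PCA normal).
rng = np.random.default_rng(0)
uv_samples = np.vstack([[0.0, 0.0], rng.uniform(-0.5, 0.5, size=(400, 2))])
surf = np.column_stack([uv_samples[:, 0], uv_samples[:, 1],
                        0.5 * (uv_samples**2).sum(axis=1)])
S = local_shape_operator(surf, idx=0, k=25)
print("estimated Gaussian curvature:", np.linalg.det(S))  # close to 1
```

Feeding a per-sample curvature score such as $|\det(S)|$ into a mapping like the one sketched earlier is then one way to obtain the adaptive neighborhood sizes used at classification time.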
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to both the established $k$-NN method and another adaptive $k$-NN algorithm. This advantage is particularly evident when the number of training samples is limited, suggesting that the $kK$-NN is capable of learning more discriminative decision functions from less data in many relevant cases.