{"title":"Modifying one of the Machine Learning Algorithms kNN to Make it Independent of the Parameter k by Re-defining Neighbor","authors":"P. Sinha","doi":"10.5815/ijmsc.2020.04.02","DOIUrl":null,"url":null,"abstract":"When we are given a data set where in based upon the values and or characteristics of attributes each data point is assigned a class, it is known as classification. In machine learning a very simple and powerful tool to do this is the kNearest Neighbor (kNN) algorithm. It is based on the concept that the data points of a particular class are neighbors of each other. For a given test data or an unknown data, to find the class to which it is the neighbor one measures in kNN the Euclidean distances of the test data or the unknown data from all the data points of all the classes in the training data. Then out of the k nearest distances, where k is any number greater than or equal to 1, the class to which the test data or unknown data is the nearest most number of times is the class assigned to the test data or unknown data. In this paper, I propose a variation of kNN, which I call the ANN method (Alternative Nearest Neighbor) to distinguish it from kNN. The striking feature of ANN that makes it different from kNN is its definition of neighbor. In ANN the class from whose data points the maximum Euclidean distance of the unknown data is less than or equal to the maximum Euclidean distance between all the training data points of the class, is the class to which the unknown data is neighbor. It follows, henceforth, naturally that ANN gives a unique solution to each unknown data. Where as , in kNN the solution may vary depending on the value of the number of nearest neighbors k. So, in kNN, as k is varied the performance may vary too. But this is not the case in ANN, its performance for a particular training data is unique. For the training data [1] considered in this paper, the ANN gives 100% accurate result.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijmsc.2020.04.02","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
When we are given a data set in which each data point is assigned a class based upon the values or characteristics of its attributes, the task is known as classification. In machine learning, a very simple and powerful tool for this is the k-Nearest Neighbor (kNN) algorithm. It is based on the concept that the data points of a particular class are neighbors of each other. For a given test or unknown data point, to find the class of which it is a neighbor, kNN measures the Euclidean distances of the test or unknown data point from all the data points of all the classes in the training data. Then, out of the k nearest distances, where k is any number greater than or equal to 1, the class to which the test or unknown data point is nearest the greatest number of times is the class assigned to it. In this paper, I propose a variation of kNN, which I call the ANN (Alternative Nearest Neighbor) method to distinguish it from kNN. The striking feature of ANN that makes it different from kNN is its definition of a neighbor. In ANN, the unknown data point is a neighbor of a class if the maximum Euclidean distance of the unknown data point from the class's training data points is less than or equal to the maximum Euclidean distance between all the training data points of that class. It follows naturally that ANN gives a unique solution for each unknown data point, whereas in kNN the solution may vary depending on the value of the number of nearest neighbors k. So in kNN, as k is varied, the performance may vary too. This is not the case in ANN: its performance for a particular training data set is unique. For the training data [1] considered in this paper, ANN gives a 100% accurate result.
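The following is a minimal Python sketch of the ANN neighbor rule as stated in the abstract, not the paper's reference implementation. The function name `ann_classify`, the argument names, and the use of NumPy are assumptions for illustration; the abstract states the ANN answer is unique, so the tie-breaking step below (preferring the class with the largest margin) is an extra safeguard I have added, not part of the stated method.

```python
import numpy as np

def ann_classify(X_train, y_train, x_unknown):
    """Assign x_unknown a class by the ANN neighbor rule.

    A class is a 'neighbor' of x_unknown if the maximum Euclidean
    distance from x_unknown to that class's training points is less
    than or equal to the maximum pairwise Euclidean distance (the
    'diameter') among those training points.
    """
    best_class, best_margin = None, None
    for label in np.unique(y_train):
        pts = X_train[y_train == label]
        # Maximum pairwise Euclidean distance among the class's points.
        diffs = pts[:, None, :] - pts[None, :, :]
        diameter = np.sqrt((diffs ** 2).sum(-1)).max()
        # Maximum Euclidean distance from x_unknown to the class's points.
        d_max = np.sqrt(((pts - x_unknown) ** 2).sum(-1)).max()
        if d_max <= diameter:
            # Assumption: if several classes satisfy the condition,
            # prefer the one x_unknown sits most comfortably inside.
            margin = diameter - d_max
            if best_margin is None or margin > best_margin:
                best_class, best_margin = label, margin
    return best_class

# Toy usage on two well-separated classes (hypothetical data):
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
print(ann_classify(X, y, np.array([0.05, 0.1])))  # -> 0
```

In this reading, the rule involves no free parameter analogous to k: each class's diameter is fixed by the training data, so the decision for a given unknown point is determined once the training set is fixed, which is what the abstract means by a unique solution.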