{"title":"Modifying one of the Machine Learning Algorithms kNN to Make it Independent of the Parameter k by Re-defining Neighbor","authors":"P. Sinha","doi":"10.5815/ijmsc.2020.04.02","DOIUrl":null,"url":null,"abstract":"When we are given a data set where in based upon the values and or characteristics of attributes each data point is assigned a class, it is known as classification. In machine learning a very simple and powerful tool to do this is the kNearest Neighbor (kNN) algorithm. It is based on the concept that the data points of a particular class are neighbors of each other. For a given test data or an unknown data, to find the class to which it is the neighbor one measures in kNN the Euclidean distances of the test data or the unknown data from all the data points of all the classes in the training data. Then out of the k nearest distances, where k is any number greater than or equal to 1, the class to which the test data or unknown data is the nearest most number of times is the class assigned to the test data or unknown data. In this paper, I propose a variation of kNN, which I call the ANN method (Alternative Nearest Neighbor) to distinguish it from kNN. The striking feature of ANN that makes it different from kNN is its definition of neighbor. In ANN the class from whose data points the maximum Euclidean distance of the unknown data is less than or equal to the maximum Euclidean distance between all the training data points of the class, is the class to which the unknown data is neighbor. It follows, henceforth, naturally that ANN gives a unique solution to each unknown data. Where as , in kNN the solution may vary depending on the value of the number of nearest neighbors k. So, in kNN, as k is varied the performance may vary too. But this is not the case in ANN, its performance for a particular training data is unique. For the training data [1] considered in this paper, the ANN gives 100% accurate result.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijmsc.2020.04.02","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
When we are given a data set in which each data point is assigned a class based upon the values or characteristics of its attributes, the task is known as classification. In machine learning, a very simple and powerful tool for this is the k-Nearest Neighbor (kNN) algorithm. It is based on the concept that the data points of a particular class are neighbors of each other. For a given test or unknown data point, to find the class of which it is a neighbor, kNN measures the Euclidean distances of the test or unknown data point from all the data points of all the classes in the training data. Then, out of the k nearest distances, where k is any number greater than or equal to 1, the class to which the test or unknown data point is nearest the greatest number of times is the class assigned to it. In this paper, I propose a variation of kNN, which I call the ANN (Alternative Nearest Neighbor) method to distinguish it from kNN. The striking feature of ANN that makes it different from kNN is its definition of a neighbor. In ANN, the unknown data point is a neighbor of a class if the maximum Euclidean distance of the unknown data point from the class's training data points is less than or equal to the maximum Euclidean distance between all the training data points of that class. It follows naturally that ANN gives a unique solution for each unknown data point, whereas in kNN the solution may vary depending on the value of the number of nearest neighbors k. So in kNN, as k is varied, the performance may vary too. This is not the case in ANN: its performance for a particular training data set is unique. For the training data [1] considered in this paper, ANN gives a 100% accurate result.
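The following is a minimal Python sketch of the ANN neighbor rule as stated in the abstract, not the paper's reference implementation. The function name `ann_classify`, the argument names, and the use of NumPy are assumptions for illustration; the abstract states the ANN answer is unique, so the tie-breaking step below (preferring the class with the largest margin) is an extra safeguard I have added, not part of the stated method.

```python
import numpy as np

def ann_classify(X_train, y_train, x_unknown):
    """Assign x_unknown a class by the ANN neighbor rule.

    A class is a 'neighbor' of x_unknown if the maximum Euclidean
    distance from x_unknown to that class's training points is less
    than or equal to the maximum pairwise Euclidean distance (the
    'diameter') among those training points.
    """
    best_class, best_margin = None, None
    for label in np.unique(y_train):
        pts = X_train[y_train == label]
        # Maximum pairwise Euclidean distance among the class's points.
        diffs = pts[:, None, :] - pts[None, :, :]
        diameter = np.sqrt((diffs ** 2).sum(-1)).max()
        # Maximum Euclidean distance from x_unknown to the class's points.
        d_max = np.sqrt(((pts - x_unknown) ** 2).sum(-1)).max()
        if d_max <= diameter:
            # Assumption: if several classes satisfy the condition,
            # prefer the one x_unknown sits most comfortably inside.
            margin = diameter - d_max
            if best_margin is None or margin > best_margin:
                best_class, best_margin = label, margin
    return best_class

# Toy usage on two well-separated classes (hypothetical data):
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
print(ann_classify(X, y, np.array([0.05, 0.1])))  # -> 0
```

In this reading, the rule involves no free parameter analogous to k: each class's diameter is fixed by the training data, so the decision for a given unknown point is determined once the training set is fixed, which is what the abstract means by a unique solution.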