Fast and Accurate k-Nearest Neighbor Classification Using Prototype Selection by Clustering
Stefanos Ougiaroglou, Georgios Evangelidis
2012 16th Panhellenic Conference on Informatics, published 2012-10-01
DOI: https://doi.org/10.1109/PCi.2012.69
Citations: 21
Abstract
Data reduction is especially important when the k-NN classifier is applied to large datasets. Many prototype selection and generation algorithms have been proposed that aim to condense the initial training data as much as possible while keeping classification accuracy high. The Prototype Selection by Clustering (PSC) algorithm is one of them and is based on a cluster generation procedure. Unlike many other prototype selection and generation algorithms, its main goal is fast execution of the data reduction procedure rather than a high reduction rate. In this paper, we demonstrate that both the reduction rate and the classification accuracy of PSC can be improved by generating a larger number of clusters. Moreover, we compare the performance of PSC with two state-of-the-art algorithms, one selection algorithm and one generation algorithm, using six real-life datasets. The experimental results indicate that the classification performance of PSC is comparable to that of its competitors when many clusters are used.
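
To make the idea of clustering-based prototype selection concrete, the following is a minimal sketch, not the PSC procedure as the paper specifies it. The abstract does not give the algorithm's details, so several things here are assumptions: a plain k-means step stands in for the cluster generation procedure, a single-class cluster keeps only the instance nearest its centroid, and a mixed-class cluster keeps instances lying near the class border. The names `kmeans`, `select_prototypes`, and `predict_1nn`, and the toy data, are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain k-means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


def select_prototypes(points, labels, n_clusters, seed=0):
    """Return sorted indices of the training instances kept as prototypes."""
    assignment, centroids = kmeans(points, n_clusters, seed=seed)
    selected = set()
    for c in range(n_clusters):
        idx = [i for i, a in enumerate(assignment) if a == c]
        if not idx:
            continue
        classes = Counter(labels[i] for i in idx)
        if len(classes) == 1:
            # Assumed rule: a single-class cluster is represented by the
            # instance nearest to its centroid.
            selected.add(min(idx, key=lambda i: math.dist(points[i], centroids[c])))
            continue
        # Assumed rule: in a mixed cluster, keep instances near the border
        # between the majority class and every other class.
        majority = classes.most_common(1)[0][0]
        maj = [i for i in idx if labels[i] == majority]
        for i in idx:
            if labels[i] == majority:
                continue
            border = min(maj, key=lambda j: math.dist(points[i], points[j]))
            same = [j for j in idx if labels[j] == labels[i]]
            selected.add(border)
            selected.add(min(same, key=lambda j: math.dist(points[border], points[j])))
    return sorted(selected)


def predict_1nn(query, points, labels, protos):
    """Classify `query` by its nearest prototype (k = 1)."""
    nearest = min(protos, key=lambda i: math.dist(query, points[i]))
    return labels[nearest]


# Toy data (hypothetical): two well-separated classes in the plane.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = ["a"] * 5 + ["b"] * 5

protos = select_prototypes(points, labels, n_clusters=2)
reduction_rate = 1 - len(protos) / len(points)
accuracy = sum(
    predict_1nn(p, points, labels, protos) == y for p, y in zip(points, labels)
) / len(points)
print(f"kept {len(protos)} of {len(points)} instances, "
      f"reduction rate {reduction_rate:.2f}, 1-NN accuracy {accuracy:.2f}")
```

In this sketch, `n_clusters` is the knob that corresponds to the number of clusters discussed in the abstract. The toy data is far too small to reproduce the paper's findings; it only shows where the reduction rate and the accuracy would be measured.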