George D. C. Cavalcanti, Ing Ren Tsang, Breno A. Vale
Title: Data Complexity Measures and Nearest Neighbor Classifiers: A Practical Analysis for Meta-learning
Published in: 2012 IEEE 24th International Conference on Tools with Artificial Intelligence
Publication date: 2012-11-07
DOI: 10.1109/ICTAI.2012.150 (https://doi.org/10.1109/ICTAI.2012.150)
Citations: 15
Abstract
Classifier accuracy is affected by the properties of the data sets used for training. Nearest neighbor classifiers are known to be simple and accurate in several domains, but their behavior depends strongly on data complexity. Data complexity measures, in turn, aim to describe properties of data sets. This work shows how data complexity measures can be used efficiently to predict the behavior of the Nearest Neighbor classifier. The experimental study uses seven data complexity measures and seventeen real data sets. Each measure is analyzed individually to find a relationship between its value and the accuracy of the classifier on a given data set. No single measure is good enough to predict the behavior of the Nearest Neighbor classifier. However, the combination of these measures provides a powerful tool for predicting its accuracy.
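To make the idea of relating a complexity measure to nearest neighbor accuracy concrete, the following is a minimal sketch, not the authors' method. The abstract does not name the seven measures used, so Fisher's discriminant ratio is assumed here purely as an illustrative example of a data complexity measure; the function names `fisher_ratio` and `loo_1nn_accuracy` are hypothetical, and leave-one-out evaluation is likewise an assumption about how accuracy might be estimated.

```python
import math
from statistics import mean, pvariance


def fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over features (two classes only).

    Assumed here as an example complexity measure; higher values suggest
    that at least one feature separates the two classes well.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [row for row, label in zip(X, y) if label == classes[0]]
    b = [row for row, label in zip(X, y) if label == classes[1]]
    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row in a]
        col_b = [row[j] for row in b]
        num = (mean(col_a) - mean(col_b)) ** 2
        denom = pvariance(col_a) + pvariance(col_b)
        if denom == 0:
            # Zero within-class variance: perfectly separated or identical.
            ratio = math.inf if num > 0 else 0.0
        else:
            ratio = num / denom
        best = max(best, ratio)
    return best


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (Euclidean)."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # the held-out point may not vote for itself
            d = math.dist(xi, xk)
            if d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)
```

Computing both quantities over a collection of data sets yields one (measure, accuracy) pair per data set, which is the kind of relationship the study examines measure by measure. Combining several measures would then amount to using the full vector of measures as the input of a meta-level predictor.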