An enhanced feature selection method comprising rough set and clustering techniques
A. Murugan, T. Sridevi
2014 IEEE International Conference on Computational Intelligence and Computing Research, 2014-12-01
DOI: 10.1109/ICCIC.2014.7238376 (https://doi.org/10.1109/ICCIC.2014.7238376)
Feature selection, or variable reduction, is a fundamental problem in data mining: it is the process of identifying the few most important features to which a learning algorithm is applied. The best subset contains the minimum number of dimensions that still retains suitably high classifier accuracy while representing the original features. The objective of the proposed approach is to reduce the number of input features, identifying the key features and eliminating irrelevant features that carry no predictive information, by combining a clustering technique, K-nearest neighbors (KNN), and rough sets. This paper considers two partition-based clustering algorithms from data mining, K-Means and Fuzzy C-Means (FCM). Both algorithms are applied to the original data set without considering the class labels; after outliers are removed using KNN, rough set theory is applied to the partitioned data set to generate a feature subset. The Wisconsin Breast Cancer data sets from the UCI Machine Learning Repository are used to test the proposed hybrid method. The results show that the hybrid method achieves higher classification accuracy for diagnosis and prognosis than the full-input model.
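The rough set step can be illustrated with a minimal sketch. The abstract does not say which reduct algorithm the authors use, so the greedy QuickReduct-style search below is an assumption, as is the use of cluster labels as the decision attribute. The function names and the toy table are hypothetical, and the features are assumed to be already discretized and cleaned of outliers.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes: rows that agree on
    every attribute in attrs are indiscernible and share a block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough set dependency degree: the fraction of rows in the positive
    region, i.e. in blocks whose members all carry the same label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedy forward selection (assumed, not taken from the paper): keep
    adding the attribute that raises the dependency degree the most until
    it equals the dependency degree of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return sorted(chosen)


# Hypothetical toy table: three discretized features per row.
rows = [
    (0, 1, 0),
    (0, 1, 1),
    (1, 0, 0),
    (1, 0, 1),
    (0, 0, 0),
    (1, 1, 1),
]
# Hypothetical cluster labels, standing in for the K-Means or FCM partition.
labels = [0, 0, 0, 0, 0, 1]

selected = quick_reduct(rows, labels)
print(selected)  # -> [0, 1]
```

In this toy table the label is determined by features 0 and 1 together, so the third feature is dropped. A greedy search of this kind returns a subset that preserves the dependency degree of the full feature set, but it is not guaranteed to find the smallest such subset.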