{"title":"Impact of imputation of missing values on genetic programming based multiple feature construction for classification","authors":"Cao Truong Tran, Peter M. Andreae, Mengjie Zhang","doi":"10.1109/CEC.2015.7257182","journal":{"name":"2015 IEEE Congress on Evolutionary Computation (CEC)"},"publicationDate":"2015-05-25","citationCount":"12","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/CEC.2015.7257182"}
Impact of imputation of missing values on genetic programming based multiple feature construction for classification
Missing values are a common problem in many real-world databases. A common way to cope with this problem is to use imputation methods to fill missing values with plausible values. Genetic programming-based multiple feature construction (GPMFC) is a filter approach that uses genetic programming to construct multiple features for classifiers. The GPMFC algorithm has been demonstrated to improve the classification performance of decision tree and rule-based classifiers on complete data, but it has not been tested on imputed data. This paper studies the effect of GPMFC on classification accuracy with imputed data and how the choice of imputation method (mean imputation, hot deck imputation, kNN imputation, EM imputation and MICE imputation) affects classifiers using constructed features. Results show that GPMFC improves classification performance for datasets with a small number of missing values. The combination of GPMFC and MICE imputation, in most cases, enhances classification performance for datasets with varying amounts of missing values and obtains the best classification accuracy.
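To make two of the imputation methods named above concrete, the following is a minimal sketch of mean imputation and kNN imputation in plain Python. It is not the paper's implementation: the abstract does not give one, so the function names, the use of `None` as the missing-value marker, the distance measure (Euclidean over features observed in both rows) and the fallback rules are all assumptions made for illustration. GPMFC itself is not shown.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (an assumption of this sketch)


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over features observed in both rows (inf if none)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing value with the mean of that feature over the k nearest donor rows."""
    fallback = mean_impute(rows)  # used when no comparable donor row exists
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observe feature j, nearest first.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [val for dist, val in donors if dist != math.inf][:k]
            filled[j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
        result.append(filled)
    return result


# Toy data, invented for this example only.
data = [
    [1.0, 2.0],
    [2.5, None],
    [3.0, 6.0],
    [None, 8.0],
]
mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=1)
```

On this toy data, mean imputation fills every gap in a column with the same value, while kNN imputation with `k=1` copies the value from the single most similar row, so the two methods can hand a downstream feature constructor noticeably different inputs.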