Impact of imputation of missing values on genetic programming based multiple feature construction for classification

Cao Truong Tran, Peter M. Andreae, Mengjie Zhang
Published in: 2015 IEEE Congress on Evolutionary Computation (CEC), 2015-05-25
DOI: 10.1109/CEC.2015.7257182 (https://doi.org/10.1109/CEC.2015.7257182)
Citation count: 12

Abstract

Missing values are a common problem in many real-world databases. A common way to cope with this problem is to use imputation methods, which fill in missing values with plausible ones. Genetic programming-based multiple feature construction (GPMFC) is a filter approach that uses genetic programming to construct multiple features for classifiers. The GPMFC algorithm has been shown to improve the classification performance of decision tree and rule-based classifiers on complete data, but it has not been tested on imputed data. This paper studies the effect of GPMFC on classification accuracy with imputed data, and how the choice of imputation method (mean imputation, hot deck imputation, kNN imputation, EM imputation and MICE imputation) affects classifiers that use the constructed features. Results show that GPMFC improves classification performance for datasets with a small number of missing values. The combination of GPMFC and MICE imputation, in most cases, enhances classification performance for datasets with varying amounts of missing values and obtains the best classification accuracy.
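
To make the idea of imputation concrete, the following is a minimal sketch of two of the methods named above, mean imputation and kNN imputation, written in plain Python. It is not the paper's implementation: the function names `mean_impute` and `knn_impute`, the toy dataset, the use of `None` to mark a missing value, and the choice to compute distances only over features observed in both rows are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest rows.

    Assumption: distance is the root mean squared difference over the
    features that are observed in both rows; rows sharing no observed
    feature are treated as infinitely far away.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Candidate donors: every other row that has feature j observed.
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                new_row[j] = mean(val for _, val in donors)
        filled.append(new_row)
    return filled


# Hypothetical toy data: two numeric features, None marks a missing value.
data = [
    [1.0, 2.0],
    [2.0, None],
    [3.0, 6.0],
    [None, 8.0],
]

mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=2)
```

Both functions return a new, complete dataset and leave the input untouched, so the same incomplete data can be imputed in several ways and each result passed to the same feature construction and classification steps for comparison.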