Unified Strategy for Feature Selection and Data Imputation

C. V. Bratu, R. Potolea
Venue: 2009 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing
Published: 2009-09-26
DOI: 10.1109/SYNASC.2009.53 (https://doi.org/10.1109/SYNASC.2009.53)
Citations: 1

Abstract

Data-related issues are the main causes of insufficient performance in data mining. Existing strategies for tackling these issues include procedures for handling incomplete data (mandatory in various schemes) and feature selection, both of which augment the learning process. Our previous work on data imputation has shown that a good imputation policy for attributes strongly correlated with the class can improve learning accuracy. Feature selection also enhances the performance of an inducer. This paper focuses on validating the performance and stability of our combined methodology for pre-processing data. The novelty of the method lies in combining feature selection with data imputation to obtain an improved version of the training set. The experimental results show that, when mining incomplete data, our combined pre-processing methodology boosts the accuracy of a classifier. Moreover, it is more successful than each of the individual steps it combines, feature selection and imputation, producing better or similar results.
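The abstract does not spell out the procedure, so the following is only a minimal sketch of one way feature selection and imputation could be chained on a training set. Everything in it is an assumption made for illustration and not the authors' method: the correlation-based ranking, the class-conditional mean imputation, the order of the two steps, and the hypothetical names `abs_correlation`, `select_features`, `impute_by_class_mean` and `preprocess`. Missing values are represented as `None`, and class labels are assumed to be numeric.

```python
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, using only rows where x is observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select_features(rows, labels, k):
    """Assumed selection rule: keep the k attributes most correlated with the class."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def impute_by_class_mean(rows, labels, features):
    """Assumed imputation policy: fill a gap with the mean of the same class."""
    fill = {}
    for j in features:
        observed_all = [r[j] for r in rows if r[j] is not None]
        overall = mean(observed_all) if observed_all else 0.0
        for c in set(labels):
            observed = [r[j] for r, y in zip(rows, labels)
                        if y == c and r[j] is not None]
            # Fall back to the overall mean when the class has no observed value.
            fill[(j, c)] = mean(observed) if observed else overall
    return [[r[j] if r[j] is not None else fill[(j, y)] for j in features]
            for r, y in zip(rows, labels)]


def preprocess(rows, labels, k):
    """Select features first, then impute only the selected ones."""
    features = select_features(rows, labels, k)
    return features, impute_by_class_mean(rows, labels, features)


# Toy training set: three attributes, two missing entries, binary class.
rows = [
    [1.0, 5.0, 0.3],
    [None, 4.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, None, 0.8],
]
labels = [0, 0, 1, 1]
features, cleaned = preprocess(rows, labels, k=2)
print(features)
print(cleaned)
```

Because the fill values depend on the class label, this sketch applies only to the training set, which matches the abstract's stated goal of obtaining an improved version of the training set. Selecting before imputing means no effort is spent filling attributes that are later discarded; other orderings are equally plausible readings of the abstract.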