Unified Strategy for Feature Selection and Data Imputation

2009 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Pub Date: 2009-09-26. DOI: 10.1109/SYNASC.2009.53

C. V. Bratu, R. Potolea

Citations: 1

Abstract

Data-related issues are the main causes of insufficient performance in data mining. Existing strategies for tackling them include procedures for handling incomplete data, which are mandatory in various schemes, and feature selection; both improve the learning process. Our previous work on data imputation has shown that a good imputation policy for attributes strongly correlated with the class can improve learning accuracy. Feature selection likewise enhances the performance of an inducer. This paper validates the performance and stability of our combined methodology for pre-processing data. The novelty of the method lies in combining feature selection with data imputation in order to obtain an improved version of the training set. The experimental results show that, when mining incomplete data, the combined pre-processing methodology boosts the accuracy of a classifier. Moreover, it is more successful than either of the individual steps it combines, feature selection and imputation, producing better or similar results.
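
The abstract does not spell out the selection criterion or the imputation policy, so the following is only a minimal sketch of the general idea of chaining the two steps on a training set. It assumes, purely for illustration, that features are ranked by absolute Pearson correlation with a numeric class label and that missing values (encoded as `None`) are filled with the per-class mean. The function names `abs_correlation`, `select_features`, `impute_class_mean`, and `preprocess` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, using only rows where x is observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information about the class
    return abs(cov) / (vx * vy) ** 0.5


def select_features(rows, labels, k):
    """Assumed criterion: keep the k columns most correlated with the class."""
    n_cols = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], labels) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def impute_class_mean(rows, labels, cols):
    """Assumed policy: fill each gap with the mean of that column within the class."""
    out = [[r[j] for j in cols] for r in rows]
    for c in range(len(cols)):
        for label in set(labels):
            observed = [row[c] for row, y in zip(out, labels)
                        if y == label and row[c] is not None]
            fill = mean(observed) if observed else 0.0
            for row, y in zip(out, labels):
                if y == label and row[c] is None:
                    row[c] = fill
    return out


def preprocess(rows, labels, k):
    """Combined step: select features first, then impute only the kept columns."""
    cols = select_features(rows, labels, k)
    return cols, impute_class_mean(rows, labels, cols)


# Toy training set: three attributes, None marks a missing value.
rows = [
    [1.0, 5.0, 1.0],
    [None, 3.0, 2.0],
    [3.0, 4.0, None],
    [9.0, 5.0, 9.0],
    [8.0, None, 8.0],
    [None, 4.0, 10.0],
]
labels = [0, 0, 0, 1, 1, 1]

cols, cleaned = preprocess(rows, labels, k=2)
```

On this toy data the weakly correlated middle attribute is dropped, and the gaps in the two remaining attributes are filled from their class means. Because the imputation uses class labels, the sketch applies to the training set only, which matches the abstract's stated goal of producing an improved training set.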
