The Impact of Under-sampling on the Performance of Bootstrap-based Ensemble Feature Selection
Huseyin Guney, H. Oztoprak
Signal Processing and Communications Applications Conference, May 2018
DOI: 10.1109/SIU.2018.8404342
Citations: 1
Abstract
DNA microarrays are a promising tool for cancer diagnosis and prognosis. Because microarray data are high-dimensional, gene selection is a difficult task. Bootstrap-based ensemble feature selection (bagging) has recently become popular and has shown significant improvements in this field. The method uses bootstrap resampling to generate several slightly different datasets from the training set, ranks the features on each of them, and then aggregates all of the ranked feature lists into a final (ensemble) feature list. The performance of bagging improves with the diversity of the generated datasets. Therefore, it is proposed to perform bootstrap resampling on an under-sampled training set, instead of the entire training set, in order to improve both classification performance and gene selection stability. The proposed method was evaluated on four microarray datasets, using a support vector machine (SVM) as the classifier and SVM-based recursive feature elimination (SVM-RFE) as the feature selection technique. The results show that the 50% under-sampling approach achieves classification performance similar to that of the conventional approach and outperforms it in terms of gene selection stability. In addition, because 50% under-sampling uses only half of the training samples in each run of the ensemble, it has a lower computational cost.
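The bagging scheme described above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-resample ranker `mean_difference_ranking` is a hypothetical stand-in (absolute difference of class means) used in place of SVM-RFE to keep the sketch dependency-free. Rank lists are combined by summing each feature's positions (mean-rank aggregation), which is one common choice; the abstract does not state the aggregation rule. "Under-sampling" is read here as drawing a random subset of `subsample_ratio` × n training samples without replacement in each run and then bootstrapping within that subset. Stability is measured with an assumed average pairwise Jaccard index of top-k feature sets. With `subsample_ratio=1.0` the sketch reduces to conventional bootstrap resampling of the full training set; `0.5` corresponds to the 50% under-sampling variant.

```python
import random
from itertools import combinations
from typing import Callable, List

Matrix = List[List[float]]
# A ranker maps (X, y) to feature indices ordered from most to least relevant.
Ranker = Callable[[Matrix, List[int]], List[int]]


def mean_difference_ranking(X: Matrix, y: List[int]) -> List[int]:
    """Hypothetical stand-in for SVM-RFE: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        # A resample may, rarely, contain only one class; score 0 in that case.
        if not pos or not neg:
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def ensemble_ranking(X: Matrix, y: List[int], n_runs: int = 20,
                     subsample_ratio: float = 1.0,
                     rank_fn: Ranker = mean_difference_ranking,
                     seed: int = 0) -> List[int]:
    """Bootstrap-based ensemble ranking with optional under-sampling.

    subsample_ratio=1.0 bootstraps the full training set (conventional bagging);
    subsample_ratio=0.5 bootstraps a random half of it in each run (an assumed
    reading of "50% under-sampling").
    """
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    m = min(n_samples, max(1, round(subsample_ratio * n_samples)))
    rank_sums = [0] * n_features
    for _ in range(n_runs):
        # Under-sampling: pick m distinct training samples for this run.
        subset = rng.sample(range(n_samples), m)
        # Bootstrap: draw m samples with replacement from that subset.
        boot = [rng.choice(subset) for _ in range(m)]
        order = rank_fn([X[i] for i in boot], [y[i] for i in boot])
        # Mean-rank aggregation: accumulate each feature's position (0 = best).
        for position, j in enumerate(order):
            rank_sums[j] += position
    return sorted(range(n_features), key=lambda j: rank_sums[j])


def jaccard_stability(rankings: List[List[int]], k: int) -> float:
    """Average pairwise Jaccard index of top-k sets (1.0 = identical selections)."""
    tops = [set(r[:k]) for r in rankings]
    pairs = list(combinations(tops, 2))
    if not pairs:
        return 1.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def make_toy_data(n_samples: int = 40, n_features: int = 10, seed: int = 1):
    """Synthetic two-class data in which only features 0 and 1 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        shift = 3.0 if label else -3.0
        row[0] += shift
        row[1] += shift
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for ratio in (1.0, 0.5):
        # Only the resampling seed varies here; a real study would vary the training split.
        runs = [ensemble_ranking(X, y, subsample_ratio=ratio, seed=s) for s in range(5)]
        print(f"ratio={ratio}: top-5={runs[0][:5]}, "
              f"stability@5={jaccard_stability(runs, k=5):.2f}")
```

To approximate the setup in the abstract, the stand-in ranker would be replaced by an SVM-RFE ranking (for example, one built on scikit-learn's `RFE` with a linear SVM and its `ranking_` attribute), and stability would be computed over ensembles trained on different training splits rather than different random seeds, as the toy demo does.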