{"title":"面向软件缺陷预测的集成特征选择技术比较研究","authors":"Huanjing Wang, T. Khoshgoftaar, Amri Napolitano","doi":"10.1109/ICMLA.2010.27","DOIUrl":null,"url":null,"abstract":"Feature selection has become the essential step in many data mining applications. Using a single feature subset selection method may generate local optima. Ensembles of feature selection methods attempt to combine multiple feature selection methods instead of using a single one. We present a comprehensive empirical study examining 17 different ensembles of feature ranking techniques (rankers) including six commonly-used feature ranking techniques, the signal-to-noise filter technique, and 11 threshold-based feature ranking techniques. This study utilized 16 real-world software measurement data sets of different sizes and built 13,600 classification models. Experimental results indicate that ensembles of very few rankers are very effective and even better than ensembles of many or all rankers.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"30 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"106","resultStr":"{\"title\":\"A Comparative Study of Ensemble Feature Selection Techniques for Software Defect Prediction\",\"authors\":\"Huanjing Wang, T. Khoshgoftaar, Amri Napolitano\",\"doi\":\"10.1109/ICMLA.2010.27\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature selection has become the essential step in many data mining applications. Using a single feature subset selection method may generate local optima. Ensembles of feature selection methods attempt to combine multiple feature selection methods instead of using a single one. We present a comprehensive empirical study examining 17 different ensembles of feature ranking techniques (rankers) including six commonly-used feature ranking techniques, the signal-to-noise filter technique, and 11 threshold-based feature ranking techniques. This study utilized 16 real-world software measurement data sets of different sizes and built 13,600 classification models. Experimental results indicate that ensembles of very few rankers are very effective and even better than ensembles of many or all rankers.\",\"PeriodicalId\":336514,\"journal\":{\"name\":\"2010 Ninth International Conference on Machine Learning and Applications\",\"volume\":\"30 4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"106\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 Ninth International Conference on Machine Learning and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2010.27\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Ninth International Conference on Machine Learning and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2010.27","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Comparative Study of Ensemble Feature Selection Techniques for Software Defect Prediction
Feature selection has become the essential step in many data mining applications. Using a single feature subset selection method may generate local optima. Ensembles of feature selection methods attempt to combine multiple feature selection methods instead of using a single one. We present a comprehensive empirical study examining 17 different ensembles of feature ranking techniques (rankers) including six commonly-used feature ranking techniques, the signal-to-noise filter technique, and 11 threshold-based feature ranking techniques. This study utilized 16 real-world software measurement data sets of different sizes and built 13,600 classification models. Experimental results indicate that ensembles of very few rankers are very effective and even better than ensembles of many or all rankers.