Robust Feature Selection Technique using Rank Aggregation.

IF 2.9 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Artificial Intelligence Pub Date : 2014-01-01 DOI:10.1080/08839514.2014.883903

Chandrima Sarkar, Sarah Cooley, Jaideep Srivastava

Applied Artificial Intelligence, vol. 28, no. 3, pp. 243-257. Journal Article.

Citations: 42

Abstract

Although feature selection is a well-developed research area, there is an ongoing need for methods that make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because every feature selection technique has its own statistical biases, while classifiers exploit different statistical properties of the data for evaluation. In many situations this leaves researchers in a dilemma over which feature selection method and which classifier to choose from a vast range of options. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a better solution. The ensemble nature of our technique makes it more robust across classifiers; in other words, it achieves similar, and ideally higher, classification accuracy across a wide variety of classifiers. We quantify this notion of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets of different dimensions: Arrhythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c, and Embryonal Tumor, plus a real-world data set, Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared to other techniques, it improves classification accuracy by approximately 3-4% (on data sets with fewer than 500 features) and by more than 5% (on data sets with more than 500 features) across a wide range of classifiers.
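
The abstract does not say which aggregation rule or which base feature selection methods the authors use. As a rough illustration of the general idea of rank aggregation, the sketch below assumes a Borda-style rule: each method's feature scores are converted to ranks, and each feature's ranks are averaged across methods. The function names and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Convert per-feature scores (higher = better) into ranks (1 = best)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def aggregate_ranks(score_lists):
    """Borda-style aggregation (an assumed rule): mean rank per feature."""
    rank_lists = [ranks_from_scores(s) for s in score_lists]
    n_features = len(rank_lists[0])
    return [mean(r[f] for r in rank_lists) for f in range(n_features)]


def select_top_k(score_lists, k):
    """Return the indices of the k features with the lowest mean rank."""
    mean_ranks = aggregate_ranks(score_lists)
    order = sorted(range(len(mean_ranks)), key=lambda f: mean_ranks[f])
    return order[:k]


# Hypothetical scores for 4 features from three selection methods.
scores = [
    [0.9, 0.1, 0.5, 0.3],  # method A
    [0.2, 0.8, 0.7, 0.1],  # method B
    [0.6, 0.4, 0.9, 0.2],  # method C
]

top_two = select_top_k(scores, 2)
print(top_two)
```

In this toy example no single method ranks feature 2 first in every list, yet it has the best mean rank, which is the kind of consensus effect an ensemble of selectors is meant to capture.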




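Source journal

Applied Artificial Intelligence (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 5.20
Self-citation rate: 3.60%
Articles published: 106
Review time: 6 months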

About the journal: Applied Artificial Intelligence addresses concerns in applied research and applications of artificial intelligence (AI). The journal also acts as a medium for exchanging ideas and thoughts about impacts of AI research. Articles highlight advances in uses of AI systems for solving tasks in management, industry, engineering, administration, and education; evaluations of existing AI systems and tools, emphasizing comparative studies and user experiences; and the economic, social, and cultural impacts of AI. Papers on key applications, highlighting methods, time schedules, person-months needed, and other relevant material are welcome.