Feature Selection for High-Dimensional Data Through Instance Vote Combining
Lily Chamakura, G. Saha
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, published 2020-01-05
DOI: 10.1145/3371158.3371177
Citations: 0
Abstract
Supervised feature selection (FS) selects a discriminative, non-redundant subset of features for classification problems with high-dimensional inputs. In this paper, we pose feature selection as a set-covering problem: the goal is to select a subset of features that together cover the instances. To solve this formulation, we quantify the local relevance of each feature (i.e., the votes assigned to it by instances), which captures the extent to which the feature helps classify individual instances correctly. We then propose to combine these instance votes across features to infer their joint local relevance; the votes are combined on the basis of geometric principles underlying the classification and feature spaces. Further, we show how such instance vote combining can drive a heuristic search strategy for selecting a relevant, non-redundant feature subset. We demonstrate the effectiveness of our approach by evaluating classification performance and robustness to data variations on publicly available benchmark datasets. We observed that the proposed method outperforms state-of-the-art mutual-information-based FS techniques and performs comparably to other heuristic approaches that solve the set-covering formulation of feature selection.
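The set-covering view of feature selection described above can be illustrated with a minimal sketch. This is not the paper's actual vote-combining method (which relies on geometric principles not detailed in the abstract); it assumes a hypothetical boolean `relevance` matrix where entry `(i, j)` records whether instance `i` casts a vote for feature `j`, and applies the classical greedy set-cover heuristic: repeatedly pick the feature covering the most still-uncovered instances.

```python
import numpy as np

def greedy_set_cover_fs(relevance, k):
    """Greedy set-cover style feature selection (illustrative sketch).

    relevance : (n_instances, n_features) boolean matrix, where
        relevance[i, j] is True if feature j receives a "vote" from
        instance i (i.e., helps classify instance i correctly).
    k : maximum number of features to select.

    Returns the indices of features selected so that, greedily, the
    chosen subset covers as many instances as possible.
    """
    n_instances, _ = relevance.shape
    uncovered = np.ones(n_instances, dtype=bool)
    selected = []
    while uncovered.any() and len(selected) < k:
        # Marginal gain: how many still-uncovered instances each feature covers.
        gains = (relevance & uncovered[:, None]).sum(axis=0)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # no remaining feature adds coverage
        selected.append(best)
        uncovered &= ~relevance[:, best]  # mark covered instances
    return selected

# Toy example: 4 instances, 3 features.
votes = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]], dtype=bool)
print(greedy_set_cover_fs(votes, 3))  # → [0, 1, 2]
```

The paper's heuristic differs in that it combines votes across features to score joint relevance, rather than scoring each feature's coverage independently as the plain greedy step does here.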