Personalized PageRank Based Feature Selection for High-dimension Data
Zhibo Zhu, Qinke Peng, Xinyu Guan
2019 11th International Conference on Knowledge and Systems Engineering (KSE), 2019-10-01
DOI: 10.1109/KSE.2019.8919274
Citations: 1
Abstract
Feature selection is critical to data mining applications, especially for extracting valuable information from high-dimensional data. It not only improves the performance of learning models but also enhances the interpretability and generality of the extracted knowledge. In this paper, we propose a feature selection method based on personalized PageRank. First, a non-symmetric metric derived from mutual information is used to build a feature redundancy network, in which nodes are features and directed edges represent the redundancy relations between features. Then, we compute personalized PageRank on this network to assign each feature a score that measures its redundancy with respect to a given subset of selected features. Finally, this redundancy measure is integrated into the generalized minimum-redundancy maximum-relevance (MRMR) framework to perform feature selection. Because the network and PageRank capture global structure, our method can better measure the high-order relationship between a candidate feature and the subset of already selected features. Extensive experiments on five microarray datasets verify the effectiveness of the proposed method, which outperforms popular benchmark methods.
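The three steps in the abstract (build a directed redundancy network, run personalized PageRank seeded on the selected subset, plug the result into a greedy MRMR loop) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the exact asymmetric metric, so the sketch assumes Theil's uncertainty coefficient, W[i, j] = I(X_i; X_j) / H(X_j). It also assumes a teleport vector uniform over the selected features, a damping factor of 0.85, and a hypothetical score relevance - beta * n * PPR. Features and labels are assumed to be already discretized, and all function names are illustrative.

```python
import numpy as np


def entropy(x):
    """Shannon entropy (nats) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete 1-D arrays."""
    _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-(p * np.log(p)).sum())
    return max(entropy(x) + entropy(y) - h_xy, 0.0)  # clip float noise


def redundancy_network(X):
    """ASSUMED asymmetric metric: W[i, j] = I(X_i; X_j) / H(X_j),
    i.e. the share of feature j's entropy explained by feature i."""
    n = X.shape[1]
    H = np.array([entropy(X[:, j]) for j in range(n)])
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and H[j] > 0:
                W[i, j] = mutual_info(X[:, i], X[:, j]) / H[j]
    return W


def personalized_pagerank(W, seeds, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank with a teleport vector uniform on `seeds`."""
    n = W.shape[0]
    out_weight = W.sum(axis=1, keepdims=True)
    # Row-normalise edge weights into transition probabilities.
    P = np.divide(W, out_weight, out=np.zeros_like(W), where=out_weight > 0)
    dangling = out_weight[:, 0] == 0
    v = np.zeros(n)
    v[list(seeds)] = 1.0 / len(seeds)
    r = v.copy()
    for _ in range(max_iter):
        # Mass sitting on dangling nodes (no out-edges) is sent back to the seeds.
        r_new = alpha * (r @ P + r[dangling].sum() * v) + (1 - alpha) * v
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def select_features(X, y, k, beta=1.0, alpha=0.85):
    """Hypothetical greedy MRMR-style selection with a PageRank redundancy term."""
    n = X.shape[1]
    k = min(k, n)
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n)])
    W = redundancy_network(X)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        red = personalized_pagerank(W, selected, alpha=alpha)
        # Scale by n so an average node has redundancy 1 (hypothetical choice).
        score = relevance - beta * n * red
        score[selected] = -np.inf
        selected.append(int(np.argmax(score)))
    return selected


if __name__ == "__main__":
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    # Column 1 duplicates column 0; column 2 is independent of both.
    X = np.stack([y, y.copy(), np.array([0, 0, 1, 1, 0, 0, 1, 1])], axis=1)
    print(select_features(X, y, k=2))  # [0, 2]
```

In this toy example, columns 0 and 1 are equally relevant. Once column 0 is selected, however, the random walk restarting at column 0 places most of its mass on the duplicate column 1. That high redundancy score pushes column 1 below the non-redundant column 2. Because the walk follows multi-hop paths, a candidate that is only indirectly redundant with the selected subset also accumulates PageRank mass. This is the "high-order" effect the abstract refers to.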