Title: Detecting Local Manifold Structure for Unsupervised Feature Selection
Authors: Ding-Cheng FENG, Feng CHEN, Wen-Li XU
Journal: 自动化学报 (Acta Automatica Sinica), vol. 40, no. 10, pp. 2253-2261
Published: 2014-10-01
DOI: 10.1016/S1874-1029(14)60362-1
URL: https://www.sciencedirect.com/science/article/pii/S1874102914603621
JCR: Q2 (Computer Science)
Citations: 9
Abstract
Unsupervised feature selection is fundamental to statistical pattern recognition and has drawn sustained attention over the past several decades. Recent work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research draws on manifold learning techniques, studying feature selection and learning under the assumption that the data are distributed on a manifold. Many existing feature selection methods, such as Laplacian score, SPEC (spectrum decomposition of graph Laplacian), the TR (trace ratio) criterion, MSFS (multi-cluster feature selection) and EVSC (eigenvalue sensitive criterion), apply basic properties of the graph Laplacian and select the feature subsets that best preserve the manifold structure defined on it. In this paper, we propose a new perspective on feature selection based on locally linear embedding (LLE), another popular manifold learning method. The main difficulty in using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures, whereas feature selection is discrete. We prove that the LLE objective can be decomposed with respect to data dimensions in the subset selection problem, which also facilitates constructing better coordinates from the data using principal component analysis (PCA). Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset representing the underlying data manifold. The local relationships among samples are computed from the LLE formulation and then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked to obtain the candidate solution to feature selection. We further develop a locally linear rotation-selection (LLRS) algorithm, which extends LLS to identify the optimal coordinate subset in a new space. Experimental results on real-world datasets show that our method can be more effective than feature selection methods based on Laplacian eigenmaps.
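The abstract does not give the LLS score formula, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It relies on one fact that holds for any weight matrix W: the LLE reconstruction error, the sum over samples of ||x_i - sum_j W_ij x_j||^2, is a squared Frobenius norm and therefore splits into one term per feature column. The function names (`lle_weights`, `per_feature_residuals`, `rank_features`), the standardization step, the regularization constant, and the choice to rank lower residuals first are all assumptions made for this example.

```python
import numpy as np


def lle_weights(X, n_neighbors=5, reg=1e-3):
    """Standard LLE reconstruction weights as a dense (n, n) matrix.

    Each row i holds the weights that best reconstruct x_i from its
    nearest neighbors, constrained to sum to one.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor

    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:n_neighbors]
        Z = X[nbrs] - X[i]  # neighbors centered on x_i
        G = Z @ Z.T         # local Gram matrix, shape (k, k)
        # Regularize so the linear system stays well conditioned.
        G = G + reg * (np.trace(G) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()  # enforce the sum-to-one constraint
    return W


def per_feature_residuals(X, W):
    """Split the LLE reconstruction error into one term per feature."""
    R = X - W @ X
    return np.sum(R ** 2, axis=0)


def rank_features(X, n_neighbors=5):
    """Hypothetical ranking: features with the smallest residual first.

    Standardizing first is an assumption of this sketch; without it,
    low-variance features would get small residuals trivially.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    Xs = (X - X.mean(axis=0)) / std
    W = lle_weights(Xs, n_neighbors=n_neighbors)
    return np.argsort(per_feature_residuals(Xs, W))
```

Calling `rank_features(X)` on an array of shape (n_samples, n_features) returns feature indices ordered by this illustrative residual. The per-feature residuals always sum to the total reconstruction error, which is the decomposition property that makes scoring features one at a time possible.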
Acta Automatica Sinica (自动化学报), Computer Science: Computer Graphics and Computer-Aided Design
CiteScore: 4.80
Self-citation rate: 0.00%
Articles published: 6655
About the journal:
ACTA AUTOMATICA SINICA is a joint publication of the Chinese Association of Automation and the Institute of Automation, Chinese Academy of Sciences. Its objective is the high-quality, rapid publication of articles, with a strong focus on new trends, original theoretical and experimental research and developments, emerging technology, and industrial standards in automation.