PCA based sequential feature space learning for gene selection
Jinglin Yang, Han-Xiong Li
2010 International Conference on Machine Learning and Cybernetics, 2010-07-11
DOI: 10.1109/ICMLC.2010.5580720 (https://doi.org/10.1109/ICMLC.2010.5580720)
Gene expression can be used for tumor subtype classification, clinical diagnosis, and prognosis prediction, but the underlying mechanism remains unknown. Data-driven machine learning methods can be applied to the phenotype classification problem, but high dimensionality and small sample size cause many of them to fail. This research proposes a PCA-based sequential feature space learning method for gene selection, built on a two-level feature selection process. In the first level, PCA decomposition is performed to obtain the orthogonal axes, and the features are projected onto these axes and evaluated. In the second level, the features with large projections are selected to form the feature space, and the projections of all features onto this feature space are evaluated. Only features that have large projections on both the orthogonal axes and the feature subspace are selected as the feature subset. A neural network (NN) is then employed to learn the classification model. The PCA-based feature space learning proceeds sequentially until the classification performance is under a pre-specified threshold and stable. The proposed method was applied to two gene microarray databases, with good results.
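The two-level selection described in the abstract can be sketched roughly as follows. This is one possible reading, not the authors' implementation: the abstract does not define how projections are scored or combined, so the function name `two_level_select`, the parameters `n_axes`, `n_basis`, and `n_keep`, the use of projection norms as scores, and the rank-sum rule for "large on both" are all assumptions made for illustration. The neural network classifier and the sequential stopping rule are not sketched here.

```python
import numpy as np


def two_level_select(X, n_axes=3, n_basis=10, n_keep=20):
    """Illustrative two-level gene scoring (hypothetical helper).

    X has shape (n_samples, n_genes); each column is one gene's expression.
    Returns the sorted indices of the n_keep selected genes.
    """
    Xc = X - X.mean(axis=0)  # center each gene across samples

    # Level 1: PCA via SVD. Columns of U are orthogonal axes in sample space.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    axes = U[:, :n_axes]
    # Length of each gene vector's projection onto the leading axes.
    axis_score = np.linalg.norm(axes.T @ Xc, axis=0)

    # Level 2: genes with the largest axis projections span a feature space.
    basis_idx = np.argsort(axis_score)[::-1][:n_basis]
    Q, _ = np.linalg.qr(Xc[:, basis_idx])  # orthonormal basis of that space
    # Length of each gene vector's projection onto the feature space.
    space_score = np.linalg.norm(Q.T @ Xc, axis=0)

    # Assumption: "large on both" is approximated by a small sum of ranks.
    rank_axis = np.argsort(np.argsort(-axis_score))
    rank_space = np.argsort(np.argsort(-space_score))
    combined = rank_axis + rank_space
    return np.sort(np.argsort(combined, kind="stable")[:n_keep])
```

Calling `two_level_select(X, n_keep=20)` on a samples-by-genes matrix returns 20 gene indices. In a sequential scheme such as the one the abstract outlines, a call like this would presumably be repeated on a shrinking gene set, with a classifier evaluated after each round.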