A Scheme for Feature Selection from Gene Expression Data using Recursive Feature Elimination with Cross Validation and Unsupervised Deep Belief Network Classifier
Nimrita Koul, S. Manvi
2019 3rd International Conference on Computing and Communications Technologies (ICCCT), published 2019-02-01
DOI: 10.1109/ICCCT2.2019.8824943 (https://doi.org/10.1109/ICCCT2.2019.8824943)
In cancer treatment, efficacy depends on correctly diagnosing the nature of the tumor as early as possible. Microarray gene expression data, which contains the expression profiles of the entire genome, can be analyzed to identify biomarkers of cancers. Microarray data has a large number of features but very few samples. To use this data effectively, it is beneficial to select a reduced set of genes for tasks such as classification. In this paper, we propose a two-level scheme for feature selection and classification of cancers. First, the genes are ranked using Recursive Feature Elimination, with a Random Forest classifier evaluating the fitness of genes under five-fold cross-validation. The selected genes are then used to pre-train an unsupervised Deep Belief Network classifier, which classifies the samples. We compare our approach, in terms of the cross-validated metrics classification accuracy, precision, and recall, against several standard feature selector-classifier combinations: Mutual Information with Support Vector Machine, Kernel Principal Component Analysis with Support Vector Machine, Support Vector Machine-Recursive Feature Elimination, and Mutual Information with Random Forest classifier. The results show that our scheme performs on par with standard methods used for feature selection from gene expression data.
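The two-level scheme described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic, every hyperparameter (step size, minimum feature count, RBM sizes, `k=20`) is a hypothetical choice, and because scikit-learn has no Deep Belief Network, two stacked `BernoulliRBM` layers followed by logistic regression stand in for it, without any joint fine-tuning. One of the baselines named in the abstract, Mutual Information with a Support Vector Machine, is included for comparison.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for a microarray matrix: few samples, many "genes".
X, y = make_classification(n_samples=60, n_features=100, n_informative=8,
                           n_redundant=0, random_state=0)

# Stage 1: recursive feature elimination ranked by random-forest importances,
# with the number of kept genes chosen by five-fold cross-validation.
rfe_rf = RFECV(
    estimator=RandomForestClassifier(n_estimators=30, random_state=0),
    step=0.25,                 # removes 25 of the 100 features per round (assumed)
    min_features_to_select=5,  # assumed lower bound
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)

# Stage 2 (stand-in for the DBN): two stacked RBMs trained without labels,
# followed by a supervised logistic-regression output layer.
proposed = Pipeline([
    ("select", rfe_rf),
    ("scale", MinMaxScaler(clip=True)),  # keep RBM inputs within [0, 1]
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Baseline from the abstract: mutual information ranking + linear SVM.
baseline = Pipeline([
    ("select", SelectKBest(partial(mutual_info_classif, random_state=0), k=20)),
    ("clf", SVC(kernel="linear")),
])

# Outer five-fold CV; feature selection is refit inside each training fold,
# so the held-out samples never influence which genes are kept.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
metrics = ["accuracy", "precision", "recall"]
scores = {
    name: cross_validate(model, X, y, cv=outer_cv, scoring=metrics)
    for name, model in [("RFECV-RF + RBM stack", proposed), ("MI + SVM", baseline)]
}

for name, result in scores.items():
    summary = ", ".join(f"{m}={result['test_' + m].mean():.2f}" for m in metrics)
    print(f"{name}: {summary}")

# Refit on all samples to inspect which genes the selector keeps.
proposed.fit(X, y)
selector = proposed.named_steps["select"]
print("genes kept:", selector.n_features_)
```

Wrapping the selector inside the pipeline is a deliberate design choice: with so few samples per gene, selecting features on the full data set before cross-validating would likely inflate the reported accuracy, precision, and recall. The scores printed here come from synthetic data and say nothing about the results reported in the paper.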