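A Scheme for Feature Selection from Gene Expression Data using Recursive Feature Elimination with Cross Validation and Unsupervised Deep Belief Network Classifier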

2019 3rd International Conference on Computing and Communications Technologies (ICCCT) | Pub Date: 2019-02-01 | DOI: 10.1109/ICCCT2.2019.8824943

Nimrita Koul, S. Manvi

Citations: 3

A Scheme for Feature Selection from Gene Expression Data using Recursive Feature Elimination with Cross Validation and Unsupervised Deep Belief Network Classifier

In cancer treatment, efficacy depends on correctly diagnosing the nature of the tumor as early as possible. Micro-array gene expression data, which contains the expression profiles of the entire genome, can be analyzed to identify biomarkers of cancers. Micro-array data has a large number of features and very few samples. To make effective use of this data, it is beneficial to select a reduced set of genes that can be used for tasks such as classification. In this paper, we propose a two-level scheme for feature selection and classification of cancers. First, the genes are ranked using Recursive Feature Elimination, which uses a Random Forest classifier with five-fold cross-validation to evaluate the fitness of genes. These genes are then used to pre-train an Unsupervised Deep Belief Network classifier, which classifies the samples based on the selected genes. We compared the cross-validation metrics obtained from our approach (classification accuracy, precision, and recall) with those obtained from standard feature selector-classifier combinations: Mutual Information with Support Vector Machine, Kernel Principal Component Analysis with Support Vector Machine, Support Vector Machine-Recursive Feature Elimination, and Mutual Information with Random Forest classifier. The results show that our scheme performs on par with standard methods used for feature selection from gene expression data.
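
The two-level scheme in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names no library, dataset, or hyperparameters, so scikit-learn, the synthetic data, and every parameter value here are assumptions. scikit-learn has no Deep Belief Network, so two stacked `BernoulliRBM` layers (trained without labels) followed by logistic regression stand in for the DBN classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

SEED = 0  # assumption: fixed only to make the sketch repeatable

# Synthetic stand-in for a micro-array matrix: few samples, many features.
X, y = make_classification(
    n_samples=60, n_features=200, n_informative=10, random_state=SEED
)

# Level 1: recursive feature elimination driven by a random forest,
# with the number of kept genes chosen by five-fold cross-validation.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=SEED),
    step=0.2,                  # remove 20% of the original features per round (assumed)
    min_features_to_select=5,  # assumed lower bound
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
    scoring="accuracy",
)
X_selected = selector.fit_transform(X, y)
print("genes kept:", selector.n_features_)

# Level 2 (stand-in): RBM layers are fitted greedily and without labels,
# then a logistic regression is trained on the learned representation.
dbn_like = Pipeline([
    ("scale", MinMaxScaler()),  # RBM inputs are expected in [0, 1]
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05,
                          n_iter=20, random_state=SEED)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=SEED)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Report the three metrics named in the abstract, averaged over five folds.
scores = cross_validate(
    dbn_like, X_selected, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
    scoring=("accuracy", "precision", "recall"),
)
for name in ("accuracy", "precision", "recall"):
    print(name, scores["test_" + name].mean())
```

Because the genes in this sketch are selected on the full dataset before the second cross-validation, the printed scores are optimistic; placing both levels inside a single `Pipeline` would keep the selection within each training fold.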
