利用基于支持向量机的集成特征选择方法对基因表达数据进行分析。

IF 0.9 4区 数学 Q3 Mathematics
Shizhi Zhang, Mingjin Zhang
{"title":"利用基于支持向量机的集成特征选择方法对基因表达数据进行分析。","authors":"Shizhi Zhang,&nbsp;Mingjin Zhang","doi":"10.1515/sagmb-2022-0002","DOIUrl":null,"url":null,"abstract":"<p><p>Gene selection is one of the key steps for gene expression data analysis. An SVM-based ensemble feature selection method is proposed in this paper. Firstly, the method builds many subsets by using Monte Carlo sampling. Secondly, ranking all the features on each of the subsets and integrating them to obtain a final ranking list. Finally, the optimum feature set is determined by a backward feature elimination strategy. This method is applied to the analysis of 4 public datasets: the Leukemia, Prostate, Colorectal, and SMK_CAN, resulting 7, 10, 13, and 32 features. The AUC obtained from independent test sets are 0.9867, 0.9796, 0.9571, and 0.9575, respectively. These results indicate that the features selected by the proposed method can improve sample classification accuracy, and thus be effective for gene selection from gene expression data.</p>","PeriodicalId":49477,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":" ","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2022-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Use of SVM-based ensemble feature selection method for gene expression data analysis.\",\"authors\":\"Shizhi Zhang,&nbsp;Mingjin Zhang\",\"doi\":\"10.1515/sagmb-2022-0002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Gene selection is one of the key steps for gene expression data analysis. An SVM-based ensemble feature selection method is proposed in this paper. Firstly, the method builds many subsets by using Monte Carlo sampling. Secondly, ranking all the features on each of the subsets and integrating them to obtain a final ranking list. Finally, the optimum feature set is determined by a backward feature elimination strategy. This method is applied to the analysis of 4 public datasets: the Leukemia, Prostate, Colorectal, and SMK_CAN, resulting 7, 10, 13, and 32 features. The AUC obtained from independent test sets are 0.9867, 0.9796, 0.9571, and 0.9575, respectively. These results indicate that the features selected by the proposed method can improve sample classification accuracy, and thus be effective for gene selection from gene expression data.</p>\",\"PeriodicalId\":49477,\"journal\":{\"name\":\"Statistical Applications in Genetics and Molecular Biology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2022-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Statistical Applications in Genetics and Molecular Biology\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1515/sagmb-2022-0002\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Mathematics\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/sagmb-2022-0002","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 0

摘要

基因选择是基因表达数据分析的关键步骤之一。提出了一种基于支持向量机的集成特征选择方法。该方法首先利用蒙特卡罗采样方法构建多个子集;其次,对每个子集上的所有特征进行排序并进行积分,得到最终的排序表。最后,通过反向特征消除策略确定最优特征集。该方法应用于白血病、前列腺癌、结肠直肠癌和SMK_CAN 4个公共数据集的分析,得到7个、10个、13个和32个特征。独立测试集的AUC分别为0.9867、0.9796、0.9571和0.9575。这些结果表明,该方法所选择的特征可以提高样本分类精度,从而有效地从基因表达数据中进行基因选择。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Use of SVM-based ensemble feature selection method for gene expression data analysis.

Gene selection is one of the key steps for gene expression data analysis. An SVM-based ensemble feature selection method is proposed in this paper. Firstly, the method builds many subsets by using Monte Carlo sampling. Secondly, ranking all the features on each of the subsets and integrating them to obtain a final ranking list. Finally, the optimum feature set is determined by a backward feature elimination strategy. This method is applied to the analysis of 4 public datasets: the Leukemia, Prostate, Colorectal, and SMK_CAN, resulting 7, 10, 13, and 32 features. The AUC obtained from independent test sets are 0.9867, 0.9796, 0.9571, and 0.9575, respectively. These results indicate that the features selected by the proposed method can improve sample classification accuracy, and thus be effective for gene selection from gene expression data.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.20
自引率
11.10%
发文量
8
审稿时长
6-12 weeks
期刊介绍: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信