Application of multiple support vector machine recursive feature elimination model in cancer feature gene selection

Wenbin Xu, H. Xia, Weiying Zheng
Journal: 国际生物医学工程杂志 (International Journal of Biomedical Engineering)
Published: 2019-02-28
Publication type: Journal Article
DOI: 10.3760/CMA.J.ISSN.1673-4181.2019.01.006 (https://doi.org/10.3760/CMA.J.ISSN.1673-4181.2019.01.006)
Citations: 0

Abstract

Objective: To analyze cancer gene expression profile data with the multiple support vector machine recursive feature elimination (MSVM-RFE) algorithm and to compute gene ranking scores in order to obtain the optimal feature gene subset.

Methods: Gene expression profiles of bladder cancer, breast cancer, colon cancer and lung cancer were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes were identified by differential expression analysis. These genes were then ranked with the MSVM-RFE algorithm, and the average test error of each gene subset was calculated. The optimal gene subset for each of the four cancers was the one with the minimum average test error. Linear SVM classifiers were built on the datasets of the four cancers before and after feature gene screening to verify the classification performance of the optimal feature gene subsets.

Results: Using the optimal feature gene subset obtained by the MSVM-RFE algorithm, the classification accuracy improved from (96.77±1.28)% to (99.85±0.46)% for the bladder cancer data, from (83.77±4.93)% to (88.30±3.85)% for the breast cancer data, and from (72.69±2.41)% to (90.21±3.31)% for the lung cancer data. In addition, the optimal feature gene subset kept the classification accuracy of the colon cancer classifier at a high level (>99.5%).

Conclusions: Feature gene extraction based on the MSVM-RFE algorithm can improve cancer classification performance.

Key words: Gene expression profile; Recursive feature elimination; Support vector machine; Feature gene
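The ranking step described in the Methods can be illustrated with a small sketch. The abstract does not give implementation details, so everything below is an assumption: the function names (`train_linear_svm`, `msvm_rfe_rank`), the hyperparameters, the subgradient-descent SVM solver, and the ranking score (mean of the squared, normalised SVM weights across resamples divided by their standard deviation, a commonly described MSVM-RFE criterion) are illustrative and not taken from the paper.

```python
import random
from statistics import mean, pstdev


def train_linear_svm(X, y, epochs=50, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by per-sample subgradient descent.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    The solver and its hyperparameters are illustrative assumptions.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1  # sample lies inside the margin or is misclassified
            for j, xj in enumerate(xi):
                grad = lam * w[j] - (yi * xj if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def msvm_rfe_rank(X, y, n_resamples=5, subsample=0.8, seed=0):
    """Rank features (best first) by recursive elimination over several SVMs.

    Each round trains one SVM per random subsample, scores every remaining
    feature as mean(w^2) / std(w^2) over the normalised weight vectors, and
    removes the lowest-scoring feature. The score is an assumed criterion.
    """
    rng = random.Random(seed)
    n = len(X)
    k = min(n, max(2, int(subsample * n)))
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        sq_weights = []  # one row of squared, normalised weights per resample
        for _ in range(n_resamples):
            idx = rng.sample(range(n), k)
            Xs = [[X[i][j] for j in remaining] for i in idx]
            ys = [y[i] for i in idx]
            w, _ = train_linear_svm(Xs, ys)
            norm = sum(wj * wj for wj in w) ** 0.5 or 1.0
            sq_weights.append([(wj / norm) ** 2 for wj in w])
        scores = []
        for pos in range(len(remaining)):
            col = [row[pos] for row in sq_weights]
            scores.append(mean(col) / (pstdev(col) + 1e-12))
        worst = min(range(len(remaining)), key=scores.__getitem__)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last feature eliminated is ranked first


# Toy data (synthetic, not from the paper): feature 0 tracks the label,
# features 1-3 are pure noise.
toy_rng = random.Random(1)
labels = [1 if i % 2 == 0 else -1 for i in range(20)]
data = [[2.0 * lab + toy_rng.gauss(0, 0.5)] + [toy_rng.gauss(0, 1) for _ in range(3)]
        for lab in labels]
print(msvm_rfe_rank(data, labels))
```

To mirror the subset selection step in the Methods, one would evaluate a classifier on the top 1, 2, 3, ... ranked genes under cross-validation and keep the subset size with the lowest average test error; that loop is omitted here for brevity.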