Research on Disease Classification Model and Algorithms Based on Gene Expression Data

Yue Li, Changyin Zhou
Published in: 2019 3rd International Conference on Data Science and Business Analytics (ICDSBA)
Publication date: 2019-10-01
DOI: 10.1109/ICDSBA48748.2019.00055 (https://doi.org/10.1109/ICDSBA48748.2019.00055)
Citations: 0

Abstract

The high dimensionality and small sample size of gene expression data make disease classification difficult, and this work studies models and algorithms to address the problem. First, a linear combination of weak classifiers is constructed by boosting, and a feature subset is selected by removing the genes that receive zero weight in the boosted model. Then three classification methods (boosting, SVM, and K-nearest neighbor) are combined through ensemble learning to improve the accuracy of the classification model. Finally, the ensemble model is applied to a colon cancer dataset. The experimental results show that, compared with a single classification model, the ensemble method reduces the dimensionality of the data and achieves higher accuracy.
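The two ideas in the abstract, selecting genes by their weight in a boosted model and combining several classifiers, can be illustrated with a minimal sketch. The abstract does not name the boosting variant, the weak learner, or the rule used to combine the three classifiers, so everything below is an assumption: AdaBoost with one-feature decision stumps, a gene's weight taken as the summed stump weights that use it, and an unweighted majority vote. The toy matrix is invented for illustration and is not the colon cancer dataset, and the SVM and K-nearest neighbor members are not implemented here; `majority_vote` only shows how their label predictions could be merged.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] > t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, rounds=10, eps=1e-10):
    """Assumed variant: discrete AdaBoost over decision stumps, labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        raw_err, j, t, p = fit_stump(X, y, w)
        err = min(max(raw_err, eps), 1 - eps)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, p))
        if raw_err == 0:
            break  # a perfect stump leaves nothing to reweight
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * (p if row[j] > t else -p))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def feature_weights(model, n_features):
    """Sum the stump weights per feature; unused features keep weight zero."""
    weights = [0.0] * n_features
    for alpha, j, _, _ in model:
        weights[j] += abs(alpha)
    return weights

def predict_boost(model, row):
    score = sum(alpha * (p if row[j] > t else -p) for alpha, j, t, p in model)
    return 1 if score >= 0 else -1

def majority_vote(*predictions):
    """Merge +1/-1 label lists from several classifiers by unweighted vote."""
    return [1 if sum(votes) >= 0 else -1 for votes in zip(*predictions)]

# Hypothetical toy data: 6 samples x 4 "genes"; only gene 2 separates the classes.
X = [
    [0.2, 5.1, 1.0, 3.3],
    [0.9, 4.8, 1.2, 3.1],
    [0.4, 5.0, 0.8, 3.4],
    [0.7, 4.9, 3.1, 3.2],
    [0.3, 5.2, 2.9, 3.0],
    [0.8, 4.7, 3.3, 3.5],
]
y = [-1, -1, -1, 1, 1, 1]

model = adaboost(X, y)
weights = feature_weights(model, n_features=len(X[0]))
selected = [j for j, wj in enumerate(weights) if wj > 0]  # drop zero-weight genes
X_reduced = [[row[j] for j in selected] for row in X]
print("selected gene indices:", selected)
```

In a fuller pipeline, the reduced matrix `X_reduced` would be passed to each of the three classifiers, and their predicted labels would be given to `majority_vote`. With three voters and labels in {-1, +1} the vote cannot tie.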