{"title":"Research on Disease Classification Model and Algorithms Based on Gene Expression Data","authors":"Yue Li, Changyin Zhou","doi":"10.1109/ICDSBA48748.2019.00055","journal":{"name":"2019 3rd International Conference on Data Science and Business Analytics (ICDSBA)"},"publicationDate":"2019-10-01","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICDSBA48748.2019.00055"}
Research on Disease Classification Model and Algorithms Based on Gene Expression Data
The high dimensionality and small sample size of gene expression data make disease classification difficult, and this paper studies models and algorithms to address the problem. First, a linear combination of weak classifiers is constructed by a boosting method, and a feature subset is selected by removing the feature genes that receive zero weight in the boosted model. Then, three classification methods (boosting, SVM, and K-nearest neighbor) are integrated through ensemble learning to improve the accuracy of the classification model. Finally, the ensemble classification model is applied to a colon cancer dataset. The experimental results show that, compared with a single classification model, the ensemble method reduces the dimension of the data and achieves higher accuracy.
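
The pipeline described in the abstract can be sketched in a few dozen lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the weak learner, the SVM variant, the value of K, or the combination rule, so the sketch assumes decision stumps for discrete AdaBoost, a linear SVM trained by subgradient descent, K = 3, and a simple majority vote. It also assumes that the SVM and the K-nearest-neighbor classifier are trained on the boosting-selected genes, and it runs on synthetic data rather than the colon cancer dataset. All function names are hypothetical.

```python
import math
import random

def make_data(n=40, p=30, informative=(0, 1, 2), shift=1.5, seed=0):
    """Synthetic stand-in for expression data: few samples, many genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in informative:  # only these genes carry class signal
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y

def stump(row, j, t, sign):
    """Weak classifier: thresholds a single gene, outputs +1 or -1."""
    return sign if row[j] > t else -sign

def best_stump(X, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if stump(row, j, t, 1) != yi)
            for sign, e in ((1, err), (-1, 1.0 - err)):  # weights sum to 1
                if e < best[0]:
                    best = (e, j, t, sign)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; returns [(alpha, gene, threshold, sign), ...]."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, j, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, j, t, sign))
        # Up-weight the samples this stump misclassified, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def boost_predict(model, row):
    score = sum(a * stump(row, j, t, s) for a, j, t, s in model)
    return 1 if score >= 0 else -1

def selected_genes(model):
    """Genes with nonzero weight in the boosted linear combination."""
    return sorted({j for a, j, _, _ in model if a > 0})

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def svm_predict(params, row):
    w, b = params
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1

def knn_predict(Xtr, ytr, row, k=3):
    nearest = sorted(zip(Xtr, ytr), key=lambda pair: math.dist(pair[0], row))[:k]
    return 1 if sum(label for _, label in nearest) > 0 else -1

def project(X, genes):
    return [[row[j] for j in genes] for row in X]

X, y = make_data()
Xtr, ytr, Xte, yte = X[:28], y[:28], X[28:], y[28:]

model = adaboost(Xtr, ytr)
genes = selected_genes(model)  # reduced feature subset
Ztr, Zte = project(Xtr, genes), project(Xte, genes)
svm = train_linear_svm(Ztr, ytr)

predictions = []
for row, z in zip(Xte, Zte):
    votes = boost_predict(model, row) + svm_predict(svm, z) + knn_predict(Ztr, ytr, z)
    predictions.append(1 if votes > 0 else -1)  # majority of three voters

accuracy = sum(p == t for p, t in zip(predictions, yte)) / len(yte)
print(f"kept {len(genes)} of {len(X[0])} genes, ensemble accuracy = {accuracy:.2f}")
```

Because each stump reads a single gene, the boosted model is a linear combination in which every gene not chosen by any stump has weight zero. Dropping those genes is what reduces the dimension before the SVM and K-nearest-neighbor classifiers are trained. With three voters and labels in {-1, +1}, the sum of votes is never zero, so the majority vote needs no tie-breaking rule.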