E. Marchiori, N. Heegaard, C. Jiménez, Mikkel West-Nielsen
2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. DOI: 10.1109/CIBCB.2005.1594944
Feature Selection for Classification with Proteomic Data of Mixed Quality
In this paper we experimentally assess the performance of two state-of-the-art feature selection methods, recursive feature elimination (RFE) and RELIEF, when used to classify proteomic pattern samples of mixed quality. The data are generated by spiking human sera to artificially create distinguishable sample groups, and by handling the samples at different storage temperatures. We consider two types of classifiers: support vector machines (SVM) and k-nearest neighbour (kNN). Results of leave-one-out cross-validation (LOOCV) experiments indicate that RELIEF selects feature subsets that are more stable across runs than those selected by RFE, and that these subsets consist mainly of spiked features. However, RFE outperforms RELIEF in terms of average LOOCV accuracy, both when combined with SVM and when combined with kNN. Perfect LOOCV accuracy is obtained by RFE combined with 1NN. Almost all samples misclassified by the algorithms were stored at high temperature. These results indicate that, when samples of mixed quality are analyzed computationally, selecting only the relevant (spiked) features does not necessarily yield the highest classification accuracy.
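The abstract names the ingredients of the evaluation protocol (RFE and RELIEF for feature selection, 1NN as a classifier, LOOCV for accuracy and subset stability) without spelling them out. Below is a minimal pure-Python sketch of that kind of protocol under stated assumptions; it is not the authors' code. It assumes two classes labelled -1/+1, uses the basic two-class Relief (one nearest hit and one nearest miss per sample), and implements SVM-RFE with a linear SVM trained by Pegasos-style subgradient descent, removing one feature per iteration. The helper names (`relief_select`, `rfe_select`, `loocv`, and so on), the number of kept features `k`, the SVM parameters, and the toy data are all hypothetical, since the abstract does not give the paper's settings. Each class needs at least two training samples so that a nearest hit exists.

```python
import random

def relief_weights(X, y):
    """Basic two-class Relief: every sample is used once, with one nearest hit and one nearest miss."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):  # range-normalised per-feature difference
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hit = min((k for k in range(n) if k != i and y[k] == y[i]), key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]), key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss, penalise those that differ on the nearest hit.
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w

def relief_select(X, y, k):
    """Hypothetical helper: indices of the k features with the largest Relief weights."""
    w = relief_weights(X, y)
    return sorted(sorted(range(len(w)), key=lambda j: -w[j])[:k])

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the regularised hinge loss (labels -1/+1, no bias)."""
    rng, d, t = random.Random(seed), len(X[0]), 0
    w, order = [0.0] * d, list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w[j] * X[i][j] for j in range(d))
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:  # hinge loss is active: step towards y_i * x_i
                w = [w[j] + eta * y[i] * X[i][j] for j in range(d)]
    return w

def rfe_select(X, y, k):
    """SVM-RFE: retrain on the surviving features and drop the one with the smallest w_j^2."""
    d = len(X[0])
    mean = [sum(r[j] for r in X) / len(X) for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]  # centre, since the SVM has no bias term
    alive = list(range(d))
    while len(alive) > k:
        w = train_linear_svm([[r[j] for j in alive] for r in Xc], y)
        alive.pop(min(range(len(alive)), key=lambda p: w[p] ** 2))
    return alive

def one_nn(train_X, train_y, x, feats):
    """1NN prediction with squared Euclidean distance on the selected features only."""
    best = min(range(len(train_X)), key=lambda i: sum((train_X[i][j] - x[j]) ** 2 for j in feats))
    return train_y[best]

def loocv(X, y, select, k):
    """LOOCV accuracy; feature selection is rerun inside every fold on the training part only."""
    correct, subsets = 0, []
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select(tr_X, tr_y, k)
        subsets.append(tuple(feats))
        correct += one_nn(tr_X, tr_y, X[i], feats) == y[i]
    return correct / len(X), subsets

# Hypothetical toy data: features 0 and 1 play the role of "spiked" peaks, feature 2 is noise.
X = [[5.0, 4.8, 0.3], [5.2, 5.1, 0.9], [4.9, 5.3, 0.1], [5.1, 4.9, 0.6],
     [1.0, 1.2, 0.5], [0.8, 0.9, 0.2], [1.1, 1.0, 0.8], [0.9, 1.1, 0.4]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

for name, select in [("RELIEF", relief_select), ("RFE", rfe_select)]:
    acc, subsets = loocv(X, y, select, k=2)
    print(name, "LOOCV accuracy:", acc, "distinct subsets:", len(set(subsets)))
```

Selection is repeated inside every fold so that the held-out sample never influences which features are chosen; counting the distinct subsets selected across folds is one simple way to gauge the kind of stability the abstract compares. Swapping `one_nn` for an SVM classifier would give the other pairing the abstract mentions.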