C. H. Yamamoto, Maria Cristina Ferreira de Oliveira, M. L. Fujimoto, S. O. Rezende
Sixth International Conference on Machine Learning and Applications (ICMLA 2007), published 2007-12-13. DOI: 10.1109/ICMLA.2007.17 (https://doi.org/10.1109/ICMLA.2007.17)
Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics
Mass spectrometry (MS)-based proteomics is a powerful and popular high-throughput approach for characterizing the global protein content of a sample. In shotgun proteomics, proteins are typically digested into fragments (peptides) prior to mass analysis, and the presence of a protein is inferred from the identification of its constituent peptides. Accurate proteome characterization therefore depends on the accuracy of this peptide identification step. Database search routines generate predicted spectra for all peptides derived from the known genome information and identify a peptide by matching an experimental spectrum to a predicted one. However, because of problems such as incomplete fragmentation, this process yields a large number of false positives. We present a new scoring algorithm that integrates probabilistic database scoring metrics (from the MSPolygraph program) with physico-chemical properties in a support vector machine (SVM). We demonstrate that this peptide identification classifier SVM (PICS) score is not only more accurate than the single best database scoring metric, but also significantly more accurate than models derived using linear discriminant analysis, a decision tree, or an artificial neural network.
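The abstract does not specify the PICS feature set, kernel, or training procedure, so the following is only a minimal sketch of the general idea: a database match score and a few physico-chemical descriptors are combined into one feature vector, and a linear SVM separates correct identifications from false positives. Everything concrete here is an assumption made for illustration: the three feature columns, the toy values (which are invented, not data from the paper), the hyperparameters, and the function names `fit_scaler`, `train_linear_svm`, and `combined_score`. A plain batch subgradient descent on the regularised hinge loss stands in for whatever SVM solver the authors used.

```python
import math

# Hypothetical toy candidates (invented values, NOT data from the paper).
# Columns: database match score, hydrophobicity index, peptide length.
X_RAW = [
    [0.92, 0.35, 12], [0.81, 0.10, 15], [0.77, -0.20, 9], [0.88, 0.50, 18],
    [0.30, 0.40, 11], [0.22, -0.10, 14], [0.41, 0.05, 8], [0.15, 0.60, 20],
]
# +1 = correct identification, -1 = false positive (assumed labels).
Y = [1, 1, 1, 1, -1, -1, -1, -1]


def fit_scaler(X):
    """Return per-feature means and standard deviations for z-scoring."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(col) / n for col in cols]
    # Fall back to 1.0 for a constant column to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(cols, means)]
    return means, stds


def scale(x, means, stds):
    """Z-score one feature vector with previously fitted statistics."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def decision(w, b, x):
    """Signed distance-like score of x relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # point violates the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


means, stds = fit_scaler(X_RAW)
X = [scale(x, means, stds) for x in X_RAW]
w, b = train_linear_svm(X, Y)


def combined_score(candidate):
    """Score a raw candidate; larger values favour a correct identification."""
    return decision(w, b, scale(candidate, means, stds))


# Two hypothetical candidates that differ only in their database score.
print(combined_score([0.85, 0.2, 13]), combined_score([0.25, 0.2, 13]))
```

In this toy setup the database score alone already separates the two classes, so the sketch shows only the mechanics of combining features, not the accuracy gain the abstract reports. Comparing against a single-metric baseline or the other classifiers named above would require real labelled spectra.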