Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics

C. H. Yamamoto, Maria Cristina Ferreira de Oliveira, M. L. Fujimoto, S. O. Rezende
Sixth International Conference on Machine Learning and Applications (ICMLA 2007), published 2007-12-13
DOI: 10.1109/ICMLA.2007.17
Citations: 13

Abstract

Mass spectrometry (MS)-based proteomics is a powerful and popular high-throughput process for characterizing the global protein content of a sample. In shotgun proteomics, proteins are typically digested into fragments (peptides) prior to mass analysis, and the presence of a protein is inferred from the identification of its constituent peptides. Accurate proteome characterization therefore depends on the accuracy of this peptide identification step. Database search routines generate predicted spectra for all peptides derived from known genome information and identify a peptide by "matching" an experimental spectrum to a predicted one. However, because of problems such as incomplete fragmentation, this process yields a large number of false positives. We present a new scoring algorithm that integrates probabilistic database scoring metrics (from the MSPolygraph program) with peptide physico-chemical properties in a support vector machine (SVM). We demonstrate that this peptide identification classifier SVM (PICS) score is not only more accurate than the single best database scoring metric, but also significantly more accurate than models derived using linear discriminant analysis, a decision tree, or an artificial neural network.
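The core idea, concatenating database-search scores with peptide properties into one feature vector and letting an SVM separate correct from incorrect peptide-spectrum matches, can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the PICS implementation: the MSPolygraph metrics are stood in for by a generic tuple `db_scores`, the only physico-chemical features are peptide length and mean Kyte-Doolittle hydropathy (a swapped-in example), the model is a linear soft-margin SVM trained by batch sub-gradient descent (the paper's kernel and training procedure may differ), and all function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values: an example physico-chemical property
# (assumption: the paper's actual feature set is not specified here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def psm_features(db_scores, peptide):
    """Hypothetical helper: database-search scores + simple peptide properties."""
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)
    return list(db_scores) + [len(peptide), gravy]


def standardize(rows):
    """Z-score each column; return scaled rows and (means, stds) for reuse."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    scaled = [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]
    return scaled, (mus, sds)


def decision(w, b, x):
    """Linear SVM decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Batch sub-gradient descent on lam/2*||w||^2 + mean hinge loss (y in {+1, -1})."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty (bias not penalized)
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # margin violated: hinge term is active
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def fit(psms, labels):
    """Build features, standardize them, and train the linear SVM."""
    raw = [psm_features(scores, pep) for scores, pep in psms]
    X, scaling = standardize(raw)
    w, b = train_linear_svm(X, labels)
    return w, b, scaling


def predict(model, scores, peptide):
    """Return +1 (accept the match) or -1 (reject it)."""
    w, b, (mus, sds) = model
    x = [(v - m) / s for v, m, s in zip(psm_features(scores, peptide), mus, sds)]
    return 1 if decision(w, b, x) > 0 else -1


# Made-up toy PSMs: ((database scores...), peptide); label +1 = correct match.
train = [
    ((0.92,), "LVNELTEFAK"), ((0.85,), "AEFVEVTK"), ((0.97,), "YLYEIAR"),
    ((0.88,), "DLGEEHFK"), ((0.12,), "QTALVELLK"), ((0.30,), "SHCIAEVEK"),
    ((0.05,), "TPEVDDEALEK"), ((0.22,), "LGEYGFQNALIVR"),
]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
model = fit(train, labels)
print(predict(model, (0.95,), "LVNELTEFAK"), predict(model, (0.02,), "QTALVELLK"))
```

Standardizing the features keeps length (values near 10) from swamping scores that lie between 0 and 1; a kernel SVM (for example RBF) would be the natural next step if the classes are not linearly separable.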