A computational method for assessing peptide-identification reliability in tandem mass spectrometry analysis with SEQUEST

Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003. Pub Date: 2003-08-11. DOI: 10.1109/CSB.2003.1227353

J. Razumovskaya, V. Olman, Dong Xu, E. Uberbacher, N. Verberkmoes, Ying Xu


Citations: 56

Abstract

High-throughput protein identification in mass spectrometry is predominantly achieved by first identifying tryptic peptides using SEQUEST and then combining the peptide hits for protein identification. Peptide identification is typically carried out by selecting SEQUEST hits above a specified threshold, the value of which is usually chosen empirically in an attempt to separate true identifications from false ones. These SEQUEST scores are not normalized with respect to the composition, length, and other parameters of the peptides. Furthermore, no rigorous reliability estimate is assigned to the protein identifications derived from these scores. Hence the interpretation of SEQUEST hits generally requires human involvement, which makes it difficult to scale up the identification process for genome-scale applications. To overcome these limitations, we have developed a method that combines a neural network and a statistical model for "normalizing" SEQUEST scores and for providing a reliability estimate for each SEQUEST hit. This method improves the sensitivity and specificity of peptide identification compared to the standard filtering procedure used in the SEQUEST package, and provides a basis for estimating the reliability of protein identifications.
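As a rough illustration of the threshold-based filtering that the abstract describes as the standard procedure, the sketch below keeps hits whose score reaches a cutoff and measures sensitivity and specificity against known labels. It is not the authors' method and does not reproduce SEQUEST's scoring. The `Hit` class, the function names, the peptide strings, the scores, and the cutoff are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    peptide: str    # hypothetical peptide label
    score: float    # stand-in for a SEQUEST-style score (made-up value)
    is_true: bool   # ground-truth label, available only in a validation set


def filter_hits(hits, threshold):
    """Keep hits whose score is at or above the threshold."""
    return [h for h in hits if h.score >= threshold]


def sensitivity_specificity(hits, threshold):
    """Return (sensitivity, specificity) of threshold filtering on labeled hits."""
    tp = sum(1 for h in hits if h.is_true and h.score >= threshold)
    fn = sum(1 for h in hits if h.is_true and h.score < threshold)
    tn = sum(1 for h in hits if not h.is_true and h.score < threshold)
    fp = sum(1 for h in hits if not h.is_true and h.score >= threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Toy labeled hits; every value here is invented for the example.
HITS = [
    Hit("PEPTIDEA", 3.1, True),
    Hit("PEPTIDEB", 2.4, True),
    Hit("PEPTIDEC", 1.6, True),
    Hit("PEPTIDED", 2.0, False),
    Hit("PEPTIDEE", 1.2, False),
    Hit("PEPTIDEF", 0.9, False),
]

if __name__ == "__main__":
    for cutoff in (1.8, 2.2):
        sens, spec = sensitivity_specificity(HITS, cutoff)
        print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With a single fixed cutoff, raising the threshold trades sensitivity for specificity regardless of peptide length or composition, which is the limitation that the normalization described in the abstract is meant to address.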

