Practical and Efficient Searching in Proteomics: A Cross Engine Comparison.

Joao A Paulo
WebmedCentral, vol. 4(10). Published 2013-10-01. Journal Article.
DOI: https://doi.org/10.9754/journal.wplus.2013.0052
Citations: 32

Abstract



Background: Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and thereby not producing identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and examine the additional proteins gained by increasing the number of technical replicate analyses.

Methods: A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer in 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Peptide and protein identifications were compared among the search engines. In addition, searches with each engine were performed using an increasing number of technical replicates.

Results: The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference (grouping) level. The data also revealed that analyzing 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate yields an additional 4-5%.

Conclusions: The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, information can be extracted from a dataset by performing database searching with different engines and performing technical repeats, which requires no additional sample preparation and effectively utilizes research time and effort.
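
The union and intersection comparison, and the gain from adding replicates, can be illustrated with a minimal Python sketch. It is not the author's workflow: the function names and the toy accessions (P1 to P5) are hypothetical, and protein identifiers are treated as plain strings. A real comparison would first have to reconcile protein groups between engines, which the abstract names as the likely main source of disparity.

```python
from itertools import accumulate


def compare_engines(ids_by_engine):
    """Return (union, intersection) of protein IDs across search engines."""
    sets = [set(ids) for ids in ids_by_engine.values()]
    union = set().union(*sets)
    intersection = set.intersection(*sets) if sets else set()
    return union, intersection


def cumulative_gain(replicates):
    """Return cumulative unique-protein counts and the percent gain
    contributed by each added replicate, relative to the previous total."""
    cumulative = list(accumulate((set(r) for r in replicates), lambda a, b: a | b))
    counts = [len(s) for s in cumulative]
    gains = [
        100.0 * (counts[i] - counts[i - 1]) / counts[i - 1] if counts[i - 1] else 0.0
        for i in range(1, len(counts))
    ]
    return counts, gains


# Toy input (made-up accessions, not data from the study).
engines = {
    "Mascot": ["P1", "P2", "P3"],
    "SEQUEST": ["P2", "P3", "P4"],
    "Andromeda": ["P3", "P4", "P5"],
}
union, intersection = compare_engines(engines)

replicates = [["P1", "P2", "P3", "P4"], ["P2", "P3", "P5"], ["P1", "P5"]]
counts, gains = cumulative_gain(replicates)
```

In this sketch the union is the expanded list of identifications, while the intersection is the subset supported by every engine.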
