Practical and Efficient Searching in Proteomics: A Cross Engine Comparison.

WebmedCentral | Pub Date: 2013-10-01 | DOI: 10.9754/journal.wplus.2013.0052

Joao A Paulo

Citations: 32

Abstract

Background: Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such algorithms are available, each built on a different statistical foundation, so they do not produce identical results. Here, the aim is to compare peptide and protein identifications across multiple search engines and to examine the additional proteins gained by increasing the number of technical replicate analyses.

Methods: A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer in 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Peptide and protein identifications were compared among the search engines. In addition, searches using each engine were performed with an increasing number of technical replicates.

Results: The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in protein identifications were greater than the differences in peptide identifications, indicating that the major source of the disparity may be at the protein inference and grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%.
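The replicate gain reported above can be illustrated with a small Python sketch. It assumes that per-replicate protein lists are simply merged, which is a simplification of re-searching the combined raw data as described in the Methods. The function name `cumulative_gain` and all identifiers are hypothetical placeholders, not data from the study.

```python
def cumulative_gain(replicates):
    """Return one (n_replicates, cumulative_ids, percent_gain) tuple per step."""
    seen = set()
    rows = []
    for n, ids in enumerate(replicates, start=1):
        before = len(seen)
        seen |= ids  # merge this replicate's identifications
        # Gain is measured relative to the total before this replicate was added.
        gain = 100.0 * (len(seen) - before) / before if before else 0.0
        rows.append((n, len(seen), gain))
    return rows


# Placeholder identifiers chosen for illustration only.
rep1 = {f"P{i:02d}" for i in range(1, 21)}                    # 20 proteins
rep2 = {f"P{i:02d}" for i in range(1, 16)} | {"P21", "P22"}   # 2 new proteins
rep3 = {"P05", "P21", "P23"}                                  # 1 new protein

rows = cumulative_gain([rep1, rep2, rep3])
for n, total, gain in rows:
    print(f"{n} replicate(s): {total} proteins (+{gain:.1f}%)")
```

With real data, each set would hold the protein accessions that passed the chosen false discovery rate filter for that replicate.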

Conclusions: The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, additional information can be extracted from a dataset by searching it with different engines and by performing technical repeats, which requires no additional sample preparation and makes effective use of research time and effort.
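The union and intersection described above map directly onto set operations. The following Python sketch assumes each engine's output has already been reduced to a set of comparable protein identifiers; in practice, differences in protein inference and grouping between engines would need to be reconciled first. The function name `compare_engines` and the identifiers are hypothetical.

```python
def compare_engines(ids_by_engine):
    """Return (union, intersection) of protein identifiers across engines."""
    sets = list(ids_by_engine.values())
    union = set().union(*sets)                                   # found by any engine
    intersection = set.intersection(*sets) if sets else set()    # found by all engines
    return union, intersection


# Placeholder identifiers, not results from the study.
engine_ids = {
    "Mascot": {"A", "B", "C"},
    "SEQUEST": {"A", "C", "D"},
    "Andromeda": {"A", "C", "E"},
}

union, intersection = compare_engines(engine_ids)
print(f"union: {len(union)} proteins, intersection: {len(intersection)} proteins")
```

The union corresponds to the expanded list of identifications, while the intersection gives the subset supported by every engine.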
