肽森林:半监督机器学习集成多个搜索引擎的肽识别。

IF 3.8 2区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS
Tristan Ranff, Matthew Dennison, Jeroen Bédorf, Stefan Schulze, Nico Zinn, Marcus Bantscheff, Jasper J R M van Heugten, Christian Fufezan
{"title":"肽森林:半监督机器学习集成多个搜索引擎的肽识别。","authors":"Tristan Ranff, Matthew Dennison, Jeroen Bédorf, Stefan Schulze, Nico Zinn, Marcus Bantscheff, Jasper J R M van Heugten, Christian Fufezan","doi":"10.1021/acs.jproteome.4c00686","DOIUrl":null,"url":null,"abstract":"<p><p>The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and <i>Escherichia coli</i> proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.</p>","PeriodicalId":48,"journal":{"name":"Journal of Proteome Research","volume":" ","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification.\",\"authors\":\"Tristan Ranff, Matthew Dennison, Jeroen Bédorf, Stefan Schulze, Nico Zinn, Marcus Bantscheff, Jasper J R M van Heugten, Christian Fufezan\",\"doi\":\"10.1021/acs.jproteome.4c00686\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and <i>Escherichia coli</i> proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.</p>\",\"PeriodicalId\":48,\"journal\":{\"name\":\"Journal of Proteome Research\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-01-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Proteome Research\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1021/acs.jproteome.4c00686\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Proteome Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1021/acs.jproteome.4c00686","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

摘要

自下而上蛋白质组学的第一步是将测量的片段质谱分配给肽序列,也称为肽谱匹配。近年来,新颖的算法将这项任务推向了新的高度;不幸的是,不同的算法有不同的优点和缺点,选择合适的算法对用户来说是一个挑战。在这里,我们介绍peptidforest,这是一种半监督机器学习方法,它集成了多种算法的分配来训练随机森林分类器来缓解这个问题。此外,与MS-GF+数据相比,PeptideForest在含有HEK和大肠杆菌混合蛋白质组的样品中增加了q值低于1%的肽-光谱匹配数25.2±1.6%。然而,数量的增加并不一定反映质量的提高,这就是为什么我们设计了一种新的方法,通过已知地面真理的样品的TMT量化来确定分配光谱的质量。因此,我们可以证明,低于1% q值的psm的增加并不会导致定量质量的下降,因此peptidforest提供了更深入了解自下而上蛋白质组学的可能性。peptidforest已经集成到我们的管道框架Ursgal中,因此可以与各种算法相结合。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification.

The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Proteome Research
Journal of Proteome Research 生物-生化研究方法
CiteScore
9.00
自引率
4.50%
发文量
251
审稿时长
3 months
期刊介绍: Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信