Validation and Community Sharing of Ocean Spectral Libraries Generated by Machine Learning for Data Independent Acquisition Ocean Metaproteomic Analyses.

IF 3.4 | Zone 4 (Biology) | JCR Q2 | BIOCHEMICAL RESEARCH METHODS
Proteomics Pub Date : 2025-06-11 DOI:10.1002/pmic.13971
Margaret Mars Brisbin, Matthew R McIlvin, Damien Beau Wilburn, Jaclyn K Saunders, Natalie R Cohen, Maya Bhatia, Elizabeth Kujawinski, Brian C Searle, Mak A Saito

Abstract

Ocean metaproteomics provides valuable insights into the structure and function of marine microbial communities. Yet ocean samples are challenging to analyze because their extensive biological diversity yields a very large number of peptides spanning a large dynamic range. This study characterized the capabilities of data-independent acquisition (DIA) for ocean metaproteomic samples. Spectral libraries were constructed from discovered peptides and proteins using machine learning (ML) algorithms, to avoid incorporating false positives into the libraries. When compared with 1-dimensional and 2-dimensional data-dependent acquisition (DDA) analyses, DIA outperformed DDA both with and without gas-phase fractionation. We found that larger spectral libraries of discovered proteins performed better, regardless of the geographic distance between the sites where samples were collected for library generation and the sites where the test samples were collected. Moreover, the spectral library containing all unique proteins present in the Ocean Protein Portal (OPP) outperformed smaller libraries generated from individual sampling campaigns. However, a spectral library constructed from all open reading frames (ORFs) in a metagenome proved too large to be workable: it yielded few peptide identifications because a low false discovery rate is difficult to maintain with such a large database. Given sufficient sequencing depth and validation studies, spectral libraries generated from previously discovered proteins can serve as a community resource, saving resequencing efforts. The spectral libraries generated in this study are available at the OPP to enable future ocean proteomic studies.
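
The point about database size and false discovery rate can be illustrated with a minimal sketch of generic target-decoy counting. This is not the authors' pipeline: the function name `accepted_targets`, the toy scores, and the 20% threshold are all assumptions chosen only to make the effect visible on a tiny example. The idea is that a larger search space produces more high-scoring random (decoy-like) matches, so fewer true matches survive at the same FDR threshold.

```python
from typing import List, Tuple

# Each match is (score, is_decoy). Scores below are invented toy values,
# not data from the study.
Match = Tuple[float, bool]


def accepted_targets(matches: List[Match], fdr: float) -> int:
    """Return the largest number of target matches whose estimated FDR
    (decoys / targets, counted from the best score downward) stays at or
    below the requested threshold."""
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest cutoff that still satisfies the threshold.
        if targets > 0 and decoys / targets <= fdr:
            best = targets
    return best


# Ten target matches with scores 10, 9, ..., 1 (hypothetical).
target_matches = [(float(s), False) for s in range(10, 0, -1)]

# Small search space: a single low-scoring random match.
small_db = target_matches + [(0.5, True)]

# Large search space: random matches reach much higher scores.
large_db = target_matches + [(s, True) for s in (8.5, 6.5, 4.5, 2.5, 0.5)]

small_count = accepted_targets(small_db, fdr=0.2)
large_count = accepted_targets(large_db, fdr=0.2)
print(small_count, large_count)
```

With these toy inputs the small search space retains all ten target matches, while the large one retains only two, even though the target scores are identical in both cases.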

Source journal: Proteomics (Biology; Biochemical Research Methods)
CiteScore: 6.30
Self-citation rate: 5.90%
Articles published: 193
Review time: 3 months
About the journal: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.