An automated workflow to address proteome complexity and the large search space problem in proteomics and HLA-I immunopeptidomics.

IF 5.5 · Zone 2, Biology · JCR Q1 · Biochemical Research Methods
Yehor Horokhovskyi, Hanna P Roetschke, John A Cormican, Martin Pašen, Sina Garazhian, Michele Mishto, Juliane Liepe
Molecular & Cellular Proteomics, article 101039. Journal article, published 2025-07-21. DOI: 10.1016/j.mcpro.2025.101039 (https://doi.org/10.1016/j.mcpro.2025.101039)
Citations: 0

Abstract

Antigenic noncanonical epitope and novel protein discovery are research areas with therapeutic applications, pursued predominantly via mass spectrometry. Mass spectrometry-based identification should rely on a well-characterized proteogenomic search space. The size of this search space is barely known for antigenic noncanonical peptides and novel proteins, which could affect their identification. To address these issues, we develop an automated workflow comprising Sequoia, for the creation of RNA sequencing-informed and exhaustive sequence search spaces for various noncanonical peptide origins, and SPIsnake, for pre-filtering and exploration of the sequence search space prior to mass spectrometry searches. We apply our workflow to characterize the exact sizes of tryptic and nonspecific peptide sequence search spaces under a variety of definitions, their reduction when using RNA expression, their inflation by post-translational modifications, and the frequency of peptide sequence multimapping to different noncanonical origins. Furthermore, we explore the application of Sequoia and SPIsnake to HLA-I immunopeptidomes, thereby rescuing sensitivity in peptide identification when confronted with inflated search spaces. Taken together, Sequoia and SPIsnake pave the way for an informed development of methods addressing large-scale exhaustive proteogenomic discovery by exposing the consequences of database size inflation and the ambiguity of peptide and protein sequence identification.
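To make the contrast between tryptic and nonspecific search spaces concrete, the following is a minimal Python sketch that enumerates both kinds of peptide for a single toy sequence. It is not the Sequoia or SPIsnake implementation: the function names, the toy sequence, the length window, and the simplified cleavage rule (cut after K or R unless the next residue is P) are all assumptions chosen for illustration.

```python
def tryptic_fragments(protein):
    """Split a sequence after K or R, except when the next residue is P.

    Simplified, assumed cleavage rule for illustration only.
    """
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(protein, min_len, max_len, missed_cleavages=0):
    """Unique tryptic peptides within a length window."""
    fragments = tryptic_fragments(protein)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages + 1, len(fragments))
        for j in range(i, last):
            pep = "".join(fragments[i:j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return peptides


def nonspecific_peptides(protein, min_len, max_len):
    """Unique substrings within a length window (no cleavage rule at all)."""
    return {
        protein[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(protein) - n + 1)
    }


if __name__ == "__main__":
    toy = "MKTAYRPKLLGR"  # hypothetical toy sequence, not real data
    tryptic = tryptic_peptides(toy, min_len=2, max_len=4)
    nonspecific = nonspecific_peptides(toy, min_len=2, max_len=4)
    print(len(tryptic), "tryptic vs", len(nonspecific), "nonspecific peptides")
```

Even on a 12-residue toy sequence the nonspecific set is many times larger than the tryptic one, because every start position and every allowed length contributes a candidate. The same effect, scaled up to whole proteomes and multiplied by modification variants, is the search space inflation the abstract refers to.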

Source journal: Molecular & Cellular Proteomics
Category: Biology, Biochemical Research Methods
CiteScore: 11.50
Self-citation rate: 4.30%
Articles published: 131
Review time: 84 days
About the journal: The mission of MCP is to foster the development and applications of proteomics in both basic and translational research. MCP will publish manuscripts that report significant new biological or clinical discoveries underpinned by proteomic observations across all kingdoms of life. Manuscripts must define the biological roles played by the proteins investigated or their mechanisms of action. The journal also emphasizes articles that describe innovative new computational methods and technological advancements that will enable future discoveries. Manuscripts describing such approaches do not have to include a solution to a biological problem, but must demonstrate that the technology works as described, is reproducible, and is appropriate to uncover as yet unknown protein/proteome function or properties using relevant model systems or publicly available data.

Scope:
- Fundamental studies in biology, including integrative "omics" studies, that provide mechanistic insights
- Novel experimental and computational technologies
- Proteogenomic data integration and analysis that enable greater understanding of physiology and disease processes
- Pathway and network analyses of signaling that focus on the roles of post-translational modifications
- Studies of proteome dynamics and quality controls, and their roles in disease
- Studies of evolutionary processes affecting proteome dynamics, quality and regulation
- Chemical proteomics, including mechanisms of drug action
- Proteomics of the immune system and antigen presentation/recognition
- Microbiome proteomics, host-microbe and host-pathogen interactions, and their roles in health and disease
- Clinical and translational studies of human diseases
- Metabolomics to understand functional connections between genes, proteins and phenotypes