High-quality peptide evidence for annotating non-canonical open reading frames as human proteins

bioRxiv - Molecular Biology. Pub Date: 2024-09-09. DOI: 10.1101/2024.09.09.612016

Eric W Deutsch, Leron W Kok, Jonathan M Mudge, Jorge Ruiz-Orera, Ivo Fierro-Monti, Zhi Sun, Jennifer G Abelin, M Mar Alba, Julie L Aspden, Ariel A Bazzini, Elspeth Bruford, Marie A Brunet, Lorenzo Calviello, Steven A Carr, Anne-Ruxandra Carvunis, Sonia Chothani, Jim Clauwaert, Kellie Dean, Pouya Faridi, Adam Frankish, Norbert Hubner, Nicholas Ingolia, Michele Magrane, Maria Jesus Martin, Thomas F Martinez, Gerben Menschaert, Uwe Ohler, Sandra Orchard, Owen Rackham, Xavier Roucou, Sarah A Slavoff, Eivind Valen, Aaron C Wacholder, Jonathan S. Weissman, Wei Wu, Zhi Xie, Jyoti Choudhary, Michal Bassani-Sternberg, Juan Antonio Vizcaino, Nicola Ternette, Robert L. Moritz, John Prensner, Sebastiaan van Heesch


Citations: 0

Abstract



High-quality peptide evidence for annotating non-canonical open reading frames as human proteins

A major scientific drive is to characterize the protein-coding genome as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation, to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.

