An Approach to Integrate Metagenomics, Metatranscriptomics and Metaproteomics Data in Public Data Resources.

Impact factor 3.4 · Zone 4 (Biology) · JCR Q2, Biochemical Research Methods

Proteomics · Pub Date: 2025-04-28 · DOI: 10.1002/pmic.202500002

Shengbo Wang, Satwant Kaur, Benoit J Kunath, Patrick May, Lorna Richardson, Alexander B Rogers, Paul Wilmes, Robert D Finn, Juan Antonio Vizcaíno


Abstract

The availability of metaproteomics, metagenomics and metatranscriptomics data in public resources such as MGnify (for metagenomics/metatranscriptomics) and the PRIDE database (for metaproteomics) continues to increase. When these omics techniques are applied to the same samples, their integration offers new opportunities to understand the structure (metagenome) and functional expression (metatranscriptome and metaproteome) of the microbiome. Here, we describe a pilot study aimed at integrating public multi-meta-omics datasets from studies based on human gut and marine hatchery samples. Reference search databases (search DBs) were built from assembled metagenomic (and, where available, metatranscriptomic) sequence data followed by de novo gene calling, using data both from the same sampling event and from independent samples. The resulting protein sets were evaluated for their utility in metaproteomics analysis. In agreement with previous studies, the highest number of peptide identifications was generally obtained when using search DBs created from the same samples. The multi-omics results were integrated in MGnify: the MGnify website was extended to enable visualisation of the resulting peptide/protein information from three reanalysed metaproteomics datasets. A workflow (https://github.com/PRIDE-reanalysis/MetaPUF) has been developed that allows researchers to perform equivalent data integration using paired multi-omics datasets. This is the first time that a data integration approach for multi-omics datasets has been implemented from public data available in the world-leading MGnify and PRIDE resources.
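The search-DB comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the MetaPUF implementation: the function names, the sample labels and the substring-based peptide matching are all assumptions made for illustration. A real metaproteomics analysis would use a dedicated search engine with false-discovery-rate control rather than exact substring matching.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_search_db(protein_sets: Dict[str, List[str]]) -> Dict[str, str]:
    """Merge predicted-protein FASTA sets into one non-redundant search DB.

    Keys are hypothetical sample labels; values are FASTA lines such as a
    gene caller might emit. Identical sequences are kept only once, under
    the first header seen.
    """
    seen: Dict[str, str] = {}
    for sample, lines in protein_sets.items():
        for header, seq in parse_fasta(lines):
            seq = seq.rstrip("*")  # drop a trailing stop-codon marker, if any
            if seq and seq not in seen:
                seen[seq] = f"{sample}|{header}"
    return {header: seq for seq, header in seen.items()}


def count_identified(peptides: Iterable[str], search_db: Dict[str, str]) -> int:
    """Toy proxy for identification: count peptides found in any DB protein."""
    proteins = list(search_db.values())
    return sum(any(pep in prot for prot in proteins) for pep in peptides)
```

Under these assumptions, calling `count_identified` with the same peptide list against a DB built from same-sample proteins and against one built from independent samples gives a crude way to compare the two DB strategies.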


Source journal

Proteomics (Biology: Biochemical Research Methods)

CiteScore: 6.30 · Self-citation rate: 5.90% · Articles published: 193 · Review time: 3 months

About the journal: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.