Shengbo Wang, Satwant Kaur, Benoit J Kunath, Patrick May, Lorna Richardson, Alexander B Rogers, Paul Wilmes, Robert D Finn, Juan Antonio Vizcaíno
An Approach to Integrate Metagenomics, Metatranscriptomics and Metaproteomics Data in Public Data Resources
Proteomics, e202500002. Published 2025-04-28. DOI: 10.1002/pmic.202500002 (https://doi.org/10.1002/pmic.202500002)
Abstract
The volume of metaproteomics, metagenomics and metatranscriptomics data available in public resources such as MGnify (for metagenomics/metatranscriptomics) and the PRIDE database (for metaproteomics) continues to increase. When these omics techniques are applied to the same samples, their integration offers new opportunities to understand the structure (metagenome) and functional expression (metatranscriptome and metaproteome) of the microbiome. Here, we describe a pilot study aimed at integrating public multi-meta-omics datasets from studies based on human gut and marine hatchery samples. Reference search databases (search DBs) were built from assembled metagenomic (and, where available, metatranscriptomic) sequence data followed by de novo gene calling, using data both from the same sampling event and from independent samples. The resulting protein sets were evaluated for their utility in metaproteomics analysis. In agreement with previous studies, the highest number of peptide identifications was generally obtained with search DBs created from the same samples. The multi-omics results were integrated in MGnify; for that purpose, the MGnify website was extended to enable visualisation of the resulting peptide/protein information from three reanalysed metaproteomics datasets. A workflow (https://github.com/PRIDE-reanalysis/MetaPUF) has been developed that allows researchers to perform equivalent data integration using paired multi-omics datasets. This is the first time that a data integration approach for multi-omics datasets has been implemented using public data available in the world-leading MGnify and PRIDE resources.
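The search-DB construction step described above can be illustrated with a minimal sketch. It assumes that gene calling has already produced protein FASTA records for each data source, and that combining them amounts to keeping one entry per distinct sequence while recording where each came from. The function names, the `MG`/`MT` source labels and the toy sequences are hypothetical, and this is not the MetaPUF implementation.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_search_db(sources: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Merge protein sets, keeping one entry per distinct sequence.

    Returns a mapping from sequence to the "source|header" labels that
    produced it, so provenance survives deduplication.
    """
    merged: Dict[str, List[str]] = {}
    for source, lines in sources.items():
        for header, seq in parse_fasta(lines):
            seq = seq.rstrip("*").upper()  # drop a trailing stop-codon marker
            if seq:
                merged.setdefault(seq, []).append(f"{source}|{header}")
    return merged


def to_fasta(db: Dict[str, List[str]]) -> str:
    """Serialise the merged set; the first label becomes the FASTA header."""
    return "".join(f">{labels[0]}\n{seq}\n" for seq, labels in db.items())


# Toy input (hypothetical): predicted proteins from a metagenome assembly
# and a metatranscriptome assembly of the same sample.
metagenome = [">contig1_1", "MKTAYIAK", "QR*", ">contig2_1", "MSLLNV"]
metatranscriptome = [">tx7_1", "MKTAYIAKQR", ">tx9_1", "MGGHW"]

db = build_search_db({"MG": metagenome, "MT": metatranscriptome})
print(to_fasta(db), end="")
```

In this toy input the protein shared by both assemblies is written once, with both origins retained in the mapping. A real pipeline would typically also append contaminant and decoy sequences before the database search, which this sketch omits.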
About the journal:
PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.