subMG automates data submission for metagenomics studies.

IF 6.1 3区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining Pub Date : 2025-06-05 DOI:10.1186/s13040-025-00453-w

Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba

{"title":"subMG automates data submission for metagenomics studies.","authors":"Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba","doi":"10.1186/s13040-025-00453-w","DOIUrl":null,"url":null,"abstract":"Background: Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.Results: subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command-line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.Conclusions: By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"38"},"PeriodicalIF":6.1000,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12142852/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodata Mining","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13040-025-00453-w","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Background: Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.

Results: subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command-line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.

Conclusions: By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.

Abstract Image

查看原文本刊更多论文

subMG自动提交宏基因组学研究的数据。

背景：公开可用的宏基因组学数据集对于确保科学发现的可重复性和支持当代大规模研究至关重要。然而，提交一个全面的宏基因组数据集既麻烦又耗时。它需要包括样本信息、测序读数、组装、分组组合、宏基因组组装基因组（MAGs）和适当的元数据。因此，宏基因组学研究的发表往往带有不完整的数据集，或者在某些情况下，根本没有任何数据。subMG通过简化和自动化数据提交过程来解决这一挑战，从而鼓励更广泛和更一致的数据共享。结果：subMG简化了向欧洲核苷酸档案馆（ENA）提交宏基因组学研究结果的过程，允许研究人员以单一形式输入他们研究中的文件和元数据，并自动化下游任务，否则需要大量的手工工作和专业知识。该工具附带了全面的文档以及为不同用例量身定制的示例数据，可以通过命令行或图形用户界面（GUI）操作，使其易于部署到广泛的潜在用户。结论：通过简化基因组解析宏基因组学研究数据集的提交，subMG显著减少了研究人员所需的时间、精力和专业知识，从而为未来更多、更全面的数据提交铺平了道路。越来越多的文献完备且公平的数据可用于未来的研究，特别是在荟萃分析和比较研究中。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Biodata Mining MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

7.90

自引率

0.00%

发文量

审稿时长

23 weeks

期刊介绍： BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to: -Development, evaluation, and application of novel data mining and machine learning algorithms. -Adaptation, evaluation, and application of traditional data mining and machine learning algorithms. -Open-source software for the application of data mining and machine learning algorithms. -Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies. -Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.