Efficient De Novo Assembly and Recovery of Microbial Genomes from Complex Metagenomes Using a Reduced Set of k-mers.

IF 3.9 | Zone 2, Biology | JCR Q1, Mathematical & Computational Biology
Hajra Qayyum, Muhammad Faheem Raziq, Haseeb Manzoor, Syed Shujaat Ali Zaidi, Amjad Ali, Masood Ur Rehman Kayani
Journal: Interdisciplinary Sciences: Computational Life Sciences
Publication date: 2025-06-02
Publication type: Journal Article
DOI: 10.1007/s12539-025-00722-6 (https://doi.org/10.1007/s12539-025-00722-6)
Citations: 0

Abstract

De novo assembly and genome binning are fundamental steps in genome-resolved metagenomic analyses. However, limited computational resources and long processing times restrict the broader application of these analyses. Optimizing the parameters used in these steps can make better use of the available metagenomics tools. This study therefore tested three sets of k-mers (default, reduced, and extended) for their efficiency in metagenome assembly and their suitability for recovering metagenome-assembled genomes (MAGs). The results show that the reduced set outperforms the other two in both computational efficiency and quality of results. Assemblies from the default set are comparable to those from the reduced set; however, the default set yields less complete and more contaminated MAGs while requiring more processing time. The extended set yields assemblies that are less contiguous yet computationally more expensive: it takes approximately three times the processing time of the reduced set and recovers the lowest proportions of high- and medium-quality MAGs. In contrast, the reduced set produces better assemblies and substantially improves the number and quality of recovered MAGs in significantly less processing time. Validation of the reduced k-mer set on previously published metagenome datasets further demonstrates its effectiveness not only for human metagenomes but also for metagenomes of environmental origin. These findings underscore that the reduced k-mer set is optimal for efficient analysis of metagenomes of varying complexity and origin. Optimizing the k-mer set used in metagenome assemblers significantly reduces computational time while improving the quality of both the assemblies and the recovered MAGs. This efficient solution will facilitate the widespread application of genome-resolved analyses, even in resource-limited settings, and will help recover better-quality MAGs for downstream analyses.
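
The abstract compares assemblies by contiguity and metagenome-assembled genomes by quality tier but does not say how either was measured. As a minimal sketch, assuming contiguity is summarized by N50 and that quality tiers follow the commonly used MIMAG-style completeness and contamination cutoffs (both are assumptions, not details taken from the paper), the two measures could be computed as follows. The function names and the example values are hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def mag_quality(completeness: float, contamination: float) -> str:
    """Assign a quality tier from completeness and contamination (both in %).

    Assumed cutoffs (MIMAG-style), not confirmed by the source:
      high   : completeness > 90 and contamination < 5
      medium : completeness >= 50 and contamination < 10
    Simplifications: the rRNA/tRNA requirements of the full MIMAG
    high-quality tier are not checked, and everything else is "low".
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical contig lengths and bin statistics, for illustration only.
    print(n50([100, 200, 300, 400]))
    for comp, cont in [(95.0, 2.0), (70.0, 8.0), (40.0, 1.0)]:
        print(comp, cont, mag_quality(comp, cont))
```

A higher N50 indicates a more contiguous assembly, so under these assumptions the "less contiguous" assemblies from the extended k-mer set would show a lower N50 than those from the reduced set.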

Source journal
Interdisciplinary Sciences: Computational Life Sciences
CiteScore: 8.60
Self-citation rate: 4.20%
Articles published: 55
About the journal: Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, an area that is developing rapidly at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and then by publishing Online First™ through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists of international reputation, including Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions to bioinformatics and computational physics and is best known for his ground-breaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the aim is to build an international journal with a sound reputation.