Efficient De Novo Assembly and Recovery of Microbial Genomes from Complex Metagenomes Using a Reduced Set of k-mers
Hajra Qayyum, Muhammad Faheem Raziq, Haseeb Manzoor, Syed Shujaat Ali Zaidi, Amjad Ali, Masood Ur Rehman Kayani
Interdisciplinary Sciences: Computational Life Sciences (IF 3.9; JCR Q1, Mathematical & Computational Biology). Journal Article, published 2025-06-02. DOI: 10.1007/s12539-025-00722-6 (https://doi.org/10.1007/s12539-025-00722-6)
Citations: 0
Abstract
De novo assembly and genome binning are fundamental steps in genome-resolved metagenomic analyses. However, limited computational resources and long processing times restrict the broader application of these analyses. Optimizing the parameters used in these steps can make better use of the available metagenomics tools. This study therefore tested three sets of k-mers (default, reduced, and extended) for their efficiency in metagenome assembly and their suitability for recovering metagenome-assembled genomes. The results demonstrate that the reduced set of k-mers outperforms the other two sets in both computational efficiency and quality of results. The assemblies from the default set are comparable with those from the reduced set; however, the default set yields less complete and highly contaminated metagenome-assembled genomes and requires more processing time. The extended set of k-mers yields assemblies that are less contiguous yet computationally expensive. This set takes approximately three times as long to process as the reduced set and recovers the lowest proportions of high- and medium-quality metagenome-assembled genomes. In contrast, the reduced set produces better assemblies, substantially improving the number and quality of the recovered metagenome-assembled genomes in significantly less processing time. Validation of the reduced k-mer set on previously published metagenome datasets further demonstrates its effectiveness not only for human metagenomes but also for metagenomes of environmental origin. These findings underscore that the reduced k-mer set is optimal for efficient analysis of metagenomes of varying complexity and origin. Optimizing the k-mer set used in metagenome assemblers significantly reduces computational time while improving the quality of the assemblies and of the recovered metagenome-assembled genomes. This efficient solution will facilitate the widespread application of genome-resolved analyses, even in resource-limited settings, and aid the recovery of better-quality metagenome-assembled genomes for downstream analyses.
About the journal:
Interdisciplinary Sciences--Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of sciences, especially focusing on computational life sciences, an area that is enjoying rapid development at the forefront of scientific research and technology.
The journal publishes original papers of significant general interest covering recent research and developments. Articles will be published rapidly by taking full advantage of internet technology for online submission and peer-reviewing of manuscripts, and then by publishing Online First™ through SpringerLink even before the issue is built or sent to the printer.
The editorial board consists of many leading scientists with international reputations, among them Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions to bioinformatics and computational physics and is best known for his groundbreaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to establish a sound international reputation.