Benchmarking short-read metagenomics tools for removing host contamination
Yunyun Gao, Hao Luo, Hujie Lyu, Haifei Yang, Salsabeel Yousuf, Shi Huang, Yong-Xin Liu
GigaScience, Volume 14 (published 2025-01-06). DOI: 10.1093/gigascience/giaf004. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878760/pdf/
Citations: 0
Abstract
Background: The rapid evolution of metagenomic sequencing technology offers remarkable opportunities to explore the intricate roles of the microbiome in host health and disease, as well as to uncover the unknown structure and functions of microbial communities. However, the swift accumulation of metagenomic data poses substantial challenges for data analysis. Contamination from host DNA can substantially compromise the accuracy of results and consume additional computational resources by including nontarget sequences.
Results: In this study, we assessed the impact of computational host DNA decontamination on downstream analyses, highlighting its importance for producing accurate results efficiently. We also evaluated the performance of conventional tools such as KneadData, Bowtie2, BWA, KMCP, Kraken2, and KrakenUniq, each offering unique advantages for different applications. Furthermore, we highlighted the importance of an accurate host reference genome, noting that its absence negatively affected decontamination performance across all tools.
Conclusions: Our findings underscore the need for careful selection of decontamination tools and reference genomes to enhance the accuracy of metagenomic analyses. These insights provide valuable guidance for improving the reliability and reproducibility of microbiome research.
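Illustrative sketch (not from the paper): alignment-based host decontamination, as performed by tools such as Bowtie2 (which workflows like KneadData wrap), amounts to mapping paired-end reads against a host reference index and keeping only the read pairs that fail to align. The Python sketch below wraps a single Bowtie2 call to show that step; the file names, the GRCh38 index prefix, and the --very-sensitive preset are illustrative assumptions, not settings reported in this benchmark.

# Minimal sketch of alignment-based host-read removal with Bowtie2.
# Paths, the index prefix, and the preset are hypothetical placeholders.
import subprocess
from pathlib import Path

def remove_host_reads(r1: str, r2: str, host_index: str, out_prefix: str, threads: int = 8) -> None:
    """Align paired-end reads to the host index; keep pairs that do not align concordantly."""
    Path(out_prefix).parent.mkdir(parents=True, exist_ok=True)
    cmd = [
        "bowtie2",
        "-x", host_index,                # prebuilt host index (made with bowtie2-build)
        "-1", r1, "-2", r2,              # paired-end FASTQ inputs
        "-p", str(threads),              # alignment threads
        "--very-sensitive",              # stricter preset; an assumption, not the paper's setting
        "--un-conc-gz", f"{out_prefix}_clean_%.fastq.gz",  # pairs that fail to align concordantly = putative non-host reads
        "-S", "/dev/null",               # discard the SAM of host-aligned reads
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    remove_host_reads("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                      "ref/GRCh38_index", "decontam/sample")

Index construction (bowtie2-build) and downstream profiling are omitted; the k-mer-based tools evaluated in the paper, such as Kraken2, KrakenUniq, and KMCP, classify reads against reference databases rather than aligning them, so their invocation differs.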
About the journal:
GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access, open-data journal, it specializes in publishing "big-data" studies across a range of fields. Its scope includes not only "omic"-type data and the fields of high-throughput biology currently served by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology, and other new types of large-scale shareable data.