An in-depth evaluation of metagenomic classifiers for soil microbiomes.

IF 6.2 · Zone 2, Environmental Science & Ecology · Q1 GENETICS & HEREDITY

Environmental Microbiome 19(1): 19 · Pub Date: 2024-03-28 · DOI: 10.1186/s40793-024-00561-w

Niranjana Rose Edwin, Amy Heather Fitzpatrick, Fiona Brennan, Florence Abram, Orla O'Sullivan


Citations: 0

Abstract



Background: Recent endeavours in metagenomics, exemplified by projects such as the Human Microbiome Project and TARA Oceans, have illuminated the complexities of microbial biomes. Robust bioinformatic pipelines and meticulous evaluation of their methodology have contributed to the success of these projects. The soil environment, however, poses unique challenges and requires dedicated methodological exploration to maximize microbial insight. A notable limitation in soil microbiome studies is the dearth of soil-specific reference databases that capture the complexity of soil communities and are available to classifiers. There is also a lack of in-vitro mock communities derived from soil strains against which taxonomic classification accuracy can be assessed.

Results: In this study, we generated a custom in-silico mock community containing microbial genomes commonly observed in the soil microbiome. Using this mock community, we simulated shotgun sequencing data to evaluate the performance of three leading metagenomic classifiers. These were Kraken2, supplemented with Bracken and run with both a custom database derived from GTDB-Tk genomes and its own default database; Kaiju; and MetaPhlAn. Kaiju and MetaPhlAn each used its respective default database. Our results highlight the importance of optimizing taxonomic classification parameters and database selection, and of analysing both trimmed reads and contigs. Databases tailored to the specific taxa present in our samples produced fewer errors than broader databases that also include microbial eukaryotes, protozoa, or human genomes, which highlights the effectiveness of targeted taxonomic classification. Notably, classifier performance was optimal when a relative abundance threshold of 0.001% or 0.005% was applied. Kraken2 supplemented with Bracken and a custom database achieved the best precision, sensitivity, F1 score, and overall proportion of classified sequences. With the custom database, this classifier classified 99% of in-silico reads and 58% of real-world soil shotgun reads. In the real-world data, it also identified previously overlooked phyla.
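The Results paragraph reports precision, sensitivity and F1 after applying a relative abundance threshold. The Python below is a minimal sketch of how this kind of scoring typically works. It uses the standard presence/absence definitions, which are not necessarily the authors' exact procedure. The code drops taxa that fall below a percentage threshold, then compares the remaining calls with the known members of a mock community. The taxon names, read counts, and the helpers `filter_by_threshold` and `score` are hypothetical and serve only as illustration.

```python
"""Hypothetical sketch: abundance-threshold filtering plus precision /
sensitivity / F1 scoring of a taxonomic profile against a known mock community.
Taxa and read counts are illustrative, NOT data from the study."""

from typing import Dict, Set, Tuple

# Hypothetical per-taxon read counts from a classifier (sums to 1,000,000 reads).
PROFILE: Dict[str, int] = {
    "Bacillus": 400_000,
    "Pseudomonas": 350_000,
    "Streptomyces": 249_975,
    "Escherichia": 20,   # 0.002%: a false positive that survives a 0.001% cut-off
    "Homo": 5,           # 0.0005%: a low-abundance false positive removed by it
}

# Hypothetical ground truth: taxa actually placed in the in-silico mock community.
TRUTH: Set[str] = {"Bacillus", "Pseudomonas", "Streptomyces", "Nitrosomonas"}


def filter_by_threshold(profile: Dict[str, int], threshold_pct: float) -> Dict[str, int]:
    """Keep taxa whose relative abundance, in percent, is >= threshold_pct."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {
        taxon: count
        for taxon, count in profile.items()
        if 100.0 * count / total >= threshold_pct
    }


def score(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, sensitivity, F1) for presence/absence calls."""
    tp = len(predicted & truth)   # correctly detected taxa
    fp = len(predicted - truth)   # reported but absent from the mock community
    fn = len(truth - predicted)   # present in the mock community but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return precision, sensitivity, f1


if __name__ == "__main__":
    for threshold in (0.0, 0.001):
        kept = filter_by_threshold(PROFILE, threshold)
        p, s, f = score(set(kept), TRUTH)
        print(f"threshold={threshold}%: precision={p:.2f} sensitivity={s:.2f} F1={f:.2f}")
    # threshold=0.0%: precision=0.60 sensitivity=0.75 F1=0.67
    # threshold=0.001%: precision=0.75 sensitivity=0.75 F1=0.75
```

In this toy profile, the 0.001% cut-off removes a very-low-abundance false positive without losing any true taxon. Precision therefore rises while sensitivity is unchanged. With real data, a threshold set too high would also start discarding genuine rare taxa, which is why the threshold has to be tuned.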

Conclusion: This study underscores the potential advantages of in-silico methodological optimization in metagenomic analyses, especially when deciphering the complexities of soil microbiomes. We demonstrate that the choice of classifier and database significantly affects microbial taxonomic profiling. Our findings suggest that Kraken2 with Bracken provides optimal accuracy in soil shotgun metagenome analysis when coupled with a custom database of GTDB-Tk genomes and fungal genomes and a relative abundance threshold of 0.001%.


Source journal

Environmental Microbiome Immunology and Microbiology-Microbiology

CiteScore

7.40

Self-citation rate

2.50%

Articles published

Review time

13 weeks

Journal description: Microorganisms, omnipresent across Earth's diverse environments, play a crucial role in adapting to external changes, influencing Earth's systems and cycles, and contributing significantly to agricultural practices. Through applied microbiology, they offer solutions to various everyday needs. Environmental Microbiome recognizes the universal presence and significance of microorganisms, inviting submissions that explore the diverse facets of environmental and applied microbiological research.