Fairy: fast approximate coverage for multi-sample metagenomic binning.
Jim Shaw, Yun William Yu
Microbiome, 2024-08-14. DOI: 10.1186/s40168-024-01861-6
Background: Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting signatures that stay consistent along a genome, such as read coverage patterns. Using coverage from multiple samples leads to higher-quality MAGs. However, standard pipelines require all-to-all read alignment across samples (every sample's reads against every sample's contigs) to compute coverage, which becomes a key computational bottleneck.
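To make the coverage-signature idea concrete, the sketch below compares the multi-sample coverage profiles of contigs. It is a minimal illustration under stated assumptions, not how any particular binner (MetaBAT2 included) scores contigs: the contig names and depth values are hypothetical, and Pearson correlation on log-transformed depths is just one simple choice of similarity.

```python
import math

# Hypothetical per-sample mean coverage for three contigs across four samples.
# These values are made up for illustration; they do not come from the fairy paper.
coverage = {
    "contig_A": [12.0, 3.1, 25.4, 0.8],
    "contig_B": [11.5, 2.9, 24.0, 1.0],  # profile tracks contig_A (likely same genome)
    "contig_C": [4.0, 18.2, 2.5, 9.7],   # profile moves opposite to contig_A
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def profile_similarity(c1: str, c2: str) -> float:
    """Correlate two contigs' coverage profiles after a log1p transform.

    log1p compresses the dynamic range of depths, so the comparison is driven
    more by how coverage rises and falls across samples than by absolute depth.
    """
    x = [math.log1p(v) for v in coverage[c1]]
    y = [math.log1p(v) for v in coverage[c2]]
    return pearson(x, y)


if __name__ == "__main__":
    print("A vs B:", round(profile_similarity("contig_A", "contig_B"), 3))
    print("A vs C:", round(profile_similarity("contig_A", "contig_C"), 3))
```

With these made-up numbers, contig_A and contig_B score close to 1 while contig_A and contig_C score negative, so a coverage-based binner would tend to group only the first pair. With a single sample, each contig has just one depth value and a correlation like this is undefined, which illustrates why extra samples add discriminating power.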
Results: We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast, alignment-free method based on k-mers. For multi-sample binning, fairy can be >250× faster than read alignment while remaining accurate enough for binning. Fairy is compatible with several existing binners on host-associated and non-host-associated datasets. Using MetaBAT2, fairy recovers 98.5% as many MAGs with >50% completeness and <5% contamination as alignment with BWA does. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA (>1.5× more >50%-complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher-quality Asgard archaea MAGs than single-sample binning and that fairy's results are indistinguishable from those of read alignment.
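The abstract does not describe fairy's internals, so the following is only a minimal sketch of one generic alignment-free idea, not fairy's implementation: estimate a contig's coverage in a sample as the average number of times its k-mers occur in that sample's reads. The function names, the default k = 21, and the hash-based subsampling parameter `c` (keep roughly 1/c of k-mers, in the style of FracMinHash) are illustrative assumptions.

```python
import hashlib
from collections import Counter

VALID = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int):
    """Yield canonical k-mers of seq, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            yield canonical(kmer)


def keep(kmer: str, c: int) -> bool:
    """Hash-based subsampling (assumed FracMinHash-style): keep about 1/c of k-mers."""
    h = int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")
    return h < (2**64) // c


def approx_coverage(contig: str, reads: list[str], k: int = 21, c: int = 1) -> float:
    """Estimate a contig's k-mer coverage as the mean read count of its sampled k-mers."""
    contig_kmers = {km for km in kmers(contig, k) if keep(km, c)}
    if not contig_kmers:
        return 0.0
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in contig_kmers:  # only track k-mers that belong to this contig
                counts[km] += 1
    # Counter returns 0 for k-mers never seen in the reads.
    return sum(counts[km] for km in contig_kmers) / len(contig_kmers)
```

With `c=1` every k-mer is counted exactly; raising `c` tracks fewer k-mers, trading some accuracy for speed and memory. Keep in mind that k-mer coverage differs from per-base coverage: for error-free reads of length L it is lower by roughly a factor of (L − k + 1)/L. Calling the function once per sample yields the per-sample coverage vector that a binner consumes.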
Conclusions: Fairy is a new tool for quickly calculating approximate multi-sample coverage for binning, resolving a computational bottleneck in metagenomics.
About the journal:
Microbiome is a journal that focuses on studies of microbiomes in humans, animals, plants, and the environment. It covers both natural and manipulated microbiomes, such as those in agriculture. The journal is interested in research that uses meta-omics approaches or novel bioinformatics tools and emphasizes community/host interactions and structure-function relationships within the microbiome. Studies that go beyond descriptive omics surveys and include experimental or theoretical approaches will be considered for publication. The journal also encourages research that establishes cause-and-effect relationships and supports proposed microbiome functions. However, studies of individual microbial isolates or species that do not explore their impact on the host or on complex microbiome structures and functions will not be considered for publication. Microbiome is indexed in BIOSIS, Current Contents, DOAJ, Embase, MEDLINE, PubMed, PubMed Central, and Science Citation Index Expanded.