Dandan Zhao, Dayana E Salas-Leiva, Shelby K Williams, Katherine A Dunn, Jason D Shao, Andrew J Roger
{"title":"Eukfinder:从宏基因组数据中检索微生物真核生物基因组序列的管道。","authors":"Dandan Zhao, Dayana E Salas-Leiva, Shelby K Williams, Katherine A Dunn, Jason D Shao, Andrew J Roger","doi":"10.1128/mbio.00699-25","DOIUrl":null,"url":null,"abstract":"<p><p>Whole-genome shotgun (WGS) metagenomic sequencing of microbial communities enables the discovery of the functions, physiologies, and evolutionary histories of prokaryotic and eukaryotic microbes. However, metagenomic studies of microbial eukaryotes lag due to challenges in identifying and assembling high-quality genomes from WGS data. To address this problem, we developed Eukfinder, a bioinformatics pipeline that identifies potential eukaryotic sequences from WGS metagenomic data, with a complementary binning workflow for recovering nuclear and mitochondrial genomes. Eukfinder uses two specialized databases for read/contig classification, customizable to specific data sets or environments. We tested Eukfinder on simulated gut microbiome data sets which included varying numbers of reads from the protist <i>Blastocystis</i>, a human gut commensal. We also applied Eukfinder to previously published human gut microbiome WGS metagenomic data to recover new genomes of <i>Blastocystis</i>. Compared to other workflows, Eukfinder offers the potential to recover high-quality, near-complete genomes of diverse eukaryotes, including different <i>Blastocystis</i> subtypes, without relying on a reference genome. With sufficient sequencing depth, Eukfinder outperforms similar tools for recovering eukaryotic genomes from metagenomic data. Eukfinder is a valuable tool for reference-independent and cultivation-free studies of eukaryotic microbial genomes from environmental WGS metagenomic samples.</p><p><strong>Importance: </strong>Advancements in next-generation sequencing have made whole-genome shotgun (WGS) metagenomic sequencing an efficient method for <i>de novo</i> reconstruction of microbial genomes from various environments. 
Thousands of new prokaryotic genomes have been characterized; however, the large size and complexity of protistan genomes have hindered the use of WGS metagenomics to sample microbial eukaryotic diversity. Eukfinder enables the recovery of eukaryotic microbial genomes from environmental WGS metagenomic samples. Retrieval of high-quality protistan genomes from diverse metagenomic samples increases the number of reference genomes available. This aids future metagenomic investigations into the functions, physiologies, and evolutionary histories of eukaryotic microbes in the gut microbiome and other ecosystems.</p>","PeriodicalId":18315,"journal":{"name":"mBio","volume":"16 5","pages":"e0069925"},"PeriodicalIF":5.1000,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077102/pdf/","citationCount":"0","resultStr":"{\"title\":\"Eukfinder: a pipeline to retrieve microbial eukaryote genome sequences from metagenomic data.\",\"authors\":\"Dandan Zhao, Dayana E Salas-Leiva, Shelby K Williams, Katherine A Dunn, Jason D Shao, Andrew J Roger\",\"doi\":\"10.1128/mbio.00699-25\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Whole-genome shotgun (WGS) metagenomic sequencing of microbial communities enables the discovery of the functions, physiologies, and evolutionary histories of prokaryotic and eukaryotic microbes. However, metagenomic studies of microbial eukaryotes lag due to challenges in identifying and assembling high-quality genomes from WGS data. To address this problem, we developed Eukfinder, a bioinformatics pipeline that identifies potential eukaryotic sequences from WGS metagenomic data, with a complementary binning workflow for recovering nuclear and mitochondrial genomes. Eukfinder uses two specialized databases for read/contig classification, customizable to specific data sets or environments. 
We tested Eukfinder on simulated gut microbiome data sets which included varying numbers of reads from the protist <i>Blastocystis</i>, a human gut commensal. We also applied Eukfinder to previously published human gut microbiome WGS metagenomic data to recover new genomes of <i>Blastocystis</i>. Compared to other workflows, Eukfinder offers the potential to recover high-quality, near-complete genomes of diverse eukaryotes, including different <i>Blastocystis</i> subtypes, without relying on a reference genome. With sufficient sequencing depth, Eukfinder outperforms similar tools for recovering eukaryotic genomes from metagenomic data. Eukfinder is a valuable tool for reference-independent and cultivation-free studies of eukaryotic microbial genomes from environmental WGS metagenomic samples.</p><p><strong>Importance: </strong>Advancements in next-generation sequencing have made whole-genome shotgun (WGS) metagenomic sequencing an efficient method for <i>de novo</i> reconstruction of microbial genomes from various environments. Thousands of new prokaryotic genomes have been characterized; however, the large size and complexity of protistan genomes have hindered the use of WGS metagenomics to sample microbial eukaryotic diversity. Eukfinder enables the recovery of eukaryotic microbial genomes from environmental WGS metagenomic samples. Retrieval of high-quality protistan genomes from diverse metagenomic samples increases the number of reference genomes available. 
This aids future metagenomic investigations into the functions, physiologies, and evolutionary histories of eukaryotic microbes in the gut microbiome and other ecosystems.</p>\",\"PeriodicalId\":18315,\"journal\":{\"name\":\"mBio\",\"volume\":\"16 5\",\"pages\":\"e0069925\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2025-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077102/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"mBio\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1128/mbio.00699-25\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/4/10 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MICROBIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"mBio","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1128/mbio.00699-25","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/10 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MICROBIOLOGY","Score":null,"Total":0}
Eukfinder: a pipeline to retrieve microbial eukaryote genome sequences from metagenomic data.
Whole-genome shotgun (WGS) metagenomic sequencing of microbial communities enables the discovery of the functions, physiologies, and evolutionary histories of prokaryotic and eukaryotic microbes. However, metagenomic studies of microbial eukaryotes lag due to challenges in identifying and assembling high-quality genomes from WGS data. To address this problem, we developed Eukfinder, a bioinformatics pipeline that identifies potential eukaryotic sequences from WGS metagenomic data, with a complementary binning workflow for recovering nuclear and mitochondrial genomes. Eukfinder uses two specialized databases for read/contig classification, customizable to specific data sets or environments. We tested Eukfinder on simulated gut microbiome data sets which included varying numbers of reads from the protist Blastocystis, a human gut commensal. We also applied Eukfinder to previously published human gut microbiome WGS metagenomic data to recover new genomes of Blastocystis. Compared to other workflows, Eukfinder offers the potential to recover high-quality, near-complete genomes of diverse eukaryotes, including different Blastocystis subtypes, without relying on a reference genome. With sufficient sequencing depth, Eukfinder outperforms similar tools for recovering eukaryotic genomes from metagenomic data. Eukfinder is a valuable tool for reference-independent and cultivation-free studies of eukaryotic microbial genomes from environmental WGS metagenomic samples.
Importance: Advancements in next-generation sequencing have made whole-genome shotgun (WGS) metagenomic sequencing an efficient method for de novo reconstruction of microbial genomes from various environments. Thousands of new prokaryotic genomes have been characterized; however, the large size and complexity of protistan genomes have hindered the use of WGS metagenomics to sample microbial eukaryotic diversity. Eukfinder enables the recovery of eukaryotic microbial genomes from environmental WGS metagenomic samples. Retrieval of high-quality protistan genomes from diverse metagenomic samples increases the number of reference genomes available. This aids future metagenomic investigations into the functions, physiologies, and evolutionary histories of eukaryotic microbes in the gut microbiome and other ecosystems.
About the journal:
mBio® is ASM's first broad-scope, online-only, open access journal. mBio offers streamlined review and publication of the best research in microbiology and allied fields.