Laura A Katz, Marie Leleu, Godwin Ani, Rebecca Gawron, Auden Cote-L'Heureux
mBio, e0177025. Journal Article, published 2025-10-08 (epub 2025-08-27). DOI: 10.1128/mbio.01770-25
Rethinking large-scale phylogenomics with EukPhylo v.1.0, a flexible toolkit to enable phylogeny-informed data curation and analyses of diverse eukaryotic lineages.
Eukaryotic diversity is largely microbial, with macroscopic lineages (plants, animals, and fungi) nesting among a plethora of diverse protists. Our understanding of the evolutionary relationships among eukaryotes is rapidly advancing through 'omics analyses, but phylogenomic analyses are challenging for microeukaryotes, particularly uncultivable lineages, as single-cell sequencing approaches generate a mixture of sequences from hosts, associated microbiomes, and contaminants. Moreover, many analyses of eukaryotic gene families and phylogenies rely on boutique data sets and methods that are challenging for other research groups to replicate. To address these challenges, we present EukPhylo v.1.0, a modular, user-friendly pipeline that enables effective data curation through phylogeny-informed contamination removal, estimation of homologous gene families (GFs), and generation of both multisequence alignments and gene trees. For the GF assignment, we provide the "Hook Database" of ~15,000 ancient GFs, which users can easily replace with a set of gene families of interest. We demonstrate the power of EukPhylo, including a suite of stand-alone utilities, through phylogenomic analyses of 500 conserved GFs sampled from 1,000 diverse species of eukaryotes, bacteria, and archaea. We show improvements in estimates of the eukaryotic tree of life, recovering clades that are well established in the literature, through successive rounds of curation using the EukPhylo contamination loop. The final trees corroborate numerous hypotheses in the literature (e.g., Opisthokonta, Rhizaria, Amoebozoa) while challenging others (e.g., CRuMs, Obazoa, Diaphoretickes). The flexibility and transparency of EukPhylo set new standards for curation of 'omics data for future studies.

IMPORTANCE

Illuminating the diversity of microbial lineages is essential for estimating the tree of life and characterizing principles of genome evolution. However, analyses of microbial eukaryotes (e.g., flagellates, amoebae) are complicated by both the paucity of reference genomes and the prevalence of contamination (e.g., by symbionts, microbiomes). EukPhylo v.1.0 enables taxon-rich analyses "on the fly" as users can choose optimal gene families for their focal taxa and then use replicable approaches to curate data in estimating both gene and species trees. With multiple entry points and curated data sets from up to 15,000 gene families from 1,000 taxa ready for use, EukPhylo provides a powerful launching point for researchers interested in the evolution of eukaryotes.
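The idea of phylogeny-informed contamination removal can be illustrated with a minimal sketch. This is not EukPhylo's code or rule set: the nested-tuple tree encoding, the leaf-label convention `Clade_taxon_seqid`, the two-letter clade codes, and the functions `flag_contaminants` and `prune` are all hypothetical. The illustrative rule flags a sequence whose entire sister group (of at least `min_sister` leaves) comes from a single, different major clade, then prunes it. In an iterative contamination loop, gene trees would be re-inferred between rounds; that step is left out here.

```python
"""Hypothetical sketch of one round of sister-group contamination flagging.

Assumptions (NOT EukPhylo's actual code or naming scheme):
- A rooted gene tree is a nested tuple; leaves are strings "Clade_taxon_seqid".
- The text before the first underscore is the leaf's major-clade code.
"""
from typing import List, Optional, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf labels under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def major_clade(label: str) -> str:
    """Hypothetical convention: the clade code is the prefix before the first '_'."""
    return label.split("_", 1)[0]


def flag_contaminants(tree: Tree, min_sister: int = 2) -> Set[str]:
    """Flag leaves whose whole sister group belongs to one *other* major clade.

    A leaf is flagged only if its sister group has at least `min_sister`
    leaves, so a single unexpected neighbour is not enough evidence alone.
    """
    flagged: Set[str] = set()

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        for i, child in enumerate(node):
            if isinstance(child, str):
                # Sister group = all leaves under the other children of this node.
                sister = [leaf for j, s in enumerate(node) if j != i
                          for leaf in leaves(s)]
                sister_clades = {major_clade(leaf) for leaf in sister}
                if (len(sister) >= min_sister
                        and len(sister_clades) == 1
                        and major_clade(child) not in sister_clades):
                    flagged.add(child)
            walk(child)

    walk(tree)
    return flagged


def prune(tree: Tree, drop: Set[str]) -> Optional[Tree]:
    """Remove flagged leaves and collapse nodes left with a single child."""
    if isinstance(tree, str):
        return None if tree in drop else tree
    kept = [t for t in (prune(c, drop) for c in tree) if t is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


if __name__ == "__main__":
    # Hypothetical gene tree: a SAR ("Sr") sequence nested among bacteria ("Ba").
    gene_tree = (
        (("Op_human_1", "Op_mouse_1"), ("Op_fly_1", "Op_yeast_1")),
        ("Sr_ciliateX_7", ("Ba_ecoli_3", "Ba_bsub_2")),
    )
    bad = flag_contaminants(gene_tree)
    print(bad)  # {'Sr_ciliateX_7'}
    print(prune(gene_tree, bad))
    # ((('Op_human_1', 'Op_mouse_1'), ('Op_fly_1', 'Op_yeast_1')), ('Ba_ecoli_3', 'Ba_bsub_2'))
```

A rule this simple cannot tell contamination apart from genuine horizontal gene transfer, so any real curation criteria would need to be chosen with the focal taxa in mind.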
About the journal:
mBio® is ASM's first broad-scope, online-only, open access journal. mBio offers streamlined review and publication of the best research in microbiology and allied fields.