Rethinking large-scale phylogenomics with EukPhylo v.1.0, a flexible toolkit to enable phylogeny-informed data curation and analyses of diverse eukaryotic lineages.

IF 4.7 · Zone 1 (Biology) · JCR Q1 (Microbiology)
mBio Pub Date : 2025-10-08 Epub Date: 2025-08-27 DOI:10.1128/mbio.01770-25
Laura A Katz, Marie Leleu, Godwin Ani, Rebecca Gawron, Auden Cote-L'Heureux

Abstract

Eukaryotic diversity is largely microbial, with macroscopic lineages (plants, animals, and fungi) nesting among a plethora of diverse protists. Our understanding of the evolutionary relationships among eukaryotes is rapidly advancing through 'omics analyses, but phylogenomic analyses are challenging for microeukaryotes, particularly uncultivable lineages, as single-cell sequencing approaches generate a mixture of sequences from hosts, associated microbiomes, and contaminants. Moreover, many analyses of eukaryotic gene families and phylogenies rely on boutique data sets and methods that are challenging for other research groups to replicate. To address these challenges, we present EukPhylo v.1.0, a modular, user-friendly pipeline that enables effective data curation through phylogeny-informed contamination removal, estimation of homologous gene families (GFs), and generation of both multisequence alignments and gene trees. For the GF assignment, we provide the "Hook Database" of ~15,000 ancient GFs, which users can easily replace with a set of gene families of interest. We demonstrate the power of EukPhylo, including a suite of stand-alone utilities, through phylogenomic analyses of 500 conserved GFs sampled from 1,000 diverse species of eukaryotes, bacteria, and archaea. We show improvements in estimates of the eukaryotic tree of life, recovering clades that are well established in the literature, through successive rounds of curation using the EukPhylo contamination loop. The final trees corroborate numerous hypotheses in the literature (e.g., Opisthokonta, Rhizaria, Amoebozoa) while challenging others (e.g., CRuMs, Obazoa, Diaphoretickes). The flexibility and transparency of EukPhylo set new standards for curation of 'omics data for future studies.

IMPORTANCE

Illuminating the diversity of microbial lineages is essential for estimating the tree of life and characterizing principles of genome evolution. However, analyses of microbial eukaryotes (e.g., flagellates, amoebae) are complicated by both the paucity of reference genomes and the prevalence of contamination (e.g., by symbionts, microbiomes). EukPhylo v.1.0 enables taxon-rich analyses "on the fly" as users can choose optimal gene families for their focal taxa and then use replicable approaches to curate data in estimating both gene and species trees. With multiple entry points and curated data sets from up to 15,000 gene families from 1,000 taxa ready for use, EukPhylo provides a powerful launching point for researchers interested in the evolution of eukaryotes.
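
The abstract does not describe how the contamination loop decides which sequences to remove. As a purely illustrative sketch of what "phylogeny-informed" screening can mean, and not EukPhylo's actual implementation, the Python below flags a sequence when its sister group in a gene tree consists entirely of taxa from a different major clade. The nested-tuple tree representation, the `clade_of` lookup, the `min_sisters` threshold, and all tip names are assumptions made for this example; it also assumes a rooted tree, which real gene trees often are not.

```python
from typing import Dict, List, Union

# Hypothetical representation: a tip name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def tips(tree: Tree) -> List[str]:
    """Return all tip names under a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tips(child)]


def flag_contaminants(tree: Tree, clade_of: Dict[str, str],
                      min_sisters: int = 2) -> List[str]:
    """Flag tips whose sister group holds only taxa from one other clade.

    min_sisters guards against flagging both members of a two-tip pair.
    """
    flagged: List[str] = []

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        for i, child in enumerate(node):
            if not isinstance(child, str):
                visit(child)
                continue
            # Sister group: every tip under the sibling subtrees of this tip.
            sisters = [t for j, sib in enumerate(node) if j != i
                       for t in tips(sib)]
            sister_clades = {clade_of[t] for t in sisters}
            if (len(sisters) >= min_sisters
                    and len(sister_clades) == 1
                    and clade_of[child] not in sister_clades):
                flagged.append(child)

    visit(tree)
    return flagged


# Toy gene tree: a sequence labeled as an amoeba sits next to two bacteria.
example_tree = (("amoeba_1", ("bacterium_1", "bacterium_2")),
                ("animal_1", "animal_2"))
example_clades = {
    "amoeba_1": "Amoebozoa",
    "bacterium_1": "Bacteria",
    "bacterium_2": "Bacteria",
    "animal_1": "Opisthokonta",
    "animal_2": "Opisthokonta",
}
flagged = flag_contaminants(example_tree, example_clades)
print(flagged)
```

In this toy tree, `amoeba_1` is the only tip whose sister group is exclusively bacterial, so it is the only candidate flagged for review. A real pipeline would presumably apply such rules across many gene trees and rebuild the trees after each round of removal, which is what "successive rounds of curation" suggests.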

Source journal: mBio (Microbiology)
CiteScore: 10.50
Self-citation rate: 3.10%
Articles published: 762
Review time: 1 month
About the journal: mBio® is ASM's first broad-scope, online-only, open access journal. mBio offers streamlined review and publication of the best research in microbiology and allied fields.