Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses

IF 3.9 | Zone 2 (Biology) | JCR Q1 (Evolutionary Biology)
Cladistics | Pub Date: 2023-04-25 | DOI: 10.1111/cla.12540
Mark P. Simmons, Pablo A. Goloboff, Ben C. Stöver, Mark S. Springer, John Gatesy
Cladistics, volume 39, issue 5, pages 418-436. https://onlinelibrary.wiley.com/doi/10.1111/cla.12540
Citations: 1

Abstract

Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson–Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
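
The contrast the abstract draws between standard RF distances and measures that do not penalize polytomies can be illustrated with a small sketch. This is not the authors' congsort script, which is written for TNT. It assumes rooted trees are represented as sets of non-trivial clades, and all function names here are hypothetical. The abstract does not give the formula that OSR uses to combine matching and contradicting clades, so the sketch only reports the two counts separately. Counting contradictions in both directions is also an assumption.

```python
# Illustrative sketch only: trees are frozensets of clades, and each clade
# is a frozenset of taxon names. A polytomy simply means fewer clades.

def compatible(a, b):
    """Two rooted clades are compatible if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)

def collapse_zero_support(support_by_clade):
    """Drop clades whose support is 0, leaving a polytomy in their place."""
    return frozenset(c for c, s in support_by_clade.items() if s > 0)

def rf_distance(t1, t2):
    """Standard RF distance: clades present in exactly one of the two trees."""
    return len(t1 ^ t2)

def matching_clades(t1, t2):
    """Number of clades shared by both trees."""
    return len(t1 & t2)

def contradiction_count(t1, t2):
    """Clades in either tree that conflict with some clade in the other tree.
    Assumption: contradictions are summed over both directions."""
    def contradicted(src, other):
        return sum(1 for c in src if any(not compatible(c, d) for d in other))
    return contradicted(t1, t2) + contradicted(t2, t1)

def rank_by_contradictions(trees):
    """Gene-tree names ordered from most to least congruent, using the
    total number of pairwise contradictions (ties broken by name)."""
    totals = {
        name: sum(contradiction_count(t, u)
                  for other, u in trees.items() if other != name)
        for name, t in trees.items()
    }
    return sorted(totals, key=lambda n: (totals[n], n))

def clade(taxa):
    """Build a clade from a string of single-letter taxon names."""
    return frozenset(taxa)

# Reference topology ((A,B),(C,(D,E))).
reference = frozenset({clade("AB"), clade("DE"), clade("CDE")})

# Gene tree ((A,B),((C,D),E)) with made-up support values; (C,D) has 0 support.
gene_support = {clade("AB"): 0.95, clade("CD"): 0.0, clade("CDE"): 0.88}
resolved = frozenset(gene_support)
collapsed = collapse_zero_support(gene_support)

print(rf_distance(resolved, reference), contradiction_count(resolved, reference))
print(rf_distance(collapsed, reference), contradiction_count(collapsed, reference))

# A third, more discordant gene tree for the ranking example.
trees = {"g1": collapsed, "g2": reference,
         "g3": frozenset({clade("AC"), clade("DE")})}
print(rank_by_contradictions(trees))
```

In this toy case, collapsing the zero-support clade removes every contradiction with the reference, while the RF distance stays above zero because the resulting polytomy still counts as a difference. A subsampling step could then drop the gene trees at the end of the ranking. The cutoff for doing so is a choice the sketch leaves open.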

Source journal: Cladistics (Biology: Evolutionary Biology)
CiteScore: 8.60
Self-citation rate: 5.60%
Articles published: 34
About the journal: Cladistics publishes high-quality research papers on systematics, encouraging debate on all aspects of the field, from philosophy, theory and methodology to empirical studies and applications in biogeography, coevolution, conservation biology, ontogeny, genomics and paleontology. Cladistics is read by scientists working in the research fields of evolution, systematics and integrative biology and enjoys a consistently high position in the ISI® rankings for evolutionary biology.