TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees

Sarah A. Christensen, Erin K. Molloy, P. Vachaspati, T. Warnow
Workshop on Algorithms in Bioinformatics, published 2019-09-01. DOI: https://doi.org/10.4230/LIPIcs.WABI.2019.4
Citations: 4

Abstract

Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and, in some cases, available sequence data). It is an active area of research when gene tree heterogeneity is due to gene duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial-time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves on the accuracy of well-established methods from the GDL literature under conditions with both HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1.
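To make the optimization criterion concrete, the sketch below computes the Robinson-Foulds (RF) distance between two unrooted trees on the same leaf set, counted as the number of non-trivial bipartitions found in exactly one of the two trees. It is not the TRACTION algorithm and is not taken from the TRACTION code base: the nested-tuple tree encoding and the function names `leaf_set`, `bipartitions`, and `rf_distance` are assumptions made for illustration, and the completion step (adding missing leaves) is not covered.

```python
def leaf_set(tree):
    """Collect the leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of the tree, one frozenset per split.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same split always gets the same representation.
    """
    all_leaves = frozenset(leaf_set(tree))
    anchor = min(all_leaves)
    splits = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(collect(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Skip trivial splits (a single leaf versus all the others).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    collect(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """RF distance: number of bipartitions present in exactly one tree."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must have the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical example: a binary reference tree T, a conflicting binary
# tree, and an unresolved input tree t with one polytomy.
T = ((("A", "B"), "C"), ("D", "E"))
conflicting = (("A", "C"), "B", ("D", "E"))
t = ("A", "B", "C", ("D", "E"))

print(rf_distance(T, conflicting))  # 2
print(rf_distance(T, t))            # 1
```

In this toy case the unresolved tree t is at RF distance 1 from T because it lacks the split separating A and B from the rest. Resolving its polytomy by adding that split turns t into T and brings the distance to 0, which is the kind of refinement the RF-Optimal Tree Refinement and Completion Problem asks for.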