Pattern-Based Phylogenetic Distance Estimation and Tree Reconstruction

M. Höhl, I. Rigoutsos, M. Ragan
{"title":"Pattern-Based Phylogenetic Distance Estimation and Tree Reconstruction","authors":"M. Höhl, I. Rigoutsos, M. Ragan","doi":"10.1177/117693430600200016","DOIUrl":null,"url":null,"abstract":"We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we infered trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances.","PeriodicalId":136690,"journal":{"name":"Evolutionary Bioinformatics Online","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"49","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Evolutionary Bioinformatics Online","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/117693430600200016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 49

Abstract

We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we infered trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances.
基于模式的系统发育距离估计与树重建
我们开发了一种无对齐方法,该方法使用最大似然方法计算系统发育距离,用于在未对齐序列中发现的模式的序列变化模型。为了评估我们方法的系统发育准确性,并对现有的无比对方法进行全面比较(可以在http://www.bioinformatics.org.au上免费获得Python包decaf+py),我们创建了一个涵盖广泛系统发育距离的参考树数据集。氨基酸序列沿着树进化并输入到测试方法中;根据计算出的距离,我们推断出拓扑结构与参考树相比较的树。我们发现我们的基于模式的方法在统计上优于所有其他经过测试的无对齐方法。我们还证明了当序列违反共线性假设时,无对齐方法比基于自动对齐的方法具有一般优势。同样,我们比较了来自现有校准基准集的经验数据的方法,我们使用该基准集来获得参考距离和树木。我们基于模式的方法产生的距离与参考距离在比其他无对齐方法更长的范围内显示线性关系。基于模式的方法优于无对齐方法,其系统发育精度在统计上与基于对齐的距离难以区分。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信