Starless bias and parameter-estimation bias in the likelihood-based phylogenetic method

X. Xia

AIMS Genetics, published 2018-12-01. DOI: 10.3934/genet.2018.4.212 (https://doi.org/10.3934/genet.2018.4.212)
Citations: 1

Abstract

I analyzed various site-pattern combinations in a 4-OTU case to identify sources of starless bias and parameter-estimation bias in likelihood-based phylogenetic methods, and report three contributions. First, the likelihood method is counterintuitive in that it may fail to recover a star tree from sequences that are equidistant from each other. This behaviour, dubbed starless bias, arises in a 4-OTU tree when there is an excess (i.e., more than expected from a star tree and a substitution model) of conflicting phylogenetic signals that support the three resolved topologies equally. I identified special site-pattern combinations that lead to rejection of a star tree even though the sequences are equidistant from each other. Second, fitting a gamma distribution to model rate heterogeneity over sites is strongly confounded with tree topology, especially in conjunction with the starless bias. I present examples showing dramatic differences in the estimated shape parameter α between a star tree and a resolved tree: the estimate may indicate no rate heterogeneity over sites (α > 10000) when a star tree is imposed, but α < 1 (suggesting strong rate heterogeneity over sites) when an (incorrect) resolved tree is imposed. This dependence of “rate heterogeneity” on tree topology implies that “rate heterogeneity” is not a sequence-specific feature, and cautions against interpreting a small α to mean that some sites are under strong purifying selection and others are not. Third, because there is no existing (and working) likelihood method for evaluating a star tree with a continuous gamma-distributed rate, I implemented the method for JC69 in a self-contained R script for a four-OTU tree (star or resolved), in addition to another R script that assumes a constant rate over sites. These R scripts should be useful for teaching and exploring likelihood methods in phylogenetics.
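To make the 4-OTU setting concrete, here is a minimal sketch of how a site-pattern likelihood under JC69 can be evaluated for a star tree and for the resolved tree ((1,2),(3,4)). It is written in Python for illustration and is not the author's R script. It assumes a constant rate over sites (no gamma distribution) and fixed branch lengths rather than maximum-likelihood estimates. The function names, the branch lengths, and the site-pattern counts are all made up for this example. The counts are chosen so that the four sequences are equidistant and the three resolved topologies are supported equally.

```python
import math
from itertools import product

NUC = "ACGT"

def jc69_p(i, j, t):
    """JC69 transition probability over a branch of length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_prob(pattern, tips, internal=0.0):
    """Probability of one 4-OTU site pattern on the tree ((1,2),(3,4)).

    tips: the four terminal branch lengths; internal: the internal branch length.
    With internal == 0 the two internal nodes coincide and the tree is a star.
    """
    total = 0.0
    for x, y in product(NUC, repeat=2):  # states at the two internal nodes
        p = 0.25 * jc69_p(x, y, internal)  # equal base frequencies under JC69
        p *= jc69_p(x, pattern[0], tips[0]) * jc69_p(x, pattern[1], tips[1])
        p *= jc69_p(y, pattern[2], tips[2]) * jc69_p(y, pattern[3], tips[3])
        total += p
    return total

def log_likelihood(counts, tips, internal=0.0):
    """Log-likelihood of site-pattern counts, treating sites as independent."""
    return sum(n * math.log(site_prob(pat, tips, internal))
               for pat, n in counts.items())

# Hypothetical data: every pair of sequences differs at 20 of 100 sites, and the
# patterns AACC, ACAC, ACCA each favour a different resolved topology.
counts = {"AAAA": 70, "AACC": 10, "ACAC": 10, "ACCA": 10}
tips = [0.1, 0.1, 0.1, 0.1]  # illustrative values, not estimates

lnl_star = log_likelihood(counts, tips, internal=0.0)
lnl_resolved = log_likelihood(counts, tips, internal=0.05)
print(f"lnL, star tree:           {lnl_star:.4f}")
print(f"lnL, resolved ((1,2),(3,4)): {lnl_resolved:.4f}")
```

A full analysis would maximize the log-likelihood over all branch lengths for each topology before comparing them. The fixed values here only show how the quantities are computed.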
Source journal: AIMS Genetics (Genetics & Heredity). Review time: 12 weeks.