On computing the nearest neighbor interchange distance

B. Dasgupta, Xin He, Tao Jiang, Ming Li, J. Tromp, Louxin Zhang
DOI: 10.1090/dimacs/055/09
Published in: Discrete Mathematical Problems with Medical Applications
Citations: 63

Abstract

In the practice of molecular evolution, different phylogenetic trees for the same group of species are often produced either by procedures that use diverse optimality criteria [24] or from different genes [15, 16, 17, 18, 14]. Comparing these trees to find their similarities (e.g. agreement or consensus) and dissimilarities, i.e. distance, is thus an important issue in computational molecular biology. The nearest neighbor interchange (nni) distance [29, 28, 34, 3, 6, 2, 19, 20, 23, 33, 22, 21, 26] is a natural distance metric that has been extensively studied. Despite appealing aspects such as its simplicity and its sensitivity to tree topology, this distance has remained very challenging to compute, and many of the algorithmic and complexity questions it raises have remained unresolved. This paper studies the complexity of computing the nni distance, together with efficient approximation algorithms for it and for a natural extension of it to weighted phylogenies. The following results answer many open questions about the nni distance posed in the literature.

1. Computing the nni distance between two labeled trees is NP-complete. This settles a 25-year-old open question raised repeatedly in, for example, [29, 34, 3, 6, 2, 19, 20, 23, 22, 21, 26].
2. Computing the nni distance between two unlabeled trees is also NP-complete. This answers an open question in [3], for which an erroneous proof appeared in [23].
3. Biological applications motivate us to extend the nni distance to weighted phylogenies, where edge weights indicate the time span of evolution along each edge. We present an O(n²)-time approximation algorithm for computing the nni distance on weighted phylogenies with a performance ratio of 4 log n + 4, where n is the number of leaves in the phylogenies.

We also observe that the nni distance is in fact identical to the linear-cost subtree-transfer distance on unweighted phylogenies discussed in [4, 5]. Some consequences of this observation are also discussed.

1991 Mathematics Subject Classification. Primary 68Q17, 68W40; Secondary 68Q25.

The results reported here form a subset of the results that appeared in Proc. 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, pp. 427-436 [4]. The remaining results of that conference paper, which do not appear here, were published separately in Algorithmica, Vol. 25, No. 2, pp. 176-195, 1999.

The first author was supported by a CGAT (Canadian Genome Analysis and Technology) grant. The second author was supported in part by CGAT and NSF grant 9205982. The third author was supported in part by NSERC Operating Grant OGP0046613 and CGAT. The fourth author was supported by NSERC Operating Grant OGP0046506 and CGAT. The fifth author was supported by an NSERC International Fellowship and CGAT.

This work was done while the first author was at the University of Waterloo and McMaster University, the second author was visiting the University of Waterloo, the third author was visiting the University of Washington, and the fifth and sixth authors were at the University of Waterloo.
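To make the definition behind result 1 concrete, here is a minimal brute-force sketch in Python. It is an illustration under stated assumptions, not the authors' method: it covers only the labeled, unweighted case, assumes unrooted binary trees with uniquely labeled leaves stored as adjacency dicts with string node names, and the helpers `from_edges`, `split_key`, `nni_neighbors` and `nni_distance` are hypothetical names. One nni move picks an internal edge whose endpoints carry subtrees A, B on one side and C, D on the other, and swaps B with C or with D. Trees are compared through their sets of nontrivial leaf bipartitions (splits), which determine an unrooted leaf-labeled tree uniquely, and the distance is found by breadth-first search over tree space.

```python
from collections import deque
from collections.abc import Iterator

# Hypothetical representation (an assumption of this sketch): an unrooted
# binary tree as an adjacency dict. Leaves have degree 1, internal nodes
# degree 3, and all node names are strings so they can be ordered.
Tree = dict[str, set[str]]


def from_edges(edges: list[tuple[str, str]]) -> Tree:
    """Build an adjacency dict from an undirected edge list."""
    tree: Tree = {}
    for x, y in edges:
        tree.setdefault(x, set()).add(y)
        tree.setdefault(y, set()).add(x)
    return tree


def leaves_beyond(tree: Tree, start: str, came_from: str) -> frozenset[str]:
    """Leaves reachable from `start` without crossing back to `came_from`."""
    found = set()
    stack = [(start, came_from)]
    while stack:
        node, parent = stack.pop()
        if len(tree[node]) == 1:
            found.add(node)
        for nxt in tree[node]:
            if nxt != parent:
                stack.append((nxt, node))
    return frozenset(found)


def split_key(tree: Tree) -> frozenset[frozenset[str]]:
    """Canonical form: the set of nontrivial splits from internal edges.

    Each split is stored as the side that does not contain the smallest
    leaf, so two trees share a key exactly when they share a topology.
    """
    leaves = frozenset(n for n, nbrs in tree.items() if len(nbrs) == 1)
    ref = min(leaves)
    splits = set()
    for u, nbrs in tree.items():
        for v in nbrs:
            if len(tree[u]) > 1 and len(tree[v]) > 1:  # internal edge
                side = leaves_beyond(tree, v, u)
                splits.add(leaves - side if ref in side else side)
    return frozenset(splits)


def nni_neighbors(tree: Tree) -> Iterator[Tree]:
    """Yield the two nni rearrangements across every internal edge (u, v)."""
    for u, nbrs in tree.items():
        for v in nbrs:
            if u < v and len(tree[u]) == 3 and len(tree[v]) == 3:
                _, b = sorted(tree[u] - {v})
                for c in sorted(tree[v] - {u}):
                    # Swap the subtree rooted at b (off u) with the one at c (off v).
                    new = {n: set(ns) for n, ns in tree.items()}
                    new[u] = (new[u] - {b}) | {c}
                    new[v] = (new[v] - {c}) | {b}
                    new[b] = (new[b] - {u}) | {v}
                    new[c] = (new[c] - {v}) | {u}
                    yield new


def nni_distance(t1: Tree, t2: Tree) -> int:
    """Exact nni distance by breadth-first search over tree topologies."""
    target = split_key(t2)
    start = split_key(t1)
    if start == target:
        return 0
    seen = {start}
    queue = deque([(t1, 0)])
    while queue:
        tree, dist = queue.popleft()
        for nxt in nni_neighbors(tree):
            key = split_key(nxt)
            if key == target:
                return dist + 1
            if key not in seen:
                seen.add(key)
                queue.append((nxt, dist + 1))
    raise ValueError("trees must be binary and share the same leaf set")


# Usage: ab|cd versus ac|bd on four leaves differ by a single nni move.
t1 = from_edges([("a", "x"), ("b", "x"), ("x", "y"), ("y", "c"), ("y", "d")])
t2 = from_edges([("a", "x"), ("c", "x"), ("x", "y"), ("y", "b"), ("y", "d")])
print(nni_distance(t1, t2))  # 1
```

Because the number of unrooted binary trees grows super-exponentially with n, exhaustive search like this is only practical for a handful of leaves, which is in line with the NP-completeness results above. It does not implement the O(n²)-time approximation algorithm for weighted phylogenies.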