On Two Measures of Distance between Fully-Labelled Trees

Annual Symposium on Combinatorial Pattern Matching. Pub Date: 2020-02-13. DOI: 10.4230/LIPIcs.CPM.2020.6

G. Bernardini, P. Bonizzoni, Paweł Gawrychowski


Citations: 5

Abstract

The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This creates the need to design efficient procedures for comparing rooted trees that represent the evolution of mutations in tumor phylogenies. Motivated by this need, Bernardini et al. [CPM 2019] recently introduced the rearrangement distance for fully-labelled trees. This notion is based on two operations: one that permutes the labels of the nodes, and another that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, whereas computing the actual rearrangement distance, which combines the two, was proven to be NP-hard.

We answer two open questions left by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first question by showing, via a two-way reduction, that calculating the permutation distance between two trees on $n$ nodes is equivalent, up to polylogarithmic factors, to finding a maximum-cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [ArXiv 2020], we obtain an $O(n^{4/3+o(1)})$-time algorithm for computing the permutation distance between two trees on $n$ nodes. We then answer the second question positively by designing a linear-time constant-factor approximation algorithm that needs no assumptions on the trees.
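The abstract ties the permutation distance to maximum-cardinality matching in a sparse bipartite graph, but it does not spell out the reduction, so the following is only a sketch of that matching primitive, not the authors' method. It uses the classical Hopcroft-Karp algorithm, which runs in $O(m\sqrt{n})$ time on a bipartite graph with $m$ edges and $n$ vertices; it is not the Liu and Sidford algorithm that yields the $O(n^{4/3+o(1)})$ bound. The function name `hopcroft_karp` and its adjacency-list input format are assumptions made for illustration.

```python
from collections import deque
from typing import List, Sequence

INF = float("inf")


def hopcroft_karp(adj: Sequence[Sequence[int]], n_right: int) -> int:
    """Size of a maximum-cardinality matching in a bipartite graph.

    Hypothetical helper for illustration: adj[u] lists the right-side
    vertices (0..n_right-1) adjacent to left vertex u.
    """
    n_left = len(adj)
    match_l: List[int] = [-1] * n_left   # right partner of each left vertex
    match_r: List[int] = [-1] * n_right  # left partner of each right vertex
    dist: List[float] = [0] * n_left     # BFS layer of each left vertex

    def bfs() -> bool:
        # Layer the graph from all free left vertices; report whether
        # some alternating path reaches a free right vertex.
        queue = deque()
        for u in range(n_left):
            if match_l[u] == -1:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        found = False
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                w = match_r[v]
                if w == -1:
                    found = True
                elif dist[w] == INF:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return found

    def dfs(u: int) -> bool:
        # Follow layered edges to a free right vertex and flip the path.
        # Recursive: very long augmenting paths may need a higher
        # recursion limit in Python.
        for v in adj[u]:
            w = match_r[v]
            if w == -1 or (dist[w] == dist[u] + 1 and dfs(w)):
                match_l[u] = v
                match_r[v] = u
                return True
        dist[u] = INF  # dead end: skip u for the rest of this phase
        return False

    size = 0
    while bfs():  # each phase augments along vertex-disjoint paths
        for u in range(n_left):
            if match_l[u] == -1 and dfs(u):
                size += 1
    return size
```

For example, with left vertices 0, 1, 2 adjacent to right vertices `[0, 1]`, `[0]` and `[1, 2]` respectively, `hopcroft_karp([[0, 1], [0], [1, 2]], 3)` returns 3; the second phase finds the augmenting path that reassigns left vertex 0 so that left vertex 1 can take right vertex 0.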

