Ant'onio Pedro Branco, Cátia Vaz, Alexandre P. Francisco
{"title":"在简洁表征上计算射频树距离","authors":"Ant'onio Pedro Branco, Cátia Vaz, Alexandre P. Francisco","doi":"10.3390/a17010015","DOIUrl":null,"url":null,"abstract":"There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools rely on several inference methods available to produce phylogenetic trees, with resulting trees not being unique. Thus, methods for comparing phylogenies that are capable of revealing where two phylogenetic trees agree or differ are required. An approach is then proposed to compute a similarity or dissimilarity measure between trees, with the Robinson–Foulds distance being one of the most used, and which can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large with hundreds of thousands of leaves. In this context, space requirements become an issue both while computing tree distances and while storing trees. We propose then an efficient implementation of the Robinson–Foulds distance over tree succinct representations. Our implementation also generalizes the Robinson–Foulds distances to labelled phylogenetic trees, i.e., trees containing labels on all nodes, instead of only on leaves. Experimental results show that we are able to still achieve linear time while requiring less space. Our implementation in C++ is available as an open-source tool.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"20 s9","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Computing RF Tree Distance over Succinct Representations\",\"authors\":\"Ant'onio Pedro Branco, Cátia Vaz, Alexandre P. Francisco\",\"doi\":\"10.3390/a17010015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools rely on several inference methods available to produce phylogenetic trees, with resulting trees not being unique. Thus, methods for comparing phylogenies that are capable of revealing where two phylogenetic trees agree or differ are required. An approach is then proposed to compute a similarity or dissimilarity measure between trees, with the Robinson–Foulds distance being one of the most used, and which can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large with hundreds of thousands of leaves. In this context, space requirements become an issue both while computing tree distances and while storing trees. We propose then an efficient implementation of the Robinson–Foulds distance over tree succinct representations. Our implementation also generalizes the Robinson–Foulds distances to labelled phylogenetic trees, i.e., trees containing labels on all nodes, instead of only on leaves. Experimental results show that we are able to still achieve linear time while requiring less space. Our implementation in C++ is available as an open-source tool.\",\"PeriodicalId\":7636,\"journal\":{\"name\":\"Algorithms\",\"volume\":\"20 s9\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-12-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/a17010015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a17010015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
有几种工具可用于推断系统发生树,系统发生树描述了生物实体之间的进化关系,如传染病爆发中的病毒和细菌菌株或肿瘤进展树中的癌细胞。这些工具依赖多种推断方法来生成系统发生树,但生成的系统发生树并不是唯一的。因此,需要能够揭示两棵系统发生树相同或不同之处的系统发生比较方法。罗宾逊-富尔德距离(Robinson-Foulds distance)是最常用的方法之一,可以在线性时间和空间内计算。然而,由于系统发育数据量巨大且不断增加,系统发育树变得非常庞大,树叶多达数十万片。在这种情况下,无论是计算系统树距离还是存储系统树,空间需求都成为一个问题。因此,我们提出了一种在树简洁表示上高效实现 Robinson-Foulds 距离的方法。我们的实现方法还将罗宾逊-福尔斯距离推广到带标签的系统发育树,即在所有节点上都包含标签的树,而不是只在叶子上包含标签的树。实验结果表明,我们仍能实现线性时间,同时所需的空间更少。我们的 C++ 实现是一个开源工具。
Computing RF Tree Distance over Succinct Representations
There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools rely on several inference methods available to produce phylogenetic trees, with resulting trees not being unique. Thus, methods for comparing phylogenies that are capable of revealing where two phylogenetic trees agree or differ are required. An approach is then proposed to compute a similarity or dissimilarity measure between trees, with the Robinson–Foulds distance being one of the most used, and which can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large with hundreds of thousands of leaves. In this context, space requirements become an issue both while computing tree distances and while storing trees. We propose then an efficient implementation of the Robinson–Foulds distance over tree succinct representations. Our implementation also generalizes the Robinson–Foulds distances to labelled phylogenetic trees, i.e., trees containing labels on all nodes, instead of only on leaves. Experimental results show that we are able to still achieve linear time while requiring less space. Our implementation in C++ is available as an open-source tool.