Computing RF Tree Distance over Succinct Representations

IF 1.8 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Algorithms Pub Date : 2023-12-28 DOI:10.3390/a17010015
António Pedro Branco, Cátia Vaz, Alexandre P. Francisco
{"title":"Computing RF Tree Distance over Succinct Representations","authors":"Ant'onio Pedro Branco, Cátia Vaz, Alexandre P. Francisco","doi":"10.3390/a17010015","DOIUrl":null,"url":null,"abstract":"There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools rely on several inference methods available to produce phylogenetic trees, with resulting trees not being unique. Thus, methods for comparing phylogenies that are capable of revealing where two phylogenetic trees agree or differ are required. An approach is then proposed to compute a similarity or dissimilarity measure between trees, with the Robinson–Foulds distance being one of the most used, and which can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large with hundreds of thousands of leaves. In this context, space requirements become an issue both while computing tree distances and while storing trees. We propose then an efficient implementation of the Robinson–Foulds distance over tree succinct representations. Our implementation also generalizes the Robinson–Foulds distances to labelled phylogenetic trees, i.e., trees containing labels on all nodes, instead of only on leaves. Experimental results show that we are able to still achieve linear time while requiring less space. 
Our implementation in C++ is available as an open-source tool.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"20 s9","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a17010015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Several tools are available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools rely on different inference methods, and the resulting trees are not unique. Methods for comparing phylogenies that reveal where two phylogenetic trees agree or differ are therefore required. A common approach is to compute a similarity or dissimilarity measure between trees; the Robinson–Foulds distance is one of the most widely used and can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large, with hundreds of thousands of leaves. In this context, space requirements become an issue both when computing tree distances and when storing trees. We therefore propose an efficient implementation of the Robinson–Foulds distance over succinct representations of trees. Our implementation also generalizes the Robinson–Foulds distance to labelled phylogenetic trees, i.e., trees with labels on all nodes instead of only on leaves. Experimental results show that we still achieve linear time while requiring less space. Our implementation in C++ is available as an open-source tool.
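
To make the quantity concrete, the following is a minimal sketch of a rooted, cluster-based Robinson–Foulds distance computed directly from a balanced-parentheses encoding of each tree. It is not the authors' implementation: the names `BpTree`, `clusters`, and `rf_distance` are hypothetical, the parentheses are held in a plain `std::string` rather than a succinct bit vector with rank/select support, and the set-based comparison is simpler but slower than the linear-time computation the abstract refers to. It also assumes labels on leaves only and the same leaf set in both trees.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: a rooted tree in balanced-parentheses form.
// During a depth-first traversal, '(' is written on entering a node and ')'
// on leaving it, so every leaf appears as "()".
// leaf_labels[i] is the label of the i-th leaf met in that traversal.
struct BpTree {
    std::string parens;           // assumption: a real succinct structure would use a bit vector
    std::vector<int> leaf_labels;
};

using Cluster = std::set<int>;

// Collect the cluster (set of leaf labels in the subtree) of every node that
// is neither a leaf nor the root. Those are the only clusters that can differ
// between two trees over the same leaf set.
std::set<Cluster> clusters(const BpTree& t) {
    std::set<Cluster> result;
    std::vector<Cluster> stack;   // one partial cluster per currently open node
    std::size_t next_leaf = 0;
    for (std::size_t i = 0; i < t.parens.size(); ++i) {
        if (t.parens[i] == '(') {
            stack.emplace_back();
            continue;
        }
        assert(!stack.empty());   // the sequence must be balanced
        Cluster done = std::move(stack.back());
        stack.pop_back();
        const bool is_leaf = t.parens[i - 1] == '(';
        if (is_leaf) {
            done.insert(t.leaf_labels.at(next_leaf++));
        } else if (!stack.empty()) {
            result.insert(done);  // internal, non-root node
        }
        if (!stack.empty()) {
            stack.back().insert(done.begin(), done.end());
        }
    }
    return result;
}

// Rooted Robinson–Foulds distance: the number of clusters found in exactly
// one of the two trees.
std::size_t rf_distance(const BpTree& a, const BpTree& b) {
    const std::set<Cluster> ca = clusters(a);
    const std::set<Cluster> cb = clusters(b);
    std::size_t shared = 0;
    for (const Cluster& c : ca) {
        shared += cb.count(c);
    }
    return (ca.size() - shared) + (cb.size() - shared);
}
```

As an illustration, the trees ((1,2),(3,4)) and ((1,3),(2,4)) share the encoding `((()())(()()))` and differ only in leaf order (`{1,2,3,4}` versus `{1,3,2,4}`). Neither of the two non-trivial clusters in one tree appears in the other, so `rf_distance` returns 4.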
Source journal: Algorithms (Mathematics, Numerical Analysis)
CiteScore: 4.10
Self-citation rate: 4.30%
Articles published: 394
Review time: 11 weeks