Invariant transformers of Robinson and Foulds distance matrices for Convolutional Neural Network.

IF 0.9 | Zone 4 (Biology) | JCR Q4, Mathematical & Computational Biology
Nadia Tahiri, Andrey Veriga, Aleksandr Koshkarov, Boris Morozov
Journal of Bioinformatics and Computational Biology, journal article, published 2022-08-01
DOI: 10.1142/S0219720022500123 (https://doi.org/10.1142/S0219720022500123)
Citations: 2

Abstract

The evolutionary histories of genes can differ greatly from one another, which may be explained by horizontal gene transfer or recombination events. Each gene's history is therefore represented by its own phylogenetic tree, which may show patterns different from those of the species tree that defines the main evolutionary patterns. Phylogenetic trees of closely related species should also be merged, minimizing the topological conflicts among them and yielding consensus trees (for homogeneous data) or supertrees (for heterogeneous data). The traditional approaches are consensus tree inference (when all trees contain the same set of species) and supertree inference (when the trees contain different but overlapping sets of species). Both are designed to produce a single tree. However, these methods lose precision when the input trees reflect different evolutionary histories. Other approaches preserve this variability by clustering the trees with the k-means or k-medoids algorithm. Using a new method, we determine all possible consensus trees and supertrees that best represent the most significant evolutionary models in a set of phylogenetic trees, thereby increasing the precision of the results and decreasing the time required.

Results: This paper presents in detail a new method for predicting the number of clusters in a Robinson and Foulds (RF) distance matrix using a convolutional neural network (CNN). We developed a new CNN approach, called CNNTrees, for multiple tree classification. The method returns the number of clusters of the input phylogenetic trees for tree sets of different sizes, which makes the approach more stable and more robust. The paper also provides an in-depth analysis of the relevant but very difficult problem of constructing alternative supertrees from phylogenies with different but overlapping sets of taxa. This new model will play an important role in the inference of Trees of Life (ToL).

Availability and implementation: CNNTrees is available through a web server at https://tahirinadia.github.io/. The source code, data, and installation instructions are also available at https://github.com/TahiriNadia/CNNTrees.

Supplementary information: Supplementary data are available on GitHub.

The evolutionary history of species is not unique but is specific to sets of genes: each gene has its own evolutionary history, which can differ considerably from that of other genes. For example, some individual genes or operons may be affected by specific horizontal gene transfer and recombination events. The evolutionary history of each gene must therefore be represented by its own phylogenetic tree, which may exhibit evolutionary patterns different from those of the species tree that accounts for the major vertical descent patterns. Traditional consensus tree and supertree inference methods produce a single consensus tree or supertree. In this paper, we present in detail a new method for predicting the number of clusters in a Robinson and Foulds (RF) distance matrix using a convolutional neural network (CNN). We developed a new CNN approach (CNNTrees) for multiple tree classification. This strategy returns a number of clusters in the order of the input trees, which makes the approach more stable and more robust.
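
To make the RF distance matrix mentioned in the abstract concrete, here is a minimal Python sketch of how such a matrix could be built for a small set of trees. It is not taken from the CNNTrees source code. The nested-tuple tree encoding and the function names `leaves`, `splits`, and `rf_matrix` are assumptions made for illustration, and the sketch covers only the consensus case in which all trees share the same leaf set, not the supertree case with overlapping taxa.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions induced by the internal edges of the tree."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # fixed leaf used to choose one canonical side per split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaves(node)
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # Store the side that does not contain the anchor, so both
            # orientations of the same edge map to a single key.
            found.add(other if anchor in below else below)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_matrix(trees):
    """Pairwise RF distances: size of the symmetric difference of the split sets."""
    split_sets = [splits(t) for t in trees]
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = len(split_sets[i] ^ split_sets[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Three example trees on the same five taxa; t3 is t1 drawn with a different root.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))

for row in rf_matrix([t1, t2, t3]):
    print(row)
```

In this toy example `t1` and `t3` induce the same bipartitions, so their distance is 0, while `t2` differs from each of them in one bipartition on each side. A symmetric matrix of this kind, with a zero diagonal, is the sort of input on which the abstract says the number of clusters is predicted; how CNNTrees itself prepares or transforms the matrix is not described here.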

Source journal
Journal of Bioinformatics and Computational Biology (Mathematical & Computational Biology)
CiteScore: 2.10
Self-citation rate: 0.00%
Articles published: 57
About the journal: The Journal of Bioinformatics and Computational Biology aims to publish high quality, original research articles, expository tutorial papers and review papers as well as short, critical comments on technical issues associated with the analysis of cellular information. The research papers will be technical presentations of new assertions, discoveries and tools, intended for a narrower specialist community. The tutorials, reviews and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing, and equally, computer scientists who have an interest in biology but are not familiar with current thrusts nor the language of biology. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field.