Consensus methods using phylogenetic databases

2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05). Pub Date: 2005-08-08. DOI: 10.1109/CSBW.2005.43

M. Kulkarni, Bernard M. E. Moret

Abstract

With the increasing use and size of phylogenies, the output of reconstruction programs must be stored for future reference, in which case post-tree analyses such as consensus must be run from a database. We set out to determine whether such analyses can be run at a reasonable cost. We chose consensus (which summarizes the information from many trees into a single tree) because of its general applicability and because it places a severe demand on the database: it requires examining every edge of every tree. We preprocess the data (trees) to create tables that support consensus computations, using our own extensions to the PhyloDB schema of Nakhleh et al. For each of three consensus methods (strict, majority, and greedy), we compare the database computation with the memory-resident computation performed by the Phylip consensus programs. We use a large selection of datasets of varying sizes (up to 1,000 trees of up to 1,500 taxa each) and varying degrees of commonality among the trees. The computations from the database are very practical: they often run faster than the computations in main memory using Phylip, and never run more than 5 times slower. The additional storage costs are easily handled by any database system, and the preprocessing costs remain reasonable. Thus, suitable preprocessing of phylogenetic data allows post-tree analyses to be run directly from the database at much the same cost as current memory-resident analyses.
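To make the idea of "consensus from a database" concrete, here is a minimal sketch, assuming SQLite and a hypothetical table layout. It is not the PhyloDB schema, not the authors' extensions, and not Phylip. Each tree is preprocessed into one row per nontrivial edge (bipartition of the taxa), keyed by a canonical string. Strict and majority consensus then reduce to a `GROUP BY ... HAVING` count over those rows. Greedy consensus accepts splits in decreasing frequency, skipping any that conflict with splits already accepted. The nested-tuple tree input, the names `tree`/`tree_edge`, the helper functions, and the tie-breaking by split label are all illustrative assumptions.

```python
import sqlite3

# Hypothetical input format: an unrooted tree written as nested tuples of taxon
# labels, e.g. ((("A", "B"), "C"), ("D", "E")). Labels must not contain ",".


def bipartitions(tree):
    """Return the nontrivial splits of a nested-tuple tree as canonical strings.

    Each split is keyed by the sorted, comma-joined side that does NOT contain
    the reference taxon (the smallest label), so one edge gets one key.
    """
    clusters = []

    def walk(node):
        # Returns the taxa below `node`, recording every internal cluster.
        if isinstance(node, str):
            return {node}
        below = set()
        for child in node:
            below |= walk(child)
        clusters.append(below)
        return below

    taxa = walk(tree)
    ref = min(taxa)
    splits = set()
    for cluster in clusters:
        side = taxa - cluster if ref in cluster else cluster
        # Trivial splits (one leaf, or the whole taxon set) carry no information.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(",".join(sorted(side)))
    return splits


def load_trees(conn, trees):
    """Preprocessing step: one row per (tree, split) in a hypothetical table."""
    conn.execute("CREATE TABLE tree (tree_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE tree_edge (tree_id INTEGER, split TEXT, "
                 "PRIMARY KEY (tree_id, split))")
    for tree_id, tree in enumerate(trees):
        conn.execute("INSERT INTO tree VALUES (?)", (tree_id,))
        conn.executemany("INSERT INTO tree_edge VALUES (?, ?)",
                         [(tree_id, s) for s in sorted(bipartitions(tree))])
    conn.execute("CREATE INDEX tree_edge_split ON tree_edge (split)")
    conn.commit()


def strict_consensus(conn):
    """Splits that appear in every tree."""
    (n,) = conn.execute("SELECT COUNT(*) FROM tree").fetchone()
    rows = conn.execute("SELECT split FROM tree_edge GROUP BY split "
                        "HAVING COUNT(*) = ? ORDER BY split", (n,))
    return [split for (split,) in rows]


def majority_consensus(conn):
    """Splits that appear in more than half of the trees."""
    (n,) = conn.execute("SELECT COUNT(*) FROM tree").fetchone()
    rows = conn.execute("SELECT split FROM tree_edge GROUP BY split "
                        "HAVING 2 * COUNT(*) > ? ORDER BY split", (n,))
    return [split for (split,) in rows]


def compatible(a, b):
    """Two canonical splits (both excluding the reference taxon) fit in one
    tree iff one side contains the other or the two sides are disjoint."""
    a, b = set(a.split(",")), set(b.split(","))
    return a <= b or b <= a or not (a & b)


def greedy_consensus(conn):
    """Accept splits by decreasing frequency (ties by label), skipping conflicts."""
    rows = conn.execute("SELECT split, COUNT(*) AS c FROM tree_edge "
                        "GROUP BY split ORDER BY c DESC, split")
    chosen = []
    for split, _ in rows:
        if all(compatible(split, other) for other in chosen):
            chosen.append(split)
    return sorted(chosen)


conn = sqlite3.connect(":memory:")
load_trees(conn, [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    ((("A", "C"), "B"), ("D", "E")),
])
print(strict_consensus(conn))    # []
print(majority_consensus(conn))  # ['C,D,E', 'D,E']
print(greedy_consensus(conn))    # ['C,D,E', 'D,E']
```

In this toy input no split occurs in all three trees, so the strict consensus is the star tree. The two splits that occur in two of the three trees form the majority consensus. The design choice the sketch illustrates is that every edge of every tree is touched exactly once, during preprocessing. After that, each consensus query is an aggregate over an indexed table rather than a pass over the trees in memory.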
