Resolving the Gene Tree and Species Tree Problem by Phylogenetic Mining

Proceedings of the ... Asia-Pacific bioinformatics conference Pub Date : 2005-12-01 DOI:10.1142/9781860947292_0032

Xiaoxu Han

Citations: 1

Abstract

The gene tree and species tree problem remains a central problem in phylogenomics. To overcome it, gene concatenation approaches randomly combine a certain number of genes, drawn from a set of widely distributed orthologous genes selected from genome data, and conduct phylogenetic analysis on the concatenated result. Random concatenation, however, prevents further investigation of the inner structure of the gene data set used to infer the phylogenetic trees, and it cannot locate the most phylogenetically informative genes. This work describes a phylogenomic mining approach that gains knowledge from a gene data set by clustering its genes with a self-organizing map (SOM) to explore the data set's inner structure. The most phylogenetically informative gene set is then created by picking the maximum-entropy gene from each cluster, and phylogenetic trees are inferred from this set. On the same data set, the phylogenetic mining approach performs better than the random gene concatenation approach.
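
The two steps named in the abstract (cluster genes with a SOM, then keep the maximum-entropy gene from each cluster) can be illustrated with a minimal sketch. The abstract does not say how genes are encoded for the SOM or how gene entropy is defined, so everything specific here is an assumption made for illustration: the hypothetical `feature_vector` uses base composition, `gene_entropy` is the mean per-column Shannon entropy of an alignment, and the SOM is a tiny one-dimensional map. None of the function names come from the paper.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gene_entropy(alignment):
    """Mean per-column entropy of a gene alignment (assumed entropy measure)."""
    columns = list(zip(*alignment))
    return sum(column_entropy(col) for col in columns) / len(columns)


def feature_vector(alignment):
    """Hypothetical gene encoding: A/C/G/T frequencies over the alignment."""
    counts = Counter("".join(alignment))
    total = sum(counts.values())
    return [counts[base] / total for base in "ACGT"]


def best_unit(units, vector):
    """Index of the SOM unit closest to the vector (squared Euclidean)."""
    return min(
        range(len(units)),
        key=lambda j: sum((w - x) ** 2 for w, x in zip(units[j], vector)),
    )


def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a small 1-D SOM and return its unit weight vectors."""
    rng = random.Random(seed)
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for vector in rng.sample(vectors, len(vectors)):
            bmu = best_unit(units, vector)
            for j, weights in enumerate(units):
                # Gaussian neighbourhood on the 1-D grid around the winner
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(len(weights)):
                    weights[k] += lr * h * (vector[k] - weights[k])
    return units


def pick_max_entropy_per_cluster(labels, genes):
    """Map each cluster id to its highest-entropy gene name."""
    clusters = {}
    for name, cluster_id in labels.items():
        clusters.setdefault(cluster_id, []).append(name)
    return {
        cid: max(members, key=lambda n: gene_entropy(genes[n]))
        for cid, members in clusters.items()
    }


def select_informative_genes(genes, n_units=2):
    """Cluster genes with the SOM, then keep one max-entropy gene per cluster."""
    names = sorted(genes)
    vectors = [feature_vector(genes[n]) for n in names]
    units = train_som(vectors, n_units)
    labels = {n: best_unit(units, v) for n, v in zip(names, vectors)}
    return sorted(pick_max_entropy_per_cluster(labels, genes).values())
```

Calling `select_informative_genes(genes)` on a dictionary that maps gene names to aligned sequences returns at most `n_units` gene names, one per non-empty cluster. In a workflow like the one the abstract describes, the selected genes would then be passed to a separate tree-inference step, which this sketch does not cover.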

