Reconciliation with non-binary species trees
B Vernot, M Stolzer, A Goldman, D Durand
Computational systems bioinformatics. Computational Systems Bioinformatics Conference, 2007, pp. 441-52
DOI: 10.1142/9781860948732_0044 (https://doi.org/10.1142/9781860948732_0044)
Citation count: 73
Abstract
Reconciliation is the process of resolving disagreement between gene and species trees by invoking gene duplications and losses to explain topological incongruence. The resulting inferred duplication histories are a valuable source of information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. Reconciliation for binary trees is a tractable and well-studied problem. However, a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and deep coalescence. We present the first formal algorithm for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Using a space-efficient mapping from gene to species tree, our algorithm infers the minimum number of duplications and losses in O(|V(G)| · (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree G, h(S) is the height of the species tree S, and k(S) is the width of its largest multifurcation. We also present a dynamic programming algorithm for a combined loss model, in which losses in sibling species may be represented as a single loss in the common ancestor. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program, which provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
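To make the duplication-versus-deep-coalescence distinction concrete, here is a minimal Python sketch. It is not NOTUNG's code and not the paper's space-efficient algorithm; it only models duplications (no losses) and makes several assumptions for illustration. Trees are nested tuples, gene leaves are labeled directly with species names, and the hypothetical functions `index_species_tree`, `lca`, `child_below`, and `count_duplications` are introduced here. The classic binary rule uses the LCA mapping M(g) and calls g a duplication when M(g) equals the mapping of one of its children. The "polytomy-aware" branch applies an overlap rule assumed for illustration: g is a duplication only if both of its subtrees contain species below the same child of M(g). Otherwise, incomplete lineage sorting at a multifurcation could explain the topology without a duplication.

```python
from typing import Dict, Set, Tuple, Union

# A tree is a leaf name (str) or a tuple of subtrees. Species-tree tuples may
# have any arity (polytomies); gene-tree tuples are assumed to be binary.
Tree = Union[str, tuple]


def index_species_tree(root: Tree) -> Tuple[Dict[Tree, Tree], Dict[Tree, int]]:
    """Map each species node to its parent and its depth (root depth is 0)."""
    parent: Dict[Tree, Tree] = {}
    depth: Dict[Tree, int] = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(u: Tree, v: Tree, parent, depth) -> Tree:
    """Naive lowest common ancestor: climb parent pointers, O(h(S)) per query."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def child_below(s: Tree, x: Tree, parent) -> Tree:
    """Child of species node s on the path from s down to its descendant x."""
    while parent[x] != s:
        x = parent[x]
    return x


def count_duplications(gene: Tree, species: Tree, polytomy_aware: bool) -> int:
    parent, depth = index_species_tree(species)
    dups = 0

    def visit(g: Tree) -> Tuple[Tree, Set[str]]:
        """Return M(g), the LCA mapping of g, and the set of species below g."""
        nonlocal dups
        if isinstance(g, str):
            return g, {g}
        left, right = g  # gene trees are binary
        m_l, sp_l = visit(left)
        m_r, sp_r = visit(right)
        m = lca(m_l, m_r, parent, depth)
        if not polytomy_aware:
            # Classic binary LCA rule: duplication iff M(g) equals M(child).
            if m == m_l or m == m_r:
                dups += 1
        elif isinstance(m, str):
            # Both subtrees contain genes from the same leaf species.
            dups += 1
        else:
            # Assumed overlap rule: duplication iff both subtrees reach a
            # common child of M(g); disjoint children are treated as a
            # resolution of the (possibly unresolved) branch point.
            cover_l = {child_below(m, x, parent) for x in sp_l}
            cover_r = {child_below(m, x, parent) for x in sp_r}
            if cover_l & cover_r:
                dups += 1
        return m, sp_l | sp_r

    visit(gene)
    return dups


if __name__ == "__main__":
    S = ("A", "B", "C")             # a single three-way multifurcation
    G1 = (("A", "B"), "C")          # just one resolution of the polytomy
    G2 = (("A", "B"), ("A", "C"))   # species A occurs on both sides of the root
    print(count_duplications(G1, S, False), count_duplications(G1, S, True))  # 1 0
    print(count_duplications(G2, S, False), count_duplications(G2, S, True))  # 1 1
```

On G1 the binary rule reports a spurious duplication at the polytomy, while the overlap rule reports none. On G2 both report one duplication, because lineage sorting alone cannot place A on both sides. On fully binary species trees the two rules agree in these examples. The per-query LCA walk here is deliberately naive and does not attempt the O(|V(G)| · (k(S) + h(S))) bound stated above.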