Species Tree and Reconciliation Estimation under a Duplication-Loss-Coalescence Model
Peng Du, Luay K. Nakhleh
Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2018-08-15
DOI: https://doi.org/10.1145/3233547.3233600
Citations: 12
Abstract
Gene duplication and loss are two evolutionary processes that occur across all three domains of life. These two processes result in different loci, across a set of related genomes, having different gene trees. Inferring the phylogeny of the genomes from data sets of such gene trees is a central task in phylogenomics. Furthermore, when the evolutionary history of the genomes includes short branches, deep coalescence (also called incomplete lineage sorting, ILS) could be at play in addition to duplication and loss, further adding to the complexity of gene/genome relationships. Recently, researchers have developed methods to infer these evolutionary processes by simultaneously modeling gene duplication, loss, and incomplete lineage sorting with respect to a given (fixed) species tree. In this work, we focus on the task of inferring species trees, as well as locus and gene trees, from sequence data in the presence of all three processes. We develop a search heuristic for estimating the maximum a posteriori species/locus/gene tree triad, as well as its associated parameters, from the sequence data of independent gene families. We demonstrate the performance of our method on simulated data and on a data set of 200 gene families from six yeast genomes. Our work enables new statistical phylogenomic analyses, particularly when hidden paralogy and incomplete lineage sorting could be simultaneously at play.