Species Tree and Reconciliation Estimation under a Duplication-Loss-Coalescence Model

Peng Du, Luay K. Nakhleh
Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Published 2018-08-15.
DOI: 10.1145/3233547.3233600 (https://doi.org/10.1145/3233547.3233600)
Citations: 12

Abstract

Gene duplication and loss are two evolutionary processes that occur across all three domains of life. These two processes result in different loci, across a set of related genomes, having different gene trees. Inferring the phylogeny of the genomes from data sets of such gene trees is a central task in phylogenomics. Furthermore, when the evolutionary history of the genomes includes short branches, deep coalescence, also called incomplete lineage sorting (ILS), could be at play in addition to duplication and loss, further adding to the complexity of gene/genome relationships. Recently, researchers have developed methods to infer these evolutionary processes by simultaneously modeling gene duplication, loss, and incomplete lineage sorting with respect to a given (fixed) species tree. In this work, we focused on the task of inferring species trees, as well as locus and gene trees, from sequence data in the presence of all three processes. We developed a search heuristic for estimating the maximum a posteriori species/locus/gene tree triad, as well as their associated parameters, from the sequence data of independent gene families. We demonstrate the performance of our method on simulated data and a data set of 200 gene families from six yeast genomes. Our work enables new statistical phylogenomic analyses, particularly when hidden paralogy and incomplete lineage sorting could be simultaneously at play.
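The maximum a posteriori objective mentioned above can be written out as a rough sketch. The abstract does not give the formula, so the factorization below is an assumption based on the usual three-tree duplication-loss-coalescence setup: a gene tree evolves inside a locus tree, which in turn evolves inside the species tree. The symbols are illustrative and not taken from the paper: D_i is the sequence data of gene family i, G_i and L_i are its gene and locus trees, S is the species tree shared by all n families, and θ collects the associated parameters (for example duplication and loss rates and population sizes).

```latex
% Assumed posterior over the species/locus/gene tree triad for n independent
% gene families; notation is hypothetical, not the authors'.
\[
  P\bigl(S, \{L_i, G_i\}_{i=1}^{n}, \theta \mid D_1, \dots, D_n\bigr)
  \;\propto\;
  P(S, \theta)\,
  \prod_{i=1}^{n}
    P(D_i \mid G_i)\,
    P(G_i \mid L_i, \theta)\,
    P(L_i \mid S, \theta)
\]
% The search heuristic then looks for the maximizing triad and parameters.
\[
  \bigl(\hat{S}, \{\hat{L}_i, \hat{G}_i\}, \hat{\theta}\bigr)
  = \operatorname*{arg\,max}_{S,\,\{L_i, G_i\},\,\theta}
    P\bigl(S, \{L_i, G_i\}_{i=1}^{n}, \theta \mid D_1, \dots, D_n\bigr)
\]
```

Under this reading, the product over families reflects the stated independence of the gene families, and the three likelihood terms correspond to sequence evolution on the gene tree, coalescence of the gene tree within the locus tree, and duplication and loss of loci within the species tree.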