基于基因转移的系统发育遗传学:通过出生-死亡理论的分析表达和可加性。

IF 6.1 1区 生物学 Q1 EVOLUTIONARY BIOLOGY
Guy Katriel, Udi Mahanaymi, Shelly Brezner, Noor Kezel, Christoph Koutschan, Doron Zeilberger, Mike Steel, Sagi Snir
{"title":"基于基因转移的系统发育遗传学:通过出生-死亡理论的分析表达和可加性。","authors":"Guy Katriel, Udi Mahanaymi, Shelly Brezner, Noor Kezel, Christoph Koutschan, Doron Zeilberger, Mike Steel, Sagi Snir","doi":"10.1093/sysbio/syad060","DOIUrl":null,"url":null,"abstract":"<p><p>The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to unique data structure that we constructed-the ordered orthology DB-based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows to confirm and contrast findings based on other phylogenetic approaches, as we show.</p>","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"1403-1417"},"PeriodicalIF":6.1000,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory.\",\"authors\":\"Guy Katriel, Udi Mahanaymi, Shelly Brezner, Noor Kezel, Christoph Koutschan, Doron Zeilberger, Mike Steel, Sagi Snir\",\"doi\":\"10.1093/sysbio/syad060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to unique data structure that we constructed-the ordered orthology DB-based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows to confirm and contrast findings based on other phylogenetic approaches, as we show.</p>\",\"PeriodicalId\":22120,\"journal\":{\"name\":\"Systematic Biology\",\"volume\":\" \",\"pages\":\"1403-1417\"},\"PeriodicalIF\":6.1000,\"publicationDate\":\"2023-12-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Systematic Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/sysbio/syad060\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EVOLUTIONARY BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systematic Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/sysbio/syad060","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

基因组时代为分子系统学开辟了巨大的机遇,其中之一就是详细解读进化史。在这样大量的数据下,分析标准标记的点突变对于精细规模的系统发育学来说往往过于粗糙和缓慢。然而,基因组动力学(GD)事件提供了替代的、通常更丰富的信息。一对基因组之间的同源性指数(SI)结合了基因顺序和基因含量信息,允许对基因含量不等的基因组进行比较,并考虑其共同基因的顺序。最近,基因组动力学被建模为连续时间马尔可夫过程,基因组中的基因距离被建模为出生-死亡迁移过程。然而,由于这种情况下出现的复杂性,无法导出精确且可证明一致的估计量,从而产生启发式解决方案。在这里,我们通过使用出生-死亡理论中的技术来扩展这种建模方法,以导出模型参数的有理函数形式的系统概率动力学的显式表达式。这反过来又使我们能够根据生物体之间的SI推断出分析上准确的距离。随后,我们建立了这一估计进化距离的可加性(产生系统发育一致性的理想特性)。将新的测量方法应用于模拟研究表明,它在现实环境中,甚至在模型扩展下,如基因增益/损失或在树结构上,都能提供准确的结果。在实际数据领域,我们将新的公式应用于我们基于EggNOG数据库的新版本构建的独特数据结构——有序正交数据库,以构建一个具有4.5K以上分类群的树。据我们所知,这是构建的最大的基于基因顺序的树,它克服了以前方法中发现的缺点。如我们所示,构建基于GD的树可以基于其他系统发育方法来确认和对比研究结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory.

The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to unique data structure that we constructed-the ordered orthology DB-based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows to confirm and contrast findings based on other phylogenetic approaches, as we show.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Systematic Biology
Systematic Biology 生物-进化生物学
CiteScore
13.00
自引率
7.70%
发文量
70
审稿时长
6-12 weeks
期刊介绍: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信