A General Substitution Matrix for Structural Phylogenetics.

Sriram G Garg, Georg K A Hochberg
Molecular Biology and Evolution. Published 2025-06-04. DOI: 10.1093/molbev/msaf124
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12198762/pdf/

Abstract

Sequence-based maximum likelihood phylogenetics is a widely used method for inferring evolutionary relationships, and it has illuminated the evolutionary histories of proteins and the organisms that harbor them. However, even modern implementations with sophisticated models of sequence evolution struggle to resolve deep evolutionary relationships, which can be obscured by excessive sequence divergence and substitution saturation. Structural phylogenetics has emerged as a promising alternative because protein structure evolves much more slowly than protein sequence. Recent developments in AI-based protein structure prediction have made it possible to predict structures for entire protein families and then to translate these structures into a sequence representation, the 3Di structural alphabet, that can in theory be fed directly into existing sequence-based phylogenetic software. Unlocking the full potential of this idea, however, requires the inference of a general substitution matrix for structural phylogenetics, which has so far been missing. Here, we infer this matrix from large datasets of protein structures and show that it fits empirical datasets better than previous approaches. We then use this matrix to revisit the question of the root of the tree of life. Using structural phylogenies of universal paralogs, we provide the first unambiguous evidence for a root between archaea and bacteria. Finally, we discuss some practical and conceptual limitations of structural phylogenetics. Our 3Di substitution matrix provides a starting point for revisiting many deep phylogenetic problems that have so far been extremely difficult to solve.
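
The substitution saturation mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the matrix inferred in the paper: it assumes a deliberately simple equal-rates model in which every one of k states changes to every other state at the same rate, with k = 20 chosen to match the size of the 3Di alphabet. The function names `p_same` and `observed_difference` are hypothetical. Under this assumption, the probability that a site shows the same state after t expected substitutions per site has the closed form 1/k + (1 - 1/k) exp(-k t / (k - 1)).

```python
import math


def p_same(t, k=20):
    """Probability that a site shows the same state after t expected
    substitutions per site, under an equal-rates k-state model
    (an illustrative assumption, not the paper's inferred matrix)."""
    return 1.0 / k + (1.0 - 1.0 / k) * math.exp(-k * t / (k - 1))


def observed_difference(t, k=20):
    """Expected fraction of sites that differ between two sequences
    separated by a total branch length of t."""
    return 1.0 - p_same(t, k)


if __name__ == "__main__":
    # The observed difference rises quickly, then flattens toward
    # (k - 1) / k: beyond that point extra divergence is nearly invisible.
    for t in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"t = {t:5.1f}  observed difference = {observed_difference(t):.3f}")
```

In this toy model the observed difference approaches (k - 1)/k, so two very different branch lengths can produce almost the same observed difference. An empirical substitution matrix replaces the equal rates with state-specific exchange rates and frequencies, but the same plateau behavior applies.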
