Reconstruction of Ancestral Protein Sequences Using Autoregressive Generative Models.

Impact factor 5.3; Zone 1 (Biology); JCR Q1, Biochemistry & Molecular Biology
Matteo De Leonardis, Andrea Pagnani, Pierre Barrat-Charlaix
Journal: Molecular Biology and Evolution
Published: 2025-04-01
DOI: 10.1093/molbev/msaf070 (https://doi.org/10.1093/molbev/msaf070)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12006719/pdf/
Publication type: Journal Article

Abstract

Ancestral sequence reconstruction (ASR) is an important tool for understanding how protein structure and function changed over the course of evolution. It relies on models of sequence evolution that quantitatively describe how a sequence changes over time. Such models usually assume that sequence positions evolve independently of one another and therefore neglect epistasis, the context dependence of the effects of mutations. Meanwhile, recent years have seen major developments in generative protein models, which learn constraints associated with structure and function from large ensembles of evolutionarily related proteins. Here, we show that a specific type of generative model, the autoregressive model, can be extended to describe the evolution of sequences in time while taking epistasis into account. We apply this technique to the problem of ASR: given a protein family and its evolutionary tree, we infer the sequences of extinct ancestors. Using both simulations and data from experimental evolution, we show that our method outperforms state-of-the-art ones. Moreover, it samples a greater diversity of potential ancestors, which yields a less biased characterization of ancestral sequences.
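
As a rough illustration of what "autoregressive" means here, the sketch below builds a toy model in which the probability of a sequence factorizes by the chain rule, P(a_1, ..., a_L) = prod_i P(a_i | a_1, ..., a_{i-1}), so that each site depends on the sites before it instead of being independent. Everything in it is an assumption made for illustration: the 4-letter alphabet, the random parameters, the softmax form of the conditionals, and the function names are hypothetical, and the abstract does not specify the parameterization used in the paper. The sketch also covers only a static sequence distribution, not the extension to evolution in time or the reconstruction on a tree.

```python
import math
import random

ALPHABET = "ACDE"  # toy 4-letter alphabet (assumption; real proteins use 20 amino acids)
Q = len(ALPHABET)
L = 5              # toy sequence length (assumption)

def make_toy_params(seed=0):
    """Random fields h[i][a] and couplings J[i][j][a][b] for j < i (illustrative only)."""
    rng = random.Random(seed)
    h = [[rng.gauss(0, 1) for _ in range(Q)] for _ in range(L)]
    J = [[[[rng.gauss(0, 0.5) for _ in range(Q)] for _ in range(Q)]
          for _ in range(i)] for i in range(L)]
    return h, J

def conditional(i, prefix, h, J):
    """P(a_i = a | a_1..a_{i-1}) as a softmax over the Q states of site i."""
    logits = [h[i][a] + sum(J[i][j][a][prefix[j]] for j in range(i))
              for a in range(Q)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]

def log_prob(seq, h, J):
    """Chain rule: log P(seq) = sum_i log P(a_i | a_<i)."""
    idx = [ALPHABET.index(c) for c in seq]
    return sum(math.log(conditional(i, idx[:i], h, J)[idx[i]]) for i in range(L))

def sample(h, J, rng):
    """Draw a sequence site by site, each conditioned on the sites already drawn."""
    idx = []
    for i in range(L):
        p = conditional(i, idx, h, J)
        idx.append(rng.choices(range(Q), weights=p)[0])
    return "".join(ALPHABET[a] for a in idx)

if __name__ == "__main__":
    h, J = make_toy_params(seed=1)
    seq = sample(h, J, random.Random(2))
    print(seq, log_prob(seq, h, J))
```

Because every conditional is normalized, the sequence probabilities sum to one without computing a global partition function, and exact sampling costs a single pass over the sites. In this toy, the coupling terms are what make the effect of a residue at one site depend on the residues at earlier sites, which is the kind of context dependence that an independent-site model cannot represent.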

Source journal: Molecular Biology and Evolution (Biology: Evolutionary Biology)
CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month
Journal overview: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.