High fitness paths can connect proteins with low sequence overlap.

ArXiv Pub Date : 2024-11-13
Pranav Kantroo, Günter P Wagner, Benjamin B Machta
{"title":"High fitness paths can connect proteins with low sequence overlap.","authors":"Pranav Kantroo, Günter P Wagner, Benjamin B Machta","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. The intermediate sequences in these paths differ by single residue changes over subsequent steps - substitutions, insertions and deletions are admissible moves. Their fitness is evaluated using the protein language model ESM2, and maintained as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601789/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The structure and function of a protein are determined by its amino acid sequence. While random mutations change a protein's sequence, evolutionary forces shape its structural fold and biological activity. Studies have shown that neutral networks can connect a local region of sequence space by single residue mutations that preserve viability. However, the larger-scale connectedness of protein morphospace remains poorly understood. Recent advances in artificial intelligence have enabled us to computationally predict a protein's structure and quantify its functional plausibility. Here we build on these tools to develop an algorithm that generates viable paths between distantly related extant protein pairs. The intermediate sequences in these paths differ by single residue changes over subsequent steps - substitutions, insertions and deletions are admissible moves. Their fitness is evaluated using the protein language model ESM2, and maintained as high as possible subject to the constraints of the traversal. We document the qualitative variation across paths generated between progressively divergent protein pairs, some of which do not even acquire the same structural fold. The ease of interpolating between two sequences could be used as a proxy for the likelihood of homology between them.

高匹配度路径可以连接序列重叠度较低的蛋白质。
蛋白质的结构和功能由其氨基酸序列决定。随机突变会改变蛋白质的序列,而进化力量则会塑造其结构折叠和生物活性。研究表明,中性网络可以通过单残基突变将序列空间的局部区域连接起来,从而保持活力。然而,人们对蛋白质形态空间更大规模的连接性仍然知之甚少。人工智能的最新进展使我们能够通过计算预测蛋白质的结构,并量化其功能合理性。在这里,我们以这些工具为基础,开发了一种算法,可以在远缘的现存蛋白质对之间生成可行的路径。这些路径中的中间序列在随后的步骤中因单个残基变化而不同--替换、插入和删除都是允许的动作。我们使用蛋白质语言模型 ESM2 对它们的适配性进行评估,并在遍历的限制条件下尽可能保持较高的适配性。我们记录了渐进式差异蛋白质对之间生成路径的质量变化,其中一些甚至没有获得相同的结构折叠。两个序列之间插值的难易程度可以代表它们之间同源性的可能性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信