EvoAI enables extreme compression and reconstruction of the protein sequence space.

IF 36.1 1区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS
Ziyuan Ma, Wenjie Li, Yunhao Shen, Yunxin Xu, Gengjiang Liu, Jiamin Chang, Zeju Li, Hong Qin, Boxue Tian, Haipeng Gong, David R Liu, B W Thuronyi, Christopher A Voigt, Shuyi Zhang
{"title":"EvoAI enables extreme compression and reconstruction of the protein sequence space.","authors":"Ziyuan Ma, Wenjie Li, Yunhao Shen, Yunxin Xu, Gengjiang Liu, Jiamin Chang, Zeju Li, Hong Qin, Boxue Tian, Haipeng Gong, David R Liu, B W Thuronyi, Christopher A Voigt, Shuyi Zhang","doi":"10.1038/s41592-024-02504-2","DOIUrl":null,"url":null,"abstract":"<p><p>Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here we establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10<sup>48</sup>. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.</p>","PeriodicalId":18981,"journal":{"name":"Nature Methods","volume":" ","pages":""},"PeriodicalIF":36.1000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Methods","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1038/s41592-024-02504-2","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here we establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 1048. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.

EvoAI 实现了蛋白质序列空间的极度压缩和重建。
设计功能更强的蛋白质需要深入了解序列与功能之间的关系,这是一个难以探索的巨大空间。通过识别功能上的重要特征来有效压缩这一空间的能力极具价值。在这里,我们建立了一种名为 EvoScan 的方法,对高拟合序列空间进行全面分割和扫描,以获得捕捉其基本特征的锚点,尤其是在高维度上。我们的方法与任何可与转录输出耦合的生物分子功能兼容。然后,我们开发了深度学习和大型语言模型,以便从这些锚点准确地重建空间,从而在没有同源或结构信息的情况下对新颖的高拟合度序列进行计算预测。我们将这种实验-计算混合方法(我们称之为 EvoAI)应用于一个抑制蛋白,发现只有 82 个锚足以压缩高拟合度序列空间,压缩比为 1048。空间的极度可压缩性为应用生物分子设计和理解自然进化提供了信息。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Nature Methods
Nature Methods 生物-生化研究方法
CiteScore
58.70
自引率
1.70%
发文量
326
审稿时长
1 months
期刊介绍: Nature Methods is a monthly journal that focuses on publishing innovative methods and substantial enhancements to fundamental life sciences research techniques. Geared towards a diverse, interdisciplinary readership of researchers in academia and industry engaged in laboratory work, the journal offers new tools for research and emphasizes the immediate practical significance of the featured work. It publishes primary research papers and reviews recent technical and methodological advancements, with a particular interest in primary methods papers relevant to the biological and biomedical sciences. This includes methods rooted in chemistry with practical applications for studying biological problems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信