Unifying Sequence-Structure Coding for Advanced Protein Engineering via a Multimodal Diffusion Transformer

Impact factor 7.6 · Zone 1 (Chemistry) · JCR Q1 (Chemistry, Multidisciplinary)
Xiaohan Lin, Zhenyu Chen, Yanheng Li, Zicheng Ma, Chuanliu Fan, Ziqiang Cao, Shihao Feng, Jun Zhang, Yi Qin Gao
Chemical Science, published 2025-05-15. DOI: 10.1039/d5sc02055g (https://doi.org/10.1039/d5sc02055g)
Citations: 0

Abstract
Modern protein engineering demands integrated sequence-structure representations to tackle key challenges in designing, modifying, and evolving proteins for specific functions. While sequence-based methods are promising for generating novel proteins, incorporating structure-oriented information improves the success rate and helps target the corresponding functions. Therefore, a consensus strategy is essential, rather than reliance on sequence-based or structure-based approaches alone. Here, we introduce ProTokens, machine-learned “amino acids” derived from structural databases via self-supervised learning, providing a compact yet information-rich representation that bridges sequence and structure modalities. Instead of treating sequences and structures separately, we build PT-DiT, a multimodal diffusion transformer-based model that integrates both into a unified representation. This enables protein engineering in a joint sequence-structure space, streamlining the design process and facilitating the efficient encoding of 3D folds, contextual protein design, sampling of metastable states, and directed evolution for diverse objectives. As a unified solution for in silico protein engineering, PT-DiT leverages sequence and structure insights to realize functional protein design.
Source journal: Chemical Science (Chemistry, Multidisciplinary)
CiteScore: 14.40
Self-citation rate: 4.80%
Articles published: 1352
Review time: 2.1 months
About the journal: Chemical Science is a journal that encompasses various disciplines within the chemical sciences. Its scope includes publishing ground-breaking research with significant implications for its respective field, as well as appealing to a wider audience in related areas. To be considered for publication, articles must showcase innovative and original advances in their field of study and be presented in a manner that is understandable to scientists from diverse backgrounds. However, the journal generally does not publish highly specialized research.