A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity

IF 13; Zone 1 (Biology); JCR Q1 (Cell Biology)
Bingxin Zhou, Lirong Zheng, Banghao Wu, Kai Yi, Bozitao Zhong, Yang Tan, Qian Liu, Pietro Liò, Liang Hong
Journal: Cell Discovery. Publication type: Journal Article. Published: 2024-09-10. DOI: 10.1038/s41421-024-00728-2 (https://doi.org/10.1038/s41421-024-00728-2)
Citation count: 0

Abstract

Deep learning-based methods for generating functional proteins address the growing need for novel biocatalysts, allowing for precise tailoring of functionalities to meet specific requirements. This advancement leads to the development of highly efficient and specialized proteins with diverse applications across scientific, technological, and biomedical fields. This study establishes a pipeline for protein sequence generation with a conditional protein diffusion model, namely CPDiffusion, to create diverse sequences of proteins with enhanced functions. CPDiffusion accommodates protein-specific conditions, such as secondary structures and highly conserved amino acids. Without relying on extensive training data, CPDiffusion effectively captures highly conserved residues and sequence features for specific protein families. We applied CPDiffusion to generate artificial sequences of Argonaute (Ago) proteins based on the backbone structures of wild-type (WT) Kurthia massiliensis Ago (KmAgo) and Pyrococcus furiosus Ago (PfAgo), which are complex multi-domain programmable endonucleases. The generated sequences deviate by up to nearly 400 amino acids from their WT templates. Experimental tests demonstrated that the majority of the generated proteins for both KmAgo and PfAgo show unambiguous activity in DNA cleavage, with many of them exhibiting superior activity as compared to the WT. These findings underscore CPDiffusion’s remarkable success rate in generating novel sequences for proteins with complex structures and functions in a single step, leading to enhanced activity. This approach facilitates the design of enzymes with multi-domain molecular structures and intricate functions through in silico generation and screening, all accomplished without the need for supervision from labeled data.
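As a rough illustration of two ideas in the abstract, the toy sketch below corrupts a sequence while leaving designated conserved positions fixed, then counts how many residues differ from the template. It is not the CPDiffusion implementation: the function names, the uniform resampling scheme, the example sequence, and the conserved positions are all assumptions made for illustration, and the comparison assumes the two sequences have equal length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def corrupt(sequence, conserved, noise_level, rng):
    """Toy forward-noising step (an assumption, not the paper's scheme):
    resample each non-conserved residue with probability `noise_level`;
    positions listed in `conserved` are never changed."""
    out = []
    for i, aa in enumerate(sequence):
        if i not in conserved and rng.random() < noise_level:
            out.append(rng.choice(AMINO_ACIDS))
        else:
            out.append(aa)
    return "".join(out)


def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
wild_type = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up toy sequence, not KmAgo or PfAgo
conserved = {0, 5, 10}                # hypothetical conserved positions
noisy = corrupt(wild_type, conserved, noise_level=0.5, rng=rng)
print(noisy, count_differences(wild_type, noisy))
```

In a real diffusion model a learned denoiser would reverse such corruption step by step under the given conditions; here only the conditioning constraint and the distance measure are shown.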


Source journal
Cell Discovery
Subject area: Biochemistry, Genetics and Molecular Biology - Molecular Biology
CiteScore: 24.20
Self-citation rate: 0.60%
Articles published: 120
Review time: 20 weeks
About the journal: Cell Discovery is an open access journal published by Springer Nature in collaboration with the Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences (CAS). It aims to provide a dynamic and accessible platform for scientists to present original research. Cell Discovery covers a wide range of topics in molecular and cell biology and publishes results of great significance and broad interest to the scientific community. With an international authorship and a focus on basic life sciences, the journal is part of Springer Nature's Molecular Cell Biology journals.