Peng Cheng, Cong Mao, Jin Tang, Sen Yang, Yu Cheng, Wuke Wang, Qiuxi Gu, Wei Han, Hao Chen, Sihan Li, Yaofeng Chen, Jianglin Zhou, Wuju Li, Aimin Pan, Suwen Zhao, Xingxu Huang, Shiqiang Zhu, Jun Zhang, Wenjie Shu, Shengqi Wang
{"title":"Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering","authors":"Peng Cheng, Cong Mao, Jin Tang, Sen Yang, Yu Cheng, Wuke Wang, Qiuxi Gu, Wei Han, Hao Chen, Sihan Li, Yaofeng Chen, Jianglin Zhou, Wuju Li, Aimin Pan, Suwen Zhao, Xingxu Huang, Shiqiang Zhu, Jun Zhang, Wenjie Shu, Shengqi Wang","doi":"10.1038/s41422-024-00989-2","DOIUrl":null,"url":null,"abstract":"Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.","PeriodicalId":9926,"journal":{"name":"Cell Research","volume":null,"pages":null},"PeriodicalIF":28.1000,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41422-024-00989-2.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cell Research","FirstCategoryId":"99","ListUrlMain":"https://www.nature.com/articles/s41422-024-00989-2","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CELL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
期刊介绍:
Cell Research (CR) is an international journal published by Springer Nature in partnership with the Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences (CAS). It focuses on publishing original research articles and reviews in various areas of life sciences, particularly those related to molecular and cell biology. The journal covers a broad range of topics including cell growth, differentiation, and apoptosis; signal transduction; stem cell biology and development; chromatin, epigenetics, and transcription; RNA biology; structural and molecular biology; cancer biology and metabolism; immunity and molecular pathogenesis; molecular and cellular neuroscience; plant molecular and cell biology; and omics, system biology, and synthetic biology. CR is recognized as China's best international journal in life sciences and is part of Springer Nature's prestigious family of Molecular Cell Biology journals.