Protein multi-level structure feature-integrated deep learning method for mutational effect prediction

IF 3.2; Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Ai-Ping Pang, Yongsheng Luo, Junping Zhou, Xue Cai, Lianggang Huang, Bo Zhang, Zhi-Qiang Liu, Yu-Guo Zheng
Biotechnology Journal, vol. 19, no. 8. Published 2024-08-08. Journal Article.
DOI: 10.1002/biot.202400203
https://onlinelibrary.wiley.com/doi/10.1002/biot.202400203
Citations: 0

Abstract

Through iterative rounds of mutation and selection, proteins can be engineered to enhance desired biological functions. Nevertheless, identifying optimal mutation sites for directed evolution remains challenging because the protein sequence landscape is vast and mutations interact epistatically across residues. To address this challenge, we introduce MLSmut, a deep learning-based approach that leverages multi-level structural features of proteins. MLSmut extracts salient information from protein co-evolution, sequence semantics, and geometric features to predict mutational effects. Extensive benchmark evaluations on 10 single-site and two multi-site deep mutational scanning datasets demonstrate that MLSmut surpasses existing methods in predicting mutational outcomes. To overcome limited training data, we employ a two-stage training strategy: initial coarse-tuning on a large corpus of unlabeled protein data, followed by fine-tuning on a curated dataset of 40 to 100 experimental measurements. This approach enables our model to achieve satisfactory performance on downstream protein prediction tasks. Importantly, our model has the potential to predict the mutational effects of any protein sequence. Collectively, these findings suggest that our approach can substantially reduce reliance on laborious wet-lab experiments and deepen our understanding of the intricate relationships between mutations and protein function.
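
The low-data fine-tuning step described in the abstract can be illustrated with a rough sketch. Everything here is an assumption made for illustration and is not the MLSmut implementation: the three feature blocks are random stand-ins for co-evolution, sequence-embedding, and geometric features from a frozen pretrained model; a closed-form ridge regression stands in for the fine-tuned prediction head; the labels are synthetic; and Spearman rank correlation is used as the score because it is a common choice for mutational-scanning benchmarks, although the abstract does not name its metric.

```python
import numpy as np

def concat_features(coevo, seq, geom):
    """Concatenate three per-variant feature blocks into one matrix."""
    return np.concatenate([coevo, seq, geom], axis=1)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the bias is the last weight."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(X, w):
    """Apply the fitted weights (including bias) to new variants."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def spearman(a, b):
    """Spearman rank correlation (assumes no tied values)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Synthetic stand-ins: 60 labeled variants for fitting (within the
# 40 to 100 range mentioned in the abstract) and 200 held-out variants.
rng = np.random.default_rng(0)
n_train, n_test = 60, 200
n = n_train + n_test
coevo = rng.normal(size=(n, 4))   # hypothetical co-evolution features
seq = rng.normal(size=(n, 8))     # hypothetical sequence-embedding features
geom = rng.normal(size=(n, 3))    # hypothetical geometric features
X = concat_features(coevo, seq, geom)

# Synthetic fitness labels: a linear function of the features plus noise.
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 0.1 * rng.normal(size=n)

w = fit_ridge(X[:n_train], y[:n_train], alpha=1.0)
rho = spearman(predict(X[n_train:], w), y[n_train:])
print(f"held-out Spearman rho: {rho:.3f}")
```

Because the synthetic labels are linear in the features, the held-out correlation is high by construction; the sketch only shows the shape of the workflow (frozen features, a small labeled set, rank-based evaluation), not the performance reported for MLSmut.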


Source journal
Biotechnology Journal
Subject area: Biochemistry, Genetics and Molecular Biology (Molecular Medicine)
CiteScore: 8.90
Self-citation rate: 2.10%
Articles published: 123
Review time: 1.5 months
Journal description: Biotechnology Journal (2019 Journal Citation Reports: 3.543) is fully comprehensive in its scope and publishes strictly peer-reviewed papers covering novel aspects and methods in all areas of biotechnology. Some issues are devoted to a special topic, providing the latest information on the most crucial areas of research and technological advances. In addition to these special issues, the journal welcomes unsolicited submissions of primary research articles, such as Research Articles, Rapid Communications and Biotech Methods. BTJ also welcomes proposals for Review Articles; please send a brief outline of the article and the senior author's CV to the editorial office. BTJ places special emphasis on: Systems Biotechnology; Synthetic Biology and Metabolic Engineering; Nanobiotechnology and Biomaterials; Tissue Engineering, Regenerative Medicine and Stem Cells; Gene Editing, Gene Therapy and Immunotherapy; Omics Technologies; Industrial Biotechnology, Biopharmaceuticals and Biocatalysis; Bioprocess Engineering and Downstream Processing; Plant Biotechnology; Biosafety, Biotech Ethics, Science Communication; Methods and Advances.