MEMO-Stab2: Multi-View Sequence-Based Deep Learning Framework for Predicting Mutation-Induced Stability Changes in Transmembrane Proteins.

IF 5.3 | Zone 2, Chemistry | JCR Q1, Chemistry, Medicinal

Journal of Chemical Information and Modeling | Pub Date: 2025-09-29 | DOI: 10.1021/acs.jcim.5c01774

Yihang Bao, Zhe Liu, Hui Jin, Han Wang, Weidi Wang, Guan Ning Lin


Citations: 0

Abstract




Accurately predicting the impact of point mutations on protein thermodynamic stability is essential for understanding structure-function relationships and guiding protein design. This challenge is particularly acute for transmembrane proteins (TMPs), which play vital roles in cellular signaling and drug targeting but remain underrepresented in structural databases. Existing predictors often rely on three-dimensional structures or multiple sequence alignments, limiting their applicability to TMPs due to poor structural coverage and alignment quality. Here, we present MEMO-Stab2, a fast and structure-independent deep learning framework for predicting mutation-induced stability changes in TMPs. MEMO-Stab2 reformulates the task as a binary classification problem, distinguishing destabilizing from neutral mutations based on a ΔΔG threshold of 0.4 kcal/mol. The model integrates multiview features within a Transformer-based architecture, utilizing embeddings from multiple pretrained protein language models (PLMs) and PLM-based structural predictions. By leveraging PLMs, it operates without requiring experimental 3D structures or explicit multiple sequence alignments, implicitly capturing both evolutionary and structural contexts from the amino acid sequence alone. Across internal and external transmembrane mutation data sets, MEMO-Stab2 consistently outperforms existing tools, including specialized predictors and a state-of-the-art general model, even after the latter was fine-tuned on the same domain-specific data, and achieves an F1 score of 0.92 on an internal benchmark. Further analyses confirm the model's robustness and specificity. It demonstrates strong generalization across diverse protein families with low sequence identity and shows superior performance in challenging biophysical contexts such as the transmembrane core and interfacial regions. Its validated computational efficiency enables large-scale mutation screening in minutes, offering a practical, robust, and powerful tool for transmembrane protein variant evaluation and engineering.
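
The binary labeling rule and the F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy ΔΔG values, and the toy predictions are hypothetical, and the sketch assumes that positive ΔΔG means destabilizing and that the 0.4 kcal/mol cutoff is inclusive, neither of which the abstract states.

```python
from typing import Iterable, List, Sequence

# Cutoff quoted in the abstract (kcal/mol).
DDG_THRESHOLD = 0.4


def label_mutations(ddg_values: Iterable[float],
                    threshold: float = DDG_THRESHOLD) -> List[int]:
    """Map each ddG value to 1 (destabilizing) or 0 (neutral).

    Assumption: positive ddG is destabilizing and the cutoff is inclusive.
    """
    return [1 if ddg >= threshold else 0 for ddg in ddg_values]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (destabilizing) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
measured_ddg = [1.2, 0.1, 0.4, -0.3, 0.9]
y_true = label_mutations(measured_ddg)
y_pred = [1, 0, 0, 0, 1]  # hypothetical classifier output
score = f1_score(y_true, y_pred)
print(y_true, round(score, 3))
```

On the toy data, one destabilizing mutation is missed, so recall drops to 2/3 while precision stays at 1, which gives an F1 of 0.8.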


Source journal

Journal of Chemical Information and Modeling (Chemistry: Chemistry, Multidisciplinary)

CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.