MEMO-Stab2: Multi-View Sequence-Based Deep Learning Framework for Predicting Mutation-Induced Stability Changes in Transmembrane Proteins
Yihang Bao, Zhe Liu, Hui Jin, Han Wang, Weidi Wang, Guan Ning Lin
Journal of Chemical Information and Modeling, published 2025-09-29. DOI: 10.1021/acs.jcim.5c01774
Abstract
Accurately predicting the impact of point mutations on protein thermodynamic stability is essential for understanding structure-function relationships and guiding protein design. This challenge is particularly acute for transmembrane proteins (TMPs), which play vital roles in cellular signaling and drug targeting but remain underrepresented in structural databases. Existing predictors often rely on three-dimensional structures or multiple sequence alignments, which limits their applicability to TMPs because of poor structural coverage and alignment quality. Here, we present MEMO-Stab2, a fast and structure-independent deep learning framework for predicting mutation-induced stability changes in TMPs. MEMO-Stab2 reformulates the task as a binary classification problem, distinguishing destabilizing from neutral mutations based on a ΔΔG threshold of 0.4 kcal/mol. The model integrates multiview features within a Transformer-based architecture, using embeddings from multiple pretrained protein language models (PLMs) and PLM-based structural predictions. Because it relies on PLMs, it operates without experimental 3D structures or explicit multiple sequence alignments, implicitly capturing both evolutionary and structural context from the amino acid sequence alone. Across internal and external transmembrane mutation data sets, MEMO-Stab2 consistently outperforms existing tools, including specialized predictors and a state-of-the-art general model, even after that general model was fine-tuned on the same domain-specific data, and it achieves an F1 score of 0.92 on an internal benchmark. Further analyses confirm the model's robustness and specificity. The model generalizes well across diverse protein families with low sequence identity and performs better than existing tools in challenging biophysical contexts such as the transmembrane core and interfacial regions. Its validated computational efficiency enables large-scale mutation screening in minutes, offering a practical, robust, and powerful tool for transmembrane protein variant evaluation and engineering.
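The abstract frames stability prediction as a binary task with a 0.4 kcal/mol ΔΔG cutoff and reports performance as an F1 score. The Python sketch below illustrates how those two pieces fit together; it is not the authors' code. The helper names `label_mutation` and `f1_score`, the sign convention (positive ΔΔG taken as destabilizing), the inclusive boundary, and treating "destabilizing" as the positive class are all assumptions, and the ΔΔG values are toy numbers rather than data from the paper.

```python
from typing import Sequence

# Cutoff taken from the abstract (kcal/mol). ASSUMPTION: positive ddG means
# destabilizing; published data sets differ in sign convention.
DDG_THRESHOLD = 0.4


def label_mutation(ddg: float, threshold: float = DDG_THRESHOLD) -> int:
    """Hypothetical helper: 1 = destabilizing (ddg >= threshold), 0 = neutral.

    The inclusive boundary and folding all other mutations into the
    neutral class are illustrative assumptions.
    """
    return 1 if ddg >= threshold else 0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (destabilizing) class: 2PR / (P + R)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: made-up ddG values and classifier outputs, not data from the paper.
measured_ddg = [1.2, 0.1, 0.55, -0.3, 0.8, 0.2]
y_true = [label_mutation(d) for d in measured_ddg]  # [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]  # one missed destabilizer, one false alarm
f1 = f1_score(y_true, y_pred)
print(round(f1, 3))  # prints 0.667
```

Because this F1 ignores true negatives, it is less inflated than plain accuracy when neutral mutations dominate a data set.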