Cui-Feng Li, Zihao Yan, Fang Ge, Xuan Yu, Jing Zhang, Ming Zhang* and Dong-Jun Yu*
Journal of Chemical Information and Modeling, volume 65, issue 10, pages 5188–5204. Published 2025-05-12. DOI: 10.1021/acs.jcim.5c00478 (https://pubs.acs.org/doi/10.1021/acs.jcim.5c00478)
TransABseq: A Two-Stage Approach for Predicting Antigen–Antibody Binding Affinity Changes upon Mutation Based on Protein Sequences
The antigen–antibody interaction is a critical mechanism in host defense, contributing to pathogen neutralization, tumor surveillance, immunotherapy, and in vitro disease detection. Owing to their exceptional specificity, affinity, and selectivity, antibodies have been used extensively in the development of clinical diagnostic, therapeutic, and prophylactic strategies. In this study, we propose TransABseq, a novel computational framework designed to predict the effects of missense mutations on antigen–antibody interactions. The model has a two-stage architecture: in the first stage, multiple protein language model embeddings are processed through a Transformer encoder module and a multiscale convolutional module; in the second stage, an XGBoost model produces the quantitative prediction from the deeply fused features. A critical advancement contributing to the effectiveness of TransABseq is the deep feature fusion strategy, which reveals the biochemical properties of proteins. TransABseq uses the multilayer self-attention mechanism of the Transformer to capture complex global dependencies within sequences and multiscale convolution to mine features at different hierarchical levels, which significantly enhances its feature abstraction capability. We evaluated TransABseq through three distinct cross-validation strategies on two established benchmarks and a newly reconstructed data set. TransABseq achieved average Pearson correlation coefficient (PCC) values of 0.607, 0.843, and 0.794 and average root-mean-square error (RMSE) values of 1.166, 1.314, and 1.337 kcal/mol in 10-fold cross-validation. Furthermore, its robustness and predictive accuracy were validated on blind test data sets, where TransABseq outperformed existing methods, attaining a PCC of 0.721 and an RMSE of 0.925 kcal/mol. The relevant data and code have been made publicly available for academic research at: https://github.com/cuifengLI/TransABseq.
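The PCC and RMSE figures quoted in the abstract are standard regression metrics, and a short example may help readers interpret them. The following is a minimal sketch, assuming PCC means the Pearson correlation coefficient and RMSE the root-mean-square error between measured and predicted binding affinity changes (in kcal/mol). The function names, the per-fold averaging, and the toy numbers are illustrative assumptions and are not taken from the TransABseq repository.

```python
import math
from typing import Iterable, Sequence, Tuple


def pearson_cc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs."""
    n = len(y_true)
    if n != len(y_pred) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def average_over_folds(
    folds: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Mean PCC and mean RMSE over (y_true, y_pred) pairs, one pair per fold.

    Assumption: metrics are computed per held-out fold and then averaged;
    the abstract does not say how the reported averages were formed.
    """
    fold_list = list(folds)
    pccs = [pearson_cc(t, p) for t, p in fold_list]
    rmses = [rmse(t, p) for t, p in fold_list]
    return sum(pccs) / len(pccs), sum(rmses) / len(rmses)


if __name__ == "__main__":
    # Made-up values for two folds, purely for illustration (kcal/mol).
    toy_folds = [
        ([0.5, -1.2, 2.0, 0.1], [0.7, -0.9, 1.5, 0.0]),
        ([1.1, 0.3, -0.4, 2.2], [0.8, 0.6, -0.1, 1.9]),
    ]
    mean_pcc, mean_rmse = average_over_folds(toy_folds)
    print(f"mean PCC = {mean_pcc:.3f}, mean RMSE = {mean_rmse:.3f} kcal/mol")
```

A higher PCC indicates that predictions track the ranking and trend of the measured values more closely, while a lower RMSE indicates smaller absolute errors. The two can diverge, which is presumably why the abstract reports both.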
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.