Prediction of protein thermostability trends based on the self-attention mechanism driven sparse convolutional network
Pei Xu, Kai Zhong, Honghua Ge, Xiaoping Song, Weihua Wang
Computational Biology and Chemistry, vol. 120 (Pt 2), article 108693. Journal Article, published 2025-10-01.
DOI: 10.1016/j.compbiolchem.2025.108693
Abstract
Artificial intelligence (AI)-assisted prediction of protein thermostability can significantly reduce the burden of mutation screening and thereby make protein engineering more efficient. To further improve prediction accuracy and shorten the development cycle of new proteins, we jointly encode protein sequences, mutation relationships, and physicochemical properties, and we introduce SCSAddG, a sparse convolutional network driven by the self-attention mechanism. Experimental results show that on the general dataset S2648, SCSAddG achieves an accuracy of 0.868, a precision of 0.710, a recall of 0.606, an F1 score of 0.653, and an area under the receiver operating characteristic curve (AUROC) of 0.825. Compared with a conventional convolutional neural network (CNN), SCSAddG achieves slightly higher accuracy, and its accuracy exceeds that of the Rosetta bioinformatics simulation software by 12%. On the experimental transglutaminase dataset, SCSAddG achieves significantly higher accuracy than the CNN (0.744 vs. 0.667), with a precision of 1.000, and the results of wet-laboratory experiments are consistent with the model predictions. In 5-fold cross-validation, SCSAddG outperformed the CNN on multiple evaluation metrics, demonstrating superior predictive performance and robustness. These results indicate that SCSAddG can effectively evaluate trends in protein thermostability and can serve as a valuable tool for guiding protein thermostability engineering.
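The metrics quoted above follow the standard definitions for binary classification. The sketch below assumes, as the reported metrics suggest but the abstract does not state, that each mutation is labeled 1 if it raises thermostability and 0 otherwise, and that the model emits one score per mutation. The function names are illustrative and are not part of SCSAddG. The sketch also checks the reported F1 score against the reported precision and recall using F1 = 2PR / (P + R).

```python
"""Binary classification metrics of the kind reported for SCSAddG.

Assumption (not stated in the abstract): label 1 = mutation increases
thermostability, label 0 = otherwise; higher score = more likely label 1.
"""

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as 1/2. This O(n_pos * n_neg) pairwise form is equivalent to
    the normalized Mann-Whitney U statistic; fine for a sketch, slow for big data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Consistency check on the S2648 figures quoted in the abstract:
    # the harmonic mean of precision 0.710 and recall 0.606.
    print(round(f1_from_pr(0.710, 0.606), 3))  # prints 0.654
```

The harmonic mean of 0.710 and 0.606 is about 0.654. The 0.001 gap from the reported 0.653 is small enough to come from rounding of the reported precision and recall or from averaging F1 over folds; the abstract does not say which applies.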
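The abstract states that SCSAddG encodes sequences, mutation relationships, and physicochemical properties together, but gives no further detail. Purely as a hypothetical illustration, and not the authors' scheme, a point mutation written in standard notation such as A4G could be turned into a fixed-length feature vector by concatenating one-hot vectors for the wild-type and mutant residues with the change in Kyte-Doolittle hydropathy. The names `encode_mutation`, `one_hot`, and `AMINO_ACIDS` are invented for this sketch.

```python
"""Hypothetical encoding of a single point mutation such as "A4G".

Illustrative assumption only: SCSAddG's actual encoding is not described in
the abstract. Features = one-hot(wild type) + one-hot(mutant) + delta hydropathy.
"""

import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}

# Wild-type letter, 1-based position, mutant letter (e.g. "A4G").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def one_hot(residue: str) -> list[float]:
    """20-dimensional one-hot vector over AMINO_ACIDS."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def encode_mutation(sequence: str, mutation: str) -> list[float]:
    """Return a 41-dimensional feature vector for one point mutation."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"expected a mutation like 'A123G', got {mutation!r}")
    wild, pos, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild not in KYTE_DOOLITTLE or mutant not in KYTE_DOOLITTLE:
        raise ValueError(f"non-standard residue in {mutation!r}")
    # Mutation notation is 1-based; also verify the wild-type residue matches.
    if not 1 <= pos <= len(sequence) or sequence[pos - 1] != wild:
        raise ValueError(f"{mutation} does not match the given sequence")
    delta_hydropathy = KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild]
    return one_hot(wild) + one_hot(mutant) + [delta_hydropathy]


if __name__ == "__main__":
    vec = encode_mutation("MKTAYIAK", "A4G")
    print(len(vec), vec[-1])
```

A real model would likely also include sequence context around the mutated position (for example, a window of neighboring residues) and further physicochemical descriptors; which ones SCSAddG uses is not stated in the abstract.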