Mingming Guan, Jiyun Han, Shizhuo Zhang, Hongyu Zheng, Juntao Liu
Research, vol. 8, article 0773. Published 2025-07-08. DOI: 10.34133/research.0773 (https://doi.org/10.34133/research.0773). Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237623/pdf/
SpatConv Enables the Accurate Prediction of Protein Binding Sites by a Pretrained Protein Language Model and an Interpretable Bio-spatial Convolution.
Protein interactions with other molecules, such as proteins, peptides, or small ligands, play a critical role in biological processes, and identifying protein binding sites is crucial for understanding the mechanisms underlying diseases such as cancer. Traditional protein binding site prediction models usually extract residue features manually and then employ a graph-based or point-cloud-based architecture borrowed from other fields. As a result, substantial information loss and limited learning capacity prevent them from capturing residue binding patterns. To address these limitations, we introduce SpatConv, a general network that predicts protein-, peptide-, and metal-ion-binding residues on proteins. SpatConv extracts sequence features from a pretrained large protein language model and structure features from a local coordinate framework. It learns residue binding patterns through a specially designed, graph-free bio-spatial convolution, which characterizes the complex spatial environments around the residues. After training and testing, SpatConv shows substantial improvements over state-of-the-art predictors and reveals novel biological insights into the relationship between binding sites and physicochemical properties. Notably, SpatConv performs robustly on both predicted and experimental structures, which strengthens its reliability. Additionally, when applied to the spike protein structure of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), SpatConv successfully identifies antibody binding sites and predicts potential binding regions, providing strong evidence to support new drug development. A user-friendly online server for SpatConv is freely available at http://liulab.top/SpatConv/server.
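The abstract does not spell out how the local coordinate framework is built, so the following is only an illustrative sketch of one common way to do it, not the authors' implementation. It assumes each residue is given as backbone N, CA, and C coordinates, builds an orthonormal frame per residue by Gram-Schmidt, and expresses neighbouring CA atoms in that frame. The function names and the 12 Å radius are hypothetical choices made for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def local_frame(n, ca, c):
    """Orthonormal axes (x, y, z) centred on CA (hypothetical convention)."""
    x = unit(sub(c, ca))                    # x axis points from CA to C
    v = sub(n, ca)
    y = unit(sub(v, scale(x, dot(v, x))))   # Gram-Schmidt: drop the x component
    z = cross(x, y)                         # right-handed third axis
    return (x, y, z)

def local_neighbors(backbone, radius=12.0):
    """backbone: list of (N, CA, C) coordinate triples, one per residue.

    Returns, for each residue i, a list of (j, local_xyz) pairs for every
    other residue j whose CA lies within `radius` (assumed cutoff) of CA_i.
    """
    frames = [local_frame(n, ca, c) for n, ca, c in backbone]
    result = []
    for i, (_, ca_i, _) in enumerate(backbone):
        env = []
        for j, (_, ca_j, _) in enumerate(backbone):
            d = sub(ca_j, ca_i)
            if j != i and math.sqrt(dot(d, d)) <= radius:
                # Project the displacement onto residue i's own axes.
                env.append((j, tuple(dot(d, axis) for axis in frames[i])))
        result.append(env)
    return result
```

Because each neighbour is described relative to the residue's own axes, the resulting coordinates do not change when the whole structure is rotated or translated, which is the usual motivation for working in a local frame rather than in raw PDB coordinates.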
About the journal:
Research serves as a global platform for academic exchange, collaboration, and technological advancements. This journal welcomes high-quality research contributions from any domain, with open arms to authors from around the globe.
Comprising fundamental research in the life and physical sciences, Research also highlights significant findings and issues in engineering and applied science. The journal proudly features original research articles, reviews, perspectives, and editorials, fostering a diverse and dynamic scholarly environment.