Jia Mi, Chang Li, Han Wang, Ying Du, Chong Chu, Jing Wan, Kunfeng Wang
Expert Systems with Applications, Volume 291, Article 128554. Published 2025-06-14. DOI: 10.1016/j.eswa.2025.128554. Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence). Article page: https://www.sciencedirect.com/science/article/pii/S0957417425021736
USPDB: A novel U-shaped equivariant graph neural network with subgraph sampling for protein-DNA binding site prediction
Protein-DNA binding regulates gene expression and thereby directly influences the normal functioning of biological processes. Accurately identifying binding sites can reveal the mechanisms of protein-DNA interactions and provide a clear direction for drug target development. Because traditional experimental methods are time-consuming and costly, efficient computational methods are needed. Existing computational methods have made significant progress in protein binding site prediction, but they have difficulty extracting key residue features and atomic-level features. To address this, we propose USPDB, a method for protein-DNA binding site prediction based on a U-shaped Equivariant Graph Neural Network (U-EGNNet) and subgraph sampling. USPDB reformulates binding site prediction by converting the protein into a graph and performing binary classification for each residue. It uses protein large language models, such as ProtTrans, ESM2, and ESM3, to extract sequence and structural features, and a General Equivariant Transformer (GET) module to capture geometric features of residues and atoms. The U-EGNNet, composed of EGNN and subgraph sampling, preserves more global information while sampling subgraphs that contain key residues for further computation. On the DNA_test_181 and DNA_test_129 datasets, USPDB achieves prediction accuracies of 0.532 and 0.361, respectively, outperforming all baseline methods. Through interpretability analysis, we observed that USPDB focuses on residues within DNA-binding domains without requiring prior knowledge, thereby enhancing the performance of DNA-binding protein prediction. The code is publicly available at https://github.com/MiJia-ID/USPDB
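The pipeline described in the abstract (protein as a residue graph, equivariant message passing, subgraph sampling, per-residue binary classification) can be illustrated with a minimal pure-Python sketch. This is not the USPDB implementation: the 8 angstrom distance cutoff, the fixed tanh functions standing in for learned networks, the scalar node features, and the top-k sampling rule are all assumptions chosen for illustration, and the function names are hypothetical. The abstract does not specify how subgraphs are sampled.

```python
import math

def build_residue_graph(coords, cutoff=8.0):
    """Connect residues whose representative atoms lie within `cutoff` (assumed value)."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and math.dist(coords[i], coords[j]) <= cutoff]

def egnn_layer(h, x, edges):
    """One toy EGNN-style update using fixed functions instead of learned MLPs.

    h: scalar feature per residue; x: 3D coordinate per residue.
    """
    n = len(h)
    msg_sum = [0.0] * n
    shift = [[0.0, 0.0, 0.0] for _ in range(n)]
    degree = [0] * n
    for i, j in edges:
        d2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
        # The message uses only invariant inputs: features and squared distance.
        m = math.tanh(h[i] + h[j] - 0.01 * d2)
        msg_sum[i] += m
        degree[i] += 1
        # Coordinates move along the relative vector, scaled by an invariant weight,
        # which is what makes the update rotation- and translation-equivariant.
        for k in range(3):
            shift[i][k] += (x[i][k] - x[j][k]) * 0.1 * m
    new_x = [[x[i][k] + shift[i][k] / max(degree[i], 1) for k in range(3)]
             for i in range(n)]
    new_h = [math.tanh(h[i] + msg_sum[i]) for i in range(n)]
    return new_h, new_x

def sample_subgraph(scores, edges, k):
    """Keep the k highest-scoring residues and the edges among them (assumed top-k rule)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    kept = set(keep)
    return keep, [(i, j) for i, j in edges if i in kept and j in kept]

def residue_scores(h):
    """Sigmoid head: one binding probability per residue (binary classification)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in h]

if __name__ == "__main__":
    # Made-up coordinates and features for four residues; the last one is isolated.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (20.0, 0.0, 0.0)]
    feats = [0.5, -0.2, 0.1, 0.3]
    edges = build_residue_graph(coords)
    h, x = egnn_layer(feats, coords, edges)
    probs = residue_scores(h)
    print(sample_subgraph(probs, edges, k=2))
```

Because the messages depend only on features and inter-residue distances, rotating or translating the input structure leaves the per-residue scores unchanged, while the updated coordinates rotate and translate with the input.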
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.