Deep-ProBind: binding protein prediction with transformer-based deep learning model.

IF 2.9 | Zone 3, Biology | JCR Q2, BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2025-03-22 DOI:10.1186/s12859-025-06101-8

Salman Khan, Sumaiya Noor, Hamid Hussain Awan, Shehryar Iqbal, Salman A AlQahtani, Naqqash Dilshad, Nijad Ahmad

BMC Bioinformatics, vol. 26(1), 88. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929993/pdf/

Citations: 0

Abstract

Binding proteins play a crucial role in biological systems by selectively interacting with specific molecules, such as DNA, RNA, or peptides, to regulate various cellular processes. Their ability to recognize and bind target molecules with high specificity makes them essential for signal transduction, transport, and enzymatic activity. Traditional experimental methods for identifying protein-binding peptides are costly and time-consuming. Current sequence-based approaches often struggle with accuracy, focusing too narrowly on proximal sequence features and ignoring structural data. This study presents Deep-ProBind, a prediction model designed to classify protein binding sites by integrating sequence and structural information. The proposed model encodes peptides using a transformer-based attention mechanism, Bidirectional Encoder Representations from Transformers (BERT), together with an evolutionary descriptor, the Pseudo Position-Specific Scoring Matrix with Discrete Wavelet Transform (PsePSSM-DWT). The SHapley Additive exPlanations (SHAP) algorithm selects the optimal hybrid features, and a Deep Neural Network (DNN) is then used as the classification algorithm to predict protein-binding peptides. The performance of the proposed model was evaluated in comparison with traditional Machine Learning (ML) algorithms and existing models. Experimental results demonstrate that Deep-ProBind achieved 92.67% accuracy with tenfold cross-validation on benchmark datasets and 93.62% accuracy on independent samples. Deep-ProBind outperforms existing models by 3.57% on training data and 1.52% on independent tests. These results demonstrate Deep-ProBind's reliability and effectiveness, making it a valuable tool for researchers and a potential resource in pharmacological studies, where peptide binding plays a critical role in therapeutic development.
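To make the evolutionary encoding more concrete, the following is a minimal sketch of the two building blocks named in the abstract, PsePSSM and the discrete wavelet transform. It is not the authors' implementation: the abstract does not state the lag range, the wavelet family, or how the two are combined, so the function names (`pse_pssm`, `haar_dwt`), the `max_lag` parameter, and the choice of a single-level Haar wavelet are all illustrative assumptions. The PsePSSM part follows the commonly used definition: per-column means of the PSSM followed by lagged mean squared differences within each column.

```python
import math


def pse_pssm(pssm, max_lag=2):
    """PsePSSM sketch: column means plus lagged squared-difference terms.

    `pssm` is a list of rows (one per residue), each a list of scores
    (20 columns for a real PSSM). `max_lag` is a hypothetical default.
    """
    length, width = len(pssm), len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    # Part 1: average score of each column over the whole sequence.
    features = [sum(row[j] for row in pssm) / length for j in range(width)]
    # Part 2: for each lag, mean squared difference between residues
    # that are `lag` positions apart, computed column by column.
    for lag in range(1, max_lag + 1):
        for j in range(width):
            total = sum(
                (pssm[i][j] - pssm[i + lag][j]) ** 2
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


def haar_dwt(signal):
    """One level of the Haar DWT, returning (approximation, detail).

    Haar is an assumption chosen for simplicity; odd-length input is
    padded by repeating the last value.
    """
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


# Toy example: a 3-residue, 2-column "PSSM" (a real PSSM has 20 columns).
toy_pssm = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
toy_features = pse_pssm(toy_pssm, max_lag=1)
toy_approx, toy_detail = haar_dwt([1.0, 1.0, 3.0, 5.0])
```

In a pipeline like the one described, descriptors of this kind would be concatenated with the BERT embeddings before SHAP-based feature selection and DNN classification; those stages depend on trained models and are not sketched here.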


Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.