Protein-protein and protein-nucleic acid binding site prediction via interpretable hierarchical geometric deep learning.

IF 11.8; Zone 2 (Biology); Q1 MULTIDISCIPLINARY SCIENCES

GigaScience, Pub Date: 2024-01-02, DOI: 10.1093/gigascience/giae080

Shizhuo Zhang, Jiyun Han, Juntao Liu

Volume: 13. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528319/pdf/

Citations: 0

Abstract

Identification of protein-protein and protein-nucleic acid binding sites provides insights into biological processes related to protein functions and technical guidance for disease diagnosis and drug design. However, accurate prediction by computational approaches remains highly challenging because knowledge of residue binding patterns is limited. The binding pattern of a residue should be characterized by the spatial distribution of its neighboring residues combined with the interactions of their physicochemical information, which previous methods have not achieved. Here, we design GraphRBF, a hierarchical geometric deep learning model that learns residue binding patterns from big data. To this end, GraphRBF describes physicochemical information interactions with an enhanced graph neural network and characterizes residue spatial distributions with a prioritized radial basis function neural network. After training and testing, GraphRBF shows great improvements over existing state-of-the-art methods and strong interpretability of its learned representations. When applied to the SARS-CoV-2 omicron spike protein, GraphRBF successfully identifies known epitopes of the protein. Moreover, it predicts multiple potential binding regions for new nanobodies or even new drugs with strong evidence. A user-friendly online server for GraphRBF is freely available at http://liulab.top/GraphRBF/server.
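
As a rough illustration of how a residue's spatial neighborhood can be described with radial basis functions, the sketch below encodes the distances from one residue to its neighbors as Gaussian RBF activations and sums them into a fixed-length profile. This is not the GraphRBF implementation: the abstract does not specify the network, and the "prioritized" weighting it mentions is omitted here. The function names, toy coordinates, RBF centers, cutoff, and `gamma` value are all assumptions made for illustration.

```python
import math


def gaussian_rbf(distance, centers, gamma=0.5):
    """Encode one distance as Gaussian RBF activations, one per center."""
    return [math.exp(-gamma * (distance - c) ** 2) for c in centers]


def neighbor_rbf_profile(coords, index, centers, cutoff=10.0, gamma=0.5):
    """Sum RBF activations over all residues within `cutoff` of residue `index`.

    The result is a fixed-length vector summarizing how neighbors are
    distributed by distance around the chosen residue.
    """
    profile = [0.0] * len(centers)
    for j, other in enumerate(coords):
        if j == index:
            continue  # a residue is not its own neighbor
        d = math.dist(coords[index], other)
        if d <= cutoff:
            for k, value in enumerate(gaussian_rbf(d, centers, gamma)):
                profile[k] += value
    return profile


# Hypothetical toy input: one representative point per residue, in angstroms.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (20.0, 0.0, 0.0)]
centers = [2.0, 4.0, 6.0]  # assumed RBF centers
profile = neighbor_rbf_profile(coords, 0, centers)
print(profile)
```

In this toy case the residue at the origin has neighbors at 2 and 4 angstroms, so the first two profile entries dominate, while the residue 20 angstroms away falls outside the cutoff and contributes nothing. In a learned model, a vector like this would typically be one input among others rather than a prediction on its own.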

Source journal

GigaScience (MULTIDISCIPLINARY SCIENCES)

CiteScore: 15.50

Self-citation rate: 1.10%

Articles published: 119

Review time: 1 week

Journal description: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.