GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction.

IF 3.9 2区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences Pub Date : 2024-06-01 Epub Date: 2024-03-08 DOI:10.1007/s12539-024-00609-y

Jun Ma, Zhili Zhao, Tongfeng Li, Yunwu Liu, Jun Ma, Ruisheng Zhang

{"title":"GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction.","authors":"Jun Ma, Zhili Zhao, Tongfeng Li, Yunwu Liu, Jun Ma, Ruisheng Zhang","doi":"10.1007/s12539-024-00609-y","DOIUrl":null,"url":null,"abstract":"<p><p>Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through .cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including human, C. elegans, Davis and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models in classification datasets and achieves competitive performance in regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in Concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, enhancing model interpretability.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"361-377"},"PeriodicalIF":3.9000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interdisciplinary Sciences: Computational Life Sciences","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1007/s12539-024-00609-y","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/3/8 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through .cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including human, C. elegans, Davis and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models in classification datasets and achieves competitive performance in regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in Concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, enhancing model interpretability.

查看原文本刊更多论文

GraphsformerCPI：用于化合物-蛋白质相互作用预测的图形转换器

准确预测化合物与蛋白质的相互作用（CPI）是计算机辅助药物设计的一项关键任务。近年来，化合物活性和生物医学数据的指数级增长凸显了对高效且可解释的预测方法的需求。在本研究中，我们提出了一种端到端的深度学习框架 GraphsformerCPI，它可以提高预测性能和可解释性。GraphsformerCPI 将化合物和蛋白质视为具有空间结构的节点序列，并利用新颖的结构增强自注意机制来整合分子内的语义和图结构特征，从而实现深度分子表征。为了捕捉化合物原子和蛋白质残基之间的重要关联，我们设计了一种双重注意机制，通过.交叉映射有效提取关系特征。通过将 Transformers 强大的学习能力扩展到空间结构并广泛利用注意力机制，我们的模型具有很强的可解释性，这是与大多数黑盒深度学习方法相比的显著优势。为了评估 GraphsformerCPI，我们在基准数据集上进行了广泛的实验，包括人类数据集、线虫数据集、戴维斯数据集和 KIBA 数据集。我们探索了模型深度和辍学率对性能的影响，并将我们的模型与最先进的基线模型进行了比较。结果表明，GraphsformerCPI 在分类数据集上的表现优于基线模型，在回归数据集上的表现也很有竞争力。具体来说，在人类数据集上，GraphsformerCPI 的 AUC 平均提高了 1.6%，精确度提高了 0.5%，召回率提高了 5.3%。在 KIBA 数据集上，一致性指数（CI）和均方误差（MSE）分别平均提高了 3.3% 和 7.2%。分子对接结果表明，我们的模型提供了关于内在相互作用和结合机制的新见解。我们的研究在有效预测 CPI 和结合亲和力、识别关键原子和残基、提高模型可解释性等方面具有实际意义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Interdisciplinary Sciences: Computational Life Sciences MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

8.60

自引率

4.20%

发文量

期刊介绍： Interdisciplinary Sciences--Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of sciences, especially focusing on computational life sciences, an area that is enjoying rapid development at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles will be published rapidly by taking full advantage of internet technology for online submission and peer-reviewing of manuscripts, and then by publishing OnlineFirstTM through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists with international reputation, among others, Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), Weitao Yang (Duke University, USA). Prof. Dongqing Wei at the Shanghai Jiatong University is appointed as the editor-in-chief; he made important contributions in bioinformatics and computational physics and is best known for his ground-breaking works on the theory of ferroelectric liquids. With the help from a team of associate editors and the editorial board, an international journal with sound reputation shall be created.