Graph neural networks for identifying protein-reactive compounds†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital Discovery. Pub Date: 2024-07-25. DOI: 10.1039/D4DD00038B

Victor Hugo Cano Gil and Christopher N. Rowley



Abstract

The identification of protein-reactive electrophilic compounds is critical to the design of new covalent modifier drugs, screening for toxic compounds, and the exclusion of reactive compounds from high-throughput screening. In this work, we employ traditional and graph machine learning (ML) algorithms to classify molecules as reactive towards proteins or nonreactive. For training data, we built a new dataset, ProteinReactiveDB, composed primarily of covalent and noncovalent inhibitors from the DrugBank, BindingDB, and CovalentInDB databases. To assess the transferability of the trained models, we created a custom set of covalent and noncovalent inhibitors constructed from the recent literature. Baseline models were developed using Morgan fingerprints as training inputs, but they performed poorly when applied to compounds outside the training set. We then trained various Graph Neural Networks (GNNs), with the best GNN model achieving an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.80, precision of 0.89, and recall of 0.72. We also explore the interpretability of these GNNs using Gradient Activation Mapping (GradCAM), which shows the regions of a molecule that the GNNs deem most relevant when making a prediction. These maps indicated that our trained models can identify electrophilic functional groups in a molecule and classify molecules as protein-reactive based on their presence. We demonstrate the use of these models by comparing their performance against common chemical filters, identifying covalent modifiers in the ChEMBL database, and generating a putative covalent inhibitor based on an established noncovalent inhibitor.
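
The three metrics quoted in the abstract (AUROC, precision, and recall) can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the labels, scores, the 0.5 decision threshold, and the function names `precision_recall` and `auroc` are all hypothetical and chosen only for illustration. Label 1 stands for "protein-reactive" and 0 for "nonreactive".

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels (1 = reactive, 0 = nonreactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration but not for large screening sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four reactive and four nonreactive molecules,
# each with a made-up model score in [0, 1].
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 (the abstract does not state one).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
area = auroc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} AUROC={area:.4f}")
```

Precision and recall depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the abstract reports all three.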

