Graph neural networks for identifying protein-reactive compounds†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY
Victor Hugo Cano Gil and Christopher N. Rowley
{"title":"Graph neural networks for identifying protein-reactive compounds†","authors":"Victor Hugo Cano Gil and Christopher N. Rowley","doi":"10.1039/D4DD00038B","DOIUrl":null,"url":null,"abstract":"<p >The identification of protein-reactive electrophilic compounds is critical to the design of new covalent modifier drugs, screening for toxic compounds, and the exclusion of reactive compounds from high throughput screening. In this work, we employ traditional and graph machine learning (ML) algorithms to classify molecules being reactive towards proteins or nonreactive. For training data, we built a new dataset, ProteinReactiveDB, composed primarily of covalent and noncovalent inhibitors from the DrugBank, BindingDB, and CovalentInDB databases. To assess the transferability of the trained models, we created a custom set of covalent and noncovalent inhibitors, which was constructed from the recent literature. Baseline models were developed using Morgan fingerprints as training inputs, but they performed poorly when applied to compounds outside the training set. We then trained various Graph Neural Networks (GNNs), with the best GNN model achieving an Area Under the Receiver Operator Characteristic (AUROC) curve of 0.80, precision of 0.89, and recall of 0.72. We also explore the interpretability of these GNNs using Gradient Activation Mapping (GradCAM), which shows regions of the molecules GNNs deem most relevant when making a prediction. These maps indicated that our trained models can identify electrophilic functional groups in a molecule and classify molecules as protein-reactive based on their presence. We demonstrate the use of these models by comparing their performance against common chemical filters, identifying covalent modifiers in the ChEMBL database and generating a putative covalent inhibitor based on an established noncovalent inhibitor.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2000,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00038b?page=search","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital discovery","FirstCategoryId":"1085","ListUrlMain":"https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00038b","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

The identification of protein-reactive electrophilic compounds is critical to the design of new covalent modifier drugs, screening for toxic compounds, and the exclusion of reactive compounds from high throughput screening. In this work, we employ traditional and graph machine learning (ML) algorithms to classify molecules being reactive towards proteins or nonreactive. For training data, we built a new dataset, ProteinReactiveDB, composed primarily of covalent and noncovalent inhibitors from the DrugBank, BindingDB, and CovalentInDB databases. To assess the transferability of the trained models, we created a custom set of covalent and noncovalent inhibitors, which was constructed from the recent literature. Baseline models were developed using Morgan fingerprints as training inputs, but they performed poorly when applied to compounds outside the training set. We then trained various Graph Neural Networks (GNNs), with the best GNN model achieving an Area Under the Receiver Operator Characteristic (AUROC) curve of 0.80, precision of 0.89, and recall of 0.72. We also explore the interpretability of these GNNs using Gradient Activation Mapping (GradCAM), which shows regions of the molecules GNNs deem most relevant when making a prediction. These maps indicated that our trained models can identify electrophilic functional groups in a molecule and classify molecules as protein-reactive based on their presence. We demonstrate the use of these models by comparing their performance against common chemical filters, identifying covalent modifiers in the ChEMBL database and generating a putative covalent inhibitor based on an established noncovalent inhibitor.

Abstract Image

识别蛋白质活性化合物的图神经网络
蛋白质反应性亲电化合物的鉴定对于设计新的共价修饰药物、筛选有毒化合物以及将反应性化合物排除在高通量筛选之外至关重要。在这项工作中,我们采用了传统的图式机器学习(ML)算法来分类对蛋白质有反应或无反应的分子。作为训练数据,我们建立了一个新的数据集 ProteinReactiveDB,主要由 DrugBank、BindingDB 和 CovalentInDB 数据库中的共价和非共价抑制剂组成。为了评估训练模型的可移植性,我们创建了一套定制的共价和非共价抑制剂,这套抑制剂是根据最近的文献构建的。我们使用摩根指纹作为训练输入开发了基准模型,但当这些模型应用于训练集之外的化合物时,表现不佳。我们随后训练了各种图神经网络 (GNN),其中最佳的 GNN 模型的接收者运算特性曲线下面积 (AUROC) 为 0.80,精确度为 0.89,召回率为 0.72。我们还使用梯度激活图谱 (GradCAM) 探索了这些 GNN 的可解释性,该图谱显示了 GNN 在进行预测时认为最相关的分子区域。这些图谱表明,我们训练有素的模型可以识别分子中的亲电官能团,并根据它们的存在将分子划分为对蛋白质有反应的分子。我们通过比较这些模型与常见化学过滤器的性能、识别 ChEMBL 数据库中的共价修饰物以及根据已确定的非共价抑制剂生成推定共价抑制剂,展示了这些模型的用途。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
2.80
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信