Improving drug-induced liver injury prediction using graph neural networks with augmented graph features from molecular optimisation

IF 5.7 · Zone 2 (Chemistry) · JCR Q1, CHEMISTRY, MULTIDISCIPLINARY
Taeyeub Lee, Joram M. Posma
Journal of Cheminformatics 17(1), published 2025-08-18.
DOI: 10.1186/s13321-025-01068-3
https://link.springer.com/article/10.1186/s13321-025-01068-3

Abstract

Purpose

Drug-induced liver injury (DILI) is a significant concern in drug development, often leading to the discontinuation of clinical trials and the withdrawal of drugs from the market. This study explores the application of graph neural networks (GNNs) for DILI prediction, using molecular graph representations as the primary input.
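To make "molecular graph as GNN input" concrete, here is a minimal pure-Python sketch of one graph-convolution (GCN) step with the common symmetric normalisation, ReLU(D^-1/2 (A + I) D^-1/2 X W), followed by a mean-pooling readout that turns per-atom vectors into a single molecule-level vector. The toy graph (the three heavy atoms of ethanol with one-hot element features), the identity weight matrix and the function names `gcn_layer` and `mean_pool` are illustrative assumptions; this is not the DILIGeNN implementation, and a real model would learn W and would typically be built with a deep-learning library.

```python
import math


def gcn_layer(num_nodes, edges, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    edges    -- undirected (i, j) pairs, e.g. bonds between atom indices
    features -- X, a num_nodes x in_dim list of lists (one row per atom)
    weight   -- W, an in_dim x out_dim list of lists
    """
    # Neighbour sets of A + I: every atom is also its own neighbour (self-loop).
    neighbours = [{i} for i in range(num_nodes)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    degree = [len(n) for n in neighbours]

    # Linear transform of every node first: XW.
    out_dim = len(weight[0])
    xw = [[sum(row[k] * weight[k][c] for k in range(len(row))) for c in range(out_dim)]
          for row in features]

    # Symmetrically normalised sum over each neighbourhood, then ReLU.
    out = []
    for i in range(num_nodes):
        agg = [0.0] * out_dim
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for c in range(out_dim):
                agg[c] += norm * xw[j][c]
        out.append([max(0.0, v) for v in agg])
    return out


def mean_pool(node_vectors):
    """Graph-level readout: average the atom vectors into one molecule vector."""
    n = len(node_vectors)
    return [sum(col) / n for col in zip(*node_vectors)]


# Toy example: C-C-O heavy-atom skeleton of ethanol, features = [is_C, is_O].
x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
h = gcn_layer(3, [(0, 1), (1, 2)], x, w)
molecule_vector = mean_pool(h)
```

GAT, GraphSAGE and GIN keep this neighbourhood-aggregation pattern and differ mainly in how neighbour messages are weighted and combined (attention coefficients, sampled neighbourhoods, or a sum followed by an MLP).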

Methods

We evaluated several GNN architectures, including Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), Graph Sample and Aggregation (GraphSAGE), and Graph Isomorphism Networks (GINs), using the latest FDA DILI dataset and other molecular property prediction datasets. We introduce a novel approach that builds a custom graph dataset from optimised molecular structures, so that detailed and realistic chemical features such as bond lengths and partial charges are supplied as input to the GNN models. We call this approach DILIGeNN.
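The abstract does not spell out the feature pipeline, so the following is only a minimal sketch of the general idea. It assumes the 3D coordinates come from an already optimised structure and that each atom already has a partial charge; in practice these could come from, for example, an RDKit force-field optimisation (such as MMFF) and Gasteiger charges, but which tools the authors used is not stated here. Each atom becomes a node whose features are a one-hot element code plus its partial charge, and each bond becomes a pair of directed edges carrying its length. The element vocabulary, the function `build_graph` and the water example are illustrative assumptions.

```python
import math

# Hypothetical element vocabulary for the one-hot part of each node feature.
ELEMENTS = ["C", "N", "O", "H"]


def build_graph(atoms, bonds):
    """Turn an optimised 3D structure into graph inputs.

    atoms -- list of (element, (x, y, z) in angstrom, partial_charge)
    bonds -- list of (i, j) atom-index pairs
    Returns node features, directed edge list, and per-edge bond lengths.
    """
    node_features = []
    for element, _xyz, charge in atoms:
        one_hot = [1.0 if element == e else 0.0 for e in ELEMENTS]
        node_features.append(one_hot + [charge])  # element identity + partial charge

    edge_index, edge_attr = [], []
    for i, j in bonds:
        length = math.dist(atoms[i][1], atoms[j][1])  # Euclidean distance = bond length
        # Each bond is stored in both directions, as message passing is usually directed.
        edge_index += [(i, j), (j, i)]
        edge_attr += [[length], [length]]
    return node_features, edge_index, edge_attr


# Toy input: water with an approximate experimental geometry and
# placeholder charges (not the output of any charge model).
water = [
    ("O", (0.0, 0.0, 0.0), -0.8),
    ("H", (0.9572, 0.0, 0.0), 0.4),
    ("H", (-0.2399, 0.9266, 0.0), 0.4),
]
x, edge_index, edge_attr = build_graph(water, [(0, 1), (0, 2)])
```

Keeping geometry and charge information inside one node/edge feature set is what allows a single graph to carry spatial and electrostatic information, rather than supplying them as separate inputs.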

Results

DILIGeNN achieved an AUC of 0.897 on the DILI dataset, surpassing the current state-of-the-art model in the DILI prediction task. DILIGeNN also outperformed the state of the art in other graph-based molecular prediction tasks, achieving an AUC of 0.918 on the ClinTox dataset, 0.993 on the BBBP dataset, and 0.953 on the BACE dataset, indicating that the approach generalises well across different datasets.
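All four headline numbers are AUC values. Assuming these are areas under the ROC curve (the usual metric for these benchmarks), an AUC of 0.897 can be read as: for a randomly chosen pair of one DILI-positive and one DILI-negative compound, the model scores the positive one higher about 89.7% of the time. The sketch below computes ROC AUC directly from that pairwise definition (the Mann-Whitney formulation, with ties counted as one half). It is quadratic in the number of compounds and meant only for illustration; the name `roc_auc` is hypothetical, and in practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels -- 1 for positive (e.g. DILI-concern), 0 for negative
    scores -- model outputs; higher means "more likely positive"
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```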

Conclusion

DILIGeNN, which uses a single graph representation as input, outperforms state-of-the-art DILI prediction methods that combine molecular fingerprints with graph-structured data. These findings highlight that our molecular graph generation and GNN training approach is a powerful tool for early-stage drug development and drug repurposing pipelines.

Scientific Contribution: DILIGeNN is a GNN framework that extracts graph features from optimised 3D molecular structures, as is done in target-based drug discovery and molecular docking simulations. Our method is the first to encode spatial and electrostatic information into a single graph representation, whereas other work requires multiple graphs or additional chemical descriptors for feature representation. Our approach, which uses warm starts following repeated early stopping during training, outperforms the current state-of-the-art methods in liver toxicity (DILI), permeability (BBBP) and activity (BACE) prediction tasks.
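The abstract credits part of the gain to "warm starts following repeated early stopping" but does not define the schedule. One plausible reading, assumed here and not confirmed by the source, is: train until the validation loss has not improved for a set number of epochs, reload the best state seen so far, and resume training with a modified schedule (here, a halved step size, standing in for a reduced learning rate), repeating until a full round brings no improvement. The function `train_with_warm_restarts`, the patience value, the halving schedule and the one-dimensional toy problem are all hypothetical.

```python
def train_with_warm_restarts(train_epoch, val_loss, state, patience=3,
                             max_rounds=5, max_epochs=100):
    """Repeated early stopping with warm starts (one assumed interpretation).

    train_epoch(state, round_idx) -> state after one more epoch; round_idx lets the
        caller change the schedule per round (e.g. a smaller learning rate).
    val_loss(state) -> validation loss, lower is better.
    For real model weights, train_epoch should return a new copy (e.g. via
    copy.deepcopy) so that the stored best state is not modified in place.
    """
    best_state, best_loss = state, val_loss(state)
    for round_idx in range(max_rounds):
        current, stale, improved = best_state, 0, False  # warm start from best so far
        for _ in range(max_epochs):
            current = train_epoch(current, round_idx)
            loss = val_loss(current)
            if loss < best_loss:
                best_state, best_loss, stale, improved = current, loss, 0, True
            else:
                stale += 1
                if stale >= patience:  # early stopping for this round
                    break
        if not improved:  # a whole round without progress: stop restarting
            break
    return best_state, best_loss


TARGET = 2.3  # toy optimum standing in for "best validation performance"


def toy_epoch(x, round_idx):
    # Fixed-size step towards TARGET; the step halves on each warm restart.
    step = 1.0 / 2 ** round_idx
    return x + step if x < TARGET else x - step


def toy_val_loss(x):
    return (x - TARGET) ** 2


best_x, best_loss = train_with_warm_restarts(toy_epoch, toy_val_loss, 0.0, patience=3)
single_round_x, _ = train_with_warm_restarts(toy_epoch, toy_val_loss, 0.0,
                                             patience=3, max_rounds=1)
```

On this toy problem a single early-stopped round oscillates around the optimum and keeps 2.0, while the warm-restarted rounds refine the answer to 2.25. Whether a comparable benefit appears for a real GNN would have to be checked on validation data.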

Source journal: Journal of Cheminformatics
Categories: CHEMISTRY, MULTIDISCIPLINARY; COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.