Journal of Cheminformatics: Latest Articles

Biosynfoni: a biosynthesis-informed and interpretable lightweight molecular fingerprint
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01081-6
Lucina-May Nollen, David Meijer, Maria Sorokina, Justin J. J. van der Hooft
Abstract: Natural products provide a rich source of bioactive molecules for a variety of applications. Molecular fingerprints are the tool of choice for systematic large-scale studies of their structures. However, current molecular fingerprints inherently underrepresent the characteristic features of natural products, decreasing the interpretability of natural product-specific predictions. Here, we show that a natural product-specific molecular fingerprint based on a relatively small set of selected biosynthetic building blocks provides more interpretable predictions of biosynthetic distance and natural product classification. Our fingerprint Biosynfoni outperforms MACCS, Morgan, and Daylight-like fingerprints in biosynthetic distance estimation, using 39 substructure keys. Moreover, Biosynfoni’s design, compactness, and concrete substructure definition allow easy visualisation of the detected substructures and their respective biosynthetic pathway origins. Through Biosynfoni, users can gain more insights from predictions and better examine the importance of features within machine learning models. Our results show that a short fingerprint consisting of biologically significant building blocks performs on par with top-performing molecular fingerprints for natural product classification while improving prediction explainability.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01081-6
Citations: 0
FusionCLM: enhanced molecular property prediction via knowledge fusion of chemical language models
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01073-6
Yutong Lu, Yan Yi Li, Yan Sun, Pingzhao Hu
Abstract: Chemical Language Models (CLMs) have demonstrated capabilities in extracting patterns and making predictions from vast volumes of Simplified Molecular Input Line Entry System (SMILES) strings, a notation used to represent molecular structures. Different CLMs, developed from various architectures, can provide unique insights into molecular properties. To harness the uniqueness of different CLMs, we propose FusionCLM, a novel stacking-ensemble learning algorithm that integrates the outputs of multiple CLMs into a unified framework. FusionCLM first generates SMILES embeddings, predictions, and losses from each CLM. Auxiliary models are trained on these first-level predictions and embeddings to estimate test losses during inference. The losses and predictions are then concatenated to create an integrated feature matrix, which trains second-level meta-models for final predictions. Empirical testing on five datasets demonstrates that FusionCLM outperforms the individual CLMs at the first level as well as three advanced multimodal deep learning frameworks, showcasing FusionCLM’s potential in advancing molecular property prediction.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01073-6
Citations: 0
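Illustrative note on the FusionCLM abstract above: the step where "losses and predictions are concatenated to create an integrated feature matrix" can be sketched in a few lines of plain Python. This is only a minimal sketch of the data layout, not the authors' implementation. The two "models", the squared-error loss, the column order, and all function names are assumptions made for illustration. At inference time the abstract says the losses come from auxiliary models rather than from labels; here they are computed from known targets, as would happen during training.

```python
from typing import List


def squared_errors(predictions: List[float], targets: List[float]) -> List[float]:
    """Per-molecule squared error (an assumed loss; the abstract does not name one)."""
    return [(p - t) ** 2 for p, t in zip(predictions, targets)]


def build_meta_features(
    losses_per_model: List[List[float]],
    predictions_per_model: List[List[float]],
) -> List[List[float]]:
    """One row per molecule: every model's loss, then every model's prediction."""
    n_molecules = len(predictions_per_model[0])
    rows = []
    for i in range(n_molecules):
        loss_columns = [losses[i] for losses in losses_per_model]
        prediction_columns = [preds[i] for preds in predictions_per_model]
        rows.append(loss_columns + prediction_columns)
    return rows


# Toy data: three molecules, two hypothetical first-level models.
targets = [1.0, 2.0, 3.0]
predictions_per_model = [[1.5, 2.0, 2.0], [1.0, 3.0, 3.0]]
losses_per_model = [squared_errors(p, targets) for p in predictions_per_model]

meta_features = build_meta_features(losses_per_model, predictions_per_model)
for row in meta_features:
    print(row)
```

A second-level meta-model would then be fitted on these rows against the targets; which learner is used there is not specified in the abstract.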
Mixture of experts for multitask learning in cardiotoxicity assessment
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01072-7
Edoardo Luca Viganò, Mateusz Iwan, Erika Colombo, Davide Ballabio, Alessandra Roncaglioni
Abstract: In recent years, the integration of Artificial Intelligence and Machine Learning methods with biochemical and biomedical research has revolutionized the field of toxicology, significantly advancing our understanding of the toxicological effects of chemicals on biological systems. Cardiovascular diseases remain the leading global cause of death. The constant exposure to multiple chemicals with potential cardiotoxic effects, including environmental contaminants, pesticides, food additives, and drugs, can significantly contribute to these adverse health outcomes. Traditional methods for assessing chemical hazards and their impact on biological function heavily rely on experimental assays and animal studies, which are often time-consuming, resource-intensive, and limited in scalability. To overcome these limitations, in silico methods have emerged as indispensable tools in toxicological research, reducing the need for traditional in vivo testing and conserving valuable resources in terms of time and cost. In this study, Artificial Intelligence methods are used as first-tier components within an Integrated Approach to Testing and Assessment. We explored the potential benefits of using Multitask Neural Networks, where multiple levels of cardiotoxicity information are combined to enhance model performance. Multitask learning, based on specific architectures such as Mixture of Experts (MoE), showed promising results and surpassed the performance of single-task baseline models. When predicting a holdout set, the multitask model achieved high performance on twelve different endpoints related to cardiotoxicity defined by the Adverse Outcome Pathways Network. The best developed model achieved a balanced accuracy of 78%, a sensitivity of 80%, and a specificity of 76% across all endpoints in the holdout set.
An advanced multitask model was developed to predict cardiotoxicity mechanisms induced by small molecules. The model demonstrates broad mechanistic coverage and achieves performance comparable to, or exceeding, state-of-the-art methods. These results suggest that the model could serve as a valuable first-tier component in advanced New Approach Methodologies for prioritizing chemicals for further testing.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01072-7
Citations: 0
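Illustrative note on the metrics quoted in the cardiotoxicity abstract above: for a binary endpoint, balanced accuracy is conventionally the mean of sensitivity and specificity. The short Python sketch below (the function name is hypothetical, not from the paper) shows that the reported 80% sensitivity and 76% specificity are consistent with the reported 78% balanced accuracy. How the authors aggregated the twelve endpoints is not stated in the abstract, so this is an arithmetic check only.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    return (sensitivity + specificity) / 2.0


# Values as quoted in the abstract, expressed as fractions.
reported = balanced_accuracy(0.80, 0.76)
print(f"{reported:.2f}")
```

Because the two classes are weighted equally, this metric is less flattering than plain accuracy when a toxicity dataset is dominated by inactive compounds.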
AdapTor: Adaptive Topological Regression for quantitative structure–activity relationship modeling
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-28 DOI: 10.1186/s13321-025-01071-8
Yixiang Mao, Souparno Ghosh, Ranadip Pal
Abstract: Quantitative structure–activity relationship (QSAR) modeling has become a critical tool in drug design. The recently proposed Topological Regression (TR), a computationally efficient and highly interpretable QSAR model that maps distances in the chemical domain to distances in the activity domain, has shown predictive performance comparable to state-of-the-art deep learning-based models. However, TR’s dependence on simple random sampling-based anchor selection and its use of radial basis functions for response reconstruction constrain its interpretability and predictive capacity. To address these limitations, we propose Adaptive Topological Regression (AdapToR) with adaptive anchor selection and optimization-based reconstruction. We evaluated AdapToR on the NCI60 GI50 dataset, which consists of over 50,000 drug responses across 60 human cancer cell lines, and compared its performance to Transformer CNN, Graph Transformer, TR, and other baseline models. The results demonstrate that AdapToR outperforms competing QSAR models for drug response prediction with significantly lower computational cost and greater interpretability as compared to deep learning-based models.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01071-8
Citations: 0
Retrosynthetic crosstalk between single-step reaction and multi-step planning
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-28 DOI: 10.1186/s13321-025-01088-z
Junseok Choe, Hajung Kim, Yan Ting Chok, Mogan Gim, Jaewoo Kang
Abstract: Retrosynthesis, the process of deconstructing complex molecules into simpler, more accessible precursors, is a cornerstone of drug discovery and material design. While machine learning has improved single-step retrosynthesis prediction, generating complete multi-step retrosynthetic routes remains challenging. In this study, we explore the integration of single-step retrosynthesis models with various planning algorithms to improve multi-step retrosynthetic route generation. We expand the exploration space beyond previously limited settings by incorporating combinations of planning algorithms, single-step retrosynthesis models, and diverse datasets, enabling a more comprehensive assessment of retrosynthetic strategies. We evaluated synthetic routes based on both solvability, the ability to generate a complete route, and route feasibility, which reflects their practical executability in the laboratory. Our findings show that the model combination with the highest solvability does not always produce the most feasible routes, underscoring the need for more nuanced evaluation. Through a systematic analysis of combinations of planning algorithms and single-step retrosynthesis models, their performance across different datasets, and various practical metrics, our study provides a more comprehensive evaluation of retrosynthetic planning strategies. These insights contribute to a better understanding of computational retrosynthesis and its alignment with real-world applicability.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01088-z
Citations: 0
Comment on “Advancing material property prediction: using physics-informed machine learning models for viscosity”
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-28 DOI: 10.1186/s13321-025-01070-9
Maximilian Fleck, Samir Darouich, Marcelle B. M. Spera, Niels Hansen
Abstract: When data availability is limited, the prediction of properties through purely data-driven machine learning (ML) is challenging. Integrating physically-based modeling techniques into ML methods may lead to better performance. In a recent work by Chew et al. (“Advancing material property prediction: using physics-informed machine learning models for viscosity”), descriptors from classical molecular dynamics (MD) simulations were incorporated into a quantitative structure–property relationship to accurately predict the temperature-dependent viscosity of pure liquids. Through feature importance analysis, the authors found that heat of vaporization was the most relevant descriptor for the prediction of viscosity. In this comment, we would like to discuss the physical origin of this finding by referring to Eyring’s rate theory, and develop an alternative modeling approach using a thermodynamic-based architecture that requires less input data.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01070-9
Citations: 0
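Illustrative note on the comment abstract above: a commonly quoted textbook form of Eyring's rate theory for liquid viscosity suggests why heat of vaporization would be a strong descriptor. The relations below are a sketch from general transport-theory references, not taken from the comment itself. The factor 0.408 is an empirical fit, and the last relation assumes an ideal vapor and a negligible liquid molar volume. The comment's own derivation may differ.

```latex
% eta: viscosity, N_A: Avogadro constant, h: Planck constant,
% V_m: molar volume of the liquid, R: gas constant, T: temperature
\eta = \frac{N_A h}{V_m}\,\exp\!\left(\frac{\Delta G^{\ddagger}}{RT}\right),
\qquad
\Delta G^{\ddagger} \approx 0.408\,\Delta U_{\mathrm{vap}},
\qquad
\Delta U_{\mathrm{vap}} \approx \Delta H_{\mathrm{vap}} - RT
```

Under these assumptions the logarithm of viscosity scales roughly linearly with the energy of vaporization divided by RT, which is consistent with the feature-importance finding that the abstract attributes to Chew et al.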
Systematic benchmarking of 13 AI methods for predicting cyclic peptide membrane permeability
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-28 DOI: 10.1186/s13321-025-01083-4
Wei Liu, Jianguo Li, Chandra S. Verma, Hwee Kuan Lee
Abstract: Cyclic peptides are promising drug candidates due to their ability to modulate intracellular protein–protein interactions, a property often inaccessible to small molecules. However, their typically poor membrane permeability limits therapeutic applicability. Accurate computational prediction of permeability can accelerate the identification of cell-permeable candidates, reducing reliance on time-consuming and costly experimental screening. Although deep learning has shown potential in predicting molecular properties, its application in permeability prediction remains underexplored. A systematic evaluation of these models is important to assess current capabilities and guide future development. In this study, we conduct a comprehensive benchmark of 13 machine learning models for predicting cyclic peptide membrane permeability. These models cover four types of molecular representations: fingerprints, SMILES strings, molecular graphs, and 2D images. We use experimentally measured PAMPA permeability data from the CycPeptMPDB database, comprising nearly 6000 cyclic peptides, and evaluate performance across three prediction tasks: regression, binary classification, and soft-label classification. Two data-splitting strategies, random split and scaffold split, are used to assess the generalizability of trained models. Our results show that model performance depends strongly on molecular representation and model architecture. Graph-based models, particularly the Directed Message Passing Neural Network (DMPNN), consistently achieve top performance across tasks. Regression generally outperforms classification. Scaffold-based splitting, although intended to more rigorously assess generalization, yields substantially lower model generalizability compared to random splitting. Comparing prediction errors with experimental variability highlights the practical value of current models while also indicating room for further improvement.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01083-4
Citations: 0
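Illustrative note on the two data-splitting strategies named in the benchmarking abstract above: the Python sketch below contrasts a random split with a scaffold-grouped split. It is a minimal sketch under stated assumptions and not the benchmark's code. The peptide IDs and scaffold labels are hypothetical strings, and the function names are invented for illustration. In practice the scaffold of each molecule would be derived from its structure with a cheminformatics toolkit. The grouped split follows the common "largest scaffold groups go to training first" heuristic, which is an assumption about how such splits are usually built.

```python
import random
from collections import defaultdict


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle items and cut off a test set; scaffolds may appear on both sides."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def scaffold_split(items, scaffold_of, test_fraction=0.2):
    """Keep every scaffold group entirely in train or entirely in test."""
    groups = defaultdict(list)
    for item in items:
        groups[scaffold_of[item]].append(item)
    # Largest groups are placed first; groups that no longer fit go to test.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(items) - max(1, round(len(items) * test_fraction))
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


# Hypothetical toy data: ten peptides over four scaffold labels.
peptides = [f"p{i}" for i in range(1, 11)]
labels = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
scaffold_of = dict(zip(peptides, labels))

train_ids, test_ids = scaffold_split(peptides, scaffold_of)
rand_train, rand_test = random_split(peptides)

train_scaffolds = {scaffold_of[p] for p in train_ids}
test_scaffolds = {scaffold_of[p] for p in test_ids}
print("scaffold split test set:", test_ids)
print("shared scaffolds:", train_scaffolds & test_scaffolds)
```

Because no scaffold is shared between the two sides of the grouped split, test molecules are structurally less familiar to the model, which is the usual explanation for the lower scores the abstract reports under scaffold splitting.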
xBitterT5: an explainable transformer-based framework with multimodal inputs for identifying bitter-taste peptides
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-20 DOI: 10.1186/s13321-025-01078-1
Nguyen Doan Hieu Nguyen, Nhat Truong Pham, Duong Thanh Tran, Leyi Wei, Adeel Malik, Balachandran Manavalan
Abstract: Bitter peptides (BPs), derived from the hydrolysis of proteins in food, play a crucial role in both food science and biomedicine by influencing taste perception and participating in various physiological processes. Accurate identification of BPs is essential for understanding food quality and potential health impacts. Traditional machine learning approaches for BP identification have relied on conventional feature descriptors, achieving moderate success but struggling with the complexities of biological sequence data. Recent advances utilizing protein language model embeddings and meta-learning approaches have improved accuracy, but frequently neglect the molecular representations of peptides and lack interpretability. In this study, we propose xBitterT5, a novel multimodal and interpretable framework for BP identification that integrates pretrained transformer-based embeddings from BioT5+ with the combination of a peptide sequence and its SELFIES molecular representation. Specifically, by incorporating both peptide sequences and their molecular strings, xBitterT5 demonstrates superior performance compared to previous methods on the same benchmark datasets. Importantly, the model provides residue-level interpretability, highlighting chemically meaningful substructures that significantly contribute to a peptide's bitterness, thus offering mechanistic insights beyond black-box predictions. A user-friendly web server (https://balalab-skku.org/xBitterT5/) and a standalone version (https://github.com/cbbl-skku-org/xBitterT5/) are freely available to support both computational biologists and experimental researchers in peptide-based food and biomedicine.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01078-1
Citations: 0
ReactionT5: a pre-trained transformer model for accurate chemical reaction prediction with limited data
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-19 DOI: 10.1186/s13321-025-01075-4
Tatsuya Sagawa, Ryosuke Kojima
Abstract: Accurate chemical reaction prediction is critical for reducing both cost and time in drug development. This study introduces ReactionT5, a transformer-based chemical reaction foundation model pre-trained on the Open Reaction Database, a large publicly available reaction dataset. In benchmarks for product prediction, retrosynthesis, and yield prediction, ReactionT5 outperformed existing models. Specifically, ReactionT5 achieved 97.5% accuracy in product prediction, 71.0% in retrosynthesis, and a coefficient of determination of 0.947 in yield prediction. Remarkably, ReactionT5, when fine-tuned with only a limited dataset of reactions, achieved performance on par with models fine-tuned on the complete dataset. Additionally, the visualization of ReactionT5 embeddings illustrates that the model successfully captures and represents the chemical reaction space, indicating effective learning of reaction properties.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01075-4
Citations: 0
Improving drug-induced liver injury prediction using graph neural networks with augmented graph features from molecular optimisation
IF 5.7 | Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2025-08-18 DOI: 10.1186/s13321-025-01068-3
Taeyeub Lee, Joram M. Posma
Abstract:
Purpose: Drug-induced liver injury (DILI) is a significant concern in drug development, often leading to the discontinuation of clinical trials and the withdrawal of drugs from the market. This study explores the application of graph neural networks (GNNs) for DILI prediction, using molecular graph representations as the primary input.
Methods: We evaluated several GNN architectures, including Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), Graph Sample and Aggregation (GraphSAGE), and Graph Isomorphism Networks (GINs), using the latest FDA DILI dataset and other molecular property prediction datasets. We introduce a novel approach that creates a custom graph dataset, driven by molecular optimisation, that incorporates detailed and realistic chemical features such as bond lengths and partial charges as input into the GNN models. We have named our model approach DILIGeNN.
Results: DILIGeNN achieved an AUC of 0.897 on the DILI dataset, surpassing the current state-of-the-art model in the DILI prediction task. Furthermore, DILIGeNN outperformed the state-of-the-art in other graph-based molecular prediction tasks, achieving an AUC of 0.918 on the Clintox dataset, 0.993 on the BBBP dataset, and 0.953 on the BACE dataset, indicating strong generalisation and performance across different datasets.
Conclusion: DILIGeNN, utilising a single graph representation as input, outperforms the state-of-the-art methods in DILI prediction that incorporate both molecular fingerprint and graph-structured data. These findings highlight the effectiveness of our molecular graph generation and the GNN training approach as a powerful tool for early-stage drug development and drug repurposing pipelines.
Scientific Contribution: DILIGeNN is a GNN framework that extracts graph features from 3D optimised molecular structures, as is done in target-based drug discovery and molecular docking simulation. Our method is the first to encode spatial and electrostatic information into a single graph representation, as opposed to other work that requires multiple graphs or additional chemical descriptors for feature representation. Our approach, using warm starts following repeated early stopping during training, outperforms the current state-of-the-art methods in liver toxicity (DILI), permeability (BBBP) and activity (BACE) prediction tasks.
Volume 17, Issue 1 | PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01068-3
Citations: 0