Machine learning-driven bioavailability prediction in early-stage drug development: a KNIME-based computational workflow for digital health applications.
Majdi Hammami, Walid Yeddes, Hamza Gadhoumi, Raghda Yazidi, Moufida Saidani Tounsi, Kamel Msaada
Xenobiotica, pp. 1-10. Journal Article, published 2025-05-28. DOI: 10.1080/00498254.2025.2508804
Abstract
Bioavailability prediction remains a significant challenge in early-stage drug development, where conventional experimental approaches are time-consuming and resource-intensive. This study explores the application of machine learning techniques to enhance the efficiency of bioavailability prediction. By leveraging computational workflows within the KNIME Analytics Platform, we aim to automate bioavailability assessment and reduce dependence on costly in vitro and in vivo studies.
A dataset comprising 475 drug-like compounds characterised by key molecular descriptors was analysed using multiple machine learning models, including Random Forest, Gradient Boosting, Decision Trees, k-Nearest Neighbours, and neural networks. Model performance was assessed through 5-fold cross-validation, with ensemble models outperforming linear and neural network-based approaches. Random Forest demonstrated the highest predictive performance (R² = 0.87, RMSE = 0.08). Feature importance analysis identified topological polar surface area and solubility as the most influential factors in bioavailability prediction.
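The evaluation described above (5-fold cross-validation scored by R² and RMSE, followed by feature importance) can be illustrated with a small sketch. The authors built their workflow in KNIME, so this is not their method: it is a minimal standard-library Python illustration, assuming a k-nearest-neighbours regressor (one of the model families named), Euclidean distance on descriptors already scaled to comparable ranges, and permutation importance as the importance measure (the abstract does not state which importance method was used). The toy data and all function names (`knn_predict`, `kfold_indices`, `make_toy_data`, etc.) are hypothetical.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (target must not be constant)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_predict(train_X, train_y, x, k=5):
    """Hypothetical k-NN regressor: mean target of the k nearest rows (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def kfold_indices(n, n_folds=5, seed=0):
    """Shuffle row indices once and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cross_validate(X, y, n_folds=5, k=5, seed=0, permute_col=None, rng=None):
    """Out-of-fold predictions: each row is predicted from folds that exclude it.

    If permute_col is given, that descriptor is shuffled among the held-out
    rows only, while the training rows stay intact (permutation importance).
    """
    preds = [0.0] * len(y)
    for fold in kfold_indices(len(y), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        test_rows = [list(X[i]) for i in fold]  # copies, so X is never modified
        if permute_col is not None:
            col = [row[permute_col] for row in test_rows]
            rng.shuffle(col)
            for row, value in zip(test_rows, col):
                row[permute_col] = value
        for i, row in zip(fold, test_rows):
            preds[i] = knn_predict(train_X, train_y, row, k)
    return preds


def permutation_importance(X, y, n_folds=5, k=5, seed=0):
    """Return baseline CV R^2 and the R^2 drop when each descriptor is permuted."""
    base = r2_score(y, cross_validate(X, y, n_folds, k, seed))
    drops = []
    for j in range(len(X[0])):
        rng = random.Random(seed + 1 + j)
        permuted = cross_validate(X, y, n_folds, k, seed, permute_col=j, rng=rng)
        drops.append(base - r2_score(y, permuted))
    return base, drops


def make_toy_data(n=100, seed=42):
    """Hypothetical toy set: descriptor 0 drives the target, descriptor 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [0.8 * row[0] + rng.gauss(0.0, 0.05) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    preds = cross_validate(X, y)
    print(f"5-fold CV  R2 = {r2_score(y, preds):.3f}  RMSE = {rmse(y, preds):.3f}")
    base, drops = permutation_importance(X, y)
    for j, drop in enumerate(drops):
        print(f"descriptor {j}: R2 drop when permuted = {drop:.3f}")
```

With 475 compounds, as in the study, `kfold_indices(475)` yields five disjoint folds of 95 compounds each. Shuffling a descriptor only in the held-out rows measures how much the already-fitted model relies on it, which is why the informative toy descriptor shows a much larger R² drop than the noise descriptor.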
The findings underscore the potential of integrating open-source tools and machine learning methodologies in pharmaceutical research, improving workflow efficiency while adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. This approach facilitates rapid and cost-effective bioavailability assessment, supporting AI-driven predictive modelling and digital health applications in drug development.
Journal description:
Xenobiotica covers seven main areas:
1. General Xenobiochemistry, including in vitro studies concerned with the metabolism, disposition and excretion of drugs and other xenobiotics, as well as the structure, function and regulation of associated enzymes.
2. Clinical Pharmacokinetics and Metabolism, covering the pharmacokinetics and absorption, distribution, metabolism and excretion of drugs and other xenobiotics in man.
3. Animal Pharmacokinetics and Metabolism, covering the pharmacokinetics and absorption, distribution, metabolism and excretion of drugs and other xenobiotics in animals.
4. Pharmacogenetics, defined as the identification and functional characterisation of polymorphic genes that encode xenobiotic-metabolising enzymes and transporters that may result in altered enzymatic, cellular and clinical responses to xenobiotics.
5. Molecular Toxicology, concerning the mechanisms of toxicity and the study of the toxicology of xenobiotics at the molecular level.
6. Xenobiotic Transporters, concerned with all aspects of the carrier proteins involved in the movement of xenobiotics into and out of cells, and their impact on pharmacokinetic behaviour in animals and man.
7. Topics in Xenobiochemistry, in the form of reviews and commentaries, which are primarily intended as a critical analysis of an issue, wherein the author offers opinions on the relevance of data or of a particular experimental approach or methodology.