Gintautas Kamuntavičius, Tanya Paquet, Orestis Bastas, Dainius Šalkauskas, Alvaro Prat, Hisham Abdel Aty, Aurimas Pabrinkis, Povilas Norvaišas, Roy Tal
{"title":"Benchmarking ML in ADMET predictions: the practical impact of feature representations in ligand-based models","authors":"Gintautas Kamuntavičius, Tanya Paquet, Orestis Bastas, Dainius Šalkauskas, Alvaro Prat, Hisham Abdel Aty, Aurimas Pabrinkis, Povilas Norvaišas, Roy Tal","doi":"10.1186/s13321-025-01041-0","DOIUrl":null,"url":null,"abstract":"<div><p>This study, focusing on predicting Absorption, Distribution, Metabolism, Excretion, and Toxicology (ADMET) properties, addresses the key challenges of ML models trained using ligand-based representations. We propose a structured approach to data feature selection, taking a step beyond the conventional practice of combining different representations without systematic reasoning. Additionally, we enhance model evaluation methods by integrating cross-validation with statistical hypothesis testing, adding a layer of reliability to the model assessments. Our final evaluations include a practical scenario, where models trained on one source of data are evaluated on a different one. This approach aims to bolster the reliability of ADMET predictions, providing more dependable and informative model evaluations.</p><p><b>Scientific contribution</b></p><p>This study provided a structured approach to feature selection. We improve model evaluation by combining cross-validation with statistical hypothesis testing, making results more reliable. The methodology used in our study can be generalized beyond feature selection, boosting the confidence in selected models which is crucial in a noisy domain such as the ADMET prediction tasks. Additionally, we assess how well models trained on one dataset perform on another, offering practical insights for using external data in drug discovery.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01041-0","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-025-01041-0","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
This study, focusing on predicting Absorption, Distribution, Metabolism, Excretion, and Toxicology (ADMET) properties, addresses the key challenges of ML models trained using ligand-based representations. We propose a structured approach to data feature selection, taking a step beyond the conventional practice of combining different representations without systematic reasoning. Additionally, we enhance model evaluation methods by integrating cross-validation with statistical hypothesis testing, adding a layer of reliability to the model assessments. Our final evaluations include a practical scenario, where models trained on one source of data are evaluated on a different one. This approach aims to bolster the reliability of ADMET predictions, providing more dependable and informative model evaluations.
Scientific contribution
This study provided a structured approach to feature selection. We improve model evaluation by combining cross-validation with statistical hypothesis testing, making results more reliable. The methodology used in our study can be generalized beyond feature selection, boosting the confidence in selected models which is crucial in a noisy domain such as the ADMET prediction tasks. Additionally, we assess how well models trained on one dataset perform on another, offering practical insights for using external data in drug discovery.
期刊介绍:
Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling.
Coverage includes, but is not limited to:
chemical information systems, software and databases, and molecular modelling,
chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases,
computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.