Half-life prediction of central nervous system (CNS) small molecules in humans using gradient tree boosting
Hong Wang, Pan Zhang, Stephen J Barigye, James R Empfield, Steven S Wesolowski
Future Medicinal Chemistry, pp. 2213-2219. Published 2025-09-01 (Epub 2025-09-07). DOI: 10.1080/17568919.2025.2557178
Journal impact factor: 3.4 (JCR Q3, Chemistry, Medicinal)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452432/pdf/
Citation count: 0
Abstract
Aims: To develop a machine learning (ML) model for early-stage prediction of human half-life of oral central nervous system (CNS) drugs and to establish a curated dataset, including key in vitro and in vivo data, to support future modeling efforts.
Materials & methods: Human and rat half-life, plasma protein binding (PPB), and liver microsomal clearance (LM) data for 76 diverse CNS drugs and candidates were obtained from public sources or evaluated at WuXi AppTec. Gradient tree boosting (GTB) models were constructed using ChemAxon's Trainer Engine. Feature importance was assessed, and model performance was evaluated on an external validation set.
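To make the gradient tree boosting idea concrete, the following is a minimal sketch in plain Python. It is not ChemAxon's Trainer Engine and not the authors' model: it assumes squared-error loss, uses depth-1 trees (stumps), and the feature columns and values in the toy data are hypothetical, chosen only for illustration.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct feature values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:
        # All features are constant: fall back to predicting the mean residual.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def fit_gtb(X, y, n_trees=50, learning_rate=0.1):
    """Fit boosted stumps: each stump is trained on the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the plain residual.
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        preds = [
            p + learning_rate * (lv if row[j] <= thr else rv)
            for row, p in zip(X, preds)
        ]
    return base, learning_rate, stumps


def predict_gtb(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= thr else rv) for j, thr, lv, rv in stumps)


# Hypothetical toy data: [log10 rat half-life, fraction unbound in plasma].
X_toy = [[0.1, 0.05], [0.3, 0.10], [0.5, 0.02], [0.8, 0.30], [1.0, 0.15]]
y_toy = [0.4, 0.6, 0.9, 1.0, 1.3]  # hypothetical log10 human half-life values
model = fit_gtb(X_toy, y_toy)
print(predict_gtb(model, [0.6, 0.08]))
```

A production library would use deeper trees, subsampling, and regularization. The sketch only shows the core loop: each new tree is fitted to the residuals of the current ensemble and added with a shrinkage factor.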
Results: The best-performing model achieved 82.4% of predictions within two-fold of observed values, with a coefficient of determination (R²) of 0.75 and a root mean square error (RMSE) of 0.25. Good generalizability was confirmed using similarity-based data splitting and Y-randomization. Integration of in vitro features, preclinical in vivo data, and physicochemical properties substantially improved predictive performance. Key features driving accurate human half-life prediction were identified.
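The three reported metrics can be computed as in the sketch below. The abstract does not state the scale on which R² and RMSE were calculated. This sketch assumes log10-transformed half-lives, which is a common convention for pharmacokinetic parameters but is an assumption here. The function name and the example values are hypothetical.

```python
import math


def half_life_metrics(observed, predicted):
    """Return percent within two-fold, R^2, and RMSE.

    R^2 and RMSE are computed on log10 values (an assumption, not taken
    from the abstract). Inputs are positive half-lives in the same unit.
    """
    n = len(observed)
    # Within two-fold means the predicted/observed ratio lies in [0.5, 2].
    within = sum(1 for o, p in zip(observed, predicted) if 0.5 <= p / o <= 2.0)

    log_obs = [math.log10(o) for o in observed]
    log_pred = [math.log10(p) for p in predicted]
    ss_res = sum((p - o) ** 2 for o, p in zip(log_obs, log_pred))
    mean_obs = sum(log_obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in log_obs)

    return {
        "pct_within_2fold": 100.0 * within / n,
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical observed and predicted half-lives in hours.
observed_h = [2.0, 4.0, 8.0, 16.0]
predicted_h = [2.0, 8.0, 8.0, 4.0]
print(half_life_metrics(observed_h, predicted_h))
```

The two-fold criterion is evaluated on the raw ratio, so it does not depend on the log-scale assumption. Only R² and RMSE would change if a different scale were used.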
Conclusion: This model demonstrates practical applications for early-stage prediction of human half-life and prioritization of CNS drug candidates. The curated dataset offers a valuable resource to enhance internal databases and advance more robust predictive models.
About the journal:
Future Medicinal Chemistry offers a forum for the rapid publication of original research and critical reviews of the latest milestones in the field. Strong emphasis is placed on ensuring that the journal stimulates awareness of issues that are anticipated to play an increasingly central role in influencing the future direction of pharmaceutical chemistry. Where relevant, contributions are also actively encouraged on areas as diverse as biotechnology, enzymology, green chemistry, genomics, immunology, materials science, neglected diseases and orphan drugs, pharmacogenomics, proteomics and toxicology.