Multitask Deep Learning Models of Combined Industrial Absorption, Distribution, Metabolism, and Excretion Datasets to Improve Generalization
Joseph A Napoli, Michael Reutlinger, Patricia Brandl, Wenyi Wang, Jérôme Hert, Prashant Desai
Molecular Pharmaceutics (Journal Article), published 2025-03-07. DOI: 10.1021/acs.molpharmaceut.4c01086 (https://doi.org/10.1021/acs.molpharmaceut.4c01086)
Impact factor: 4.5. JCR: Q2 (Medicine, Research & Experimental).
Abstract
The optimization of absorption, distribution, metabolism, and excretion (ADME) profiles of compounds is critical to the drug discovery process. As such, machine learning (ML) models for ADME are widely used for prioritizing the design and synthesis of compounds. The effectiveness of ML models for ADME depends on the availability of high-quality experimental data for a diverse set of compounds that is relevant to the emerging chemical space being explored by the drug discovery teams. To that end, ADME data sets from Genentech and Roche were combined to evaluate the impact of expanding the chemical space on the performance of ML models, a first experiment of its kind for large-scale, historical ADME data sets. The combined ADME data set consisted of over 1 million individual measurements distributed across 11 assay end points. We utilized a multitask (MT) neural network architecture that enables the modeling of multiple end points simultaneously and thereby exploits information transfer between interconnected ADME end points. Both single- and cross-site MT models were trained and compared against single-site, single-task baseline models. Given the differences in assay protocols across the two sites, the data for corresponding end points across sites were modeled as separate tasks. Models were evaluated against test sets representing varying degrees of extrapolation difficulty, including cluster-based, temporal, and external test sets. We found that cross-site MT models appeared to provide a greater generalization capacity compared to single-site models. The performance improvement of the cross-site MT models was more pronounced for the relatively "distant" external and temporal test sets, suggesting an expanded applicability domain. The data exchange exercise described here demonstrates the value of expanding the learning from ADME data from multiple sources without the need to aggregate such data when the experimental methods are disparate.
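The idea of modeling corresponding end points from the two sites as separate tasks can be illustrated with a minimal sketch. It is not the authors' implementation: the site labels (`site_a`, `site_b`), the end point names, the layer shapes, and the helper functions are all hypothetical, and plain NumPy stands in for whatever deep learning framework was actually used. The sketch assumes a shared hidden layer feeding one output column per (site, end point) pair, with a loss that skips the many compounds that have no measurement for a given task.

```python
import numpy as np

# Hypothetical task layout: the same assay run at two sites gets two columns,
# so differing protocols are never pooled into a single target.
TASKS = [
    "site_a:clearance", "site_b:clearance",
    "site_a:solubility", "site_b:solubility",
]

def build_targets(records, tasks=TASKS):
    """Turn (compound_id, site, endpoint, value) records into a target matrix.

    Returns the sorted compound ids and an array of shape
    (n_compounds, n_tasks) with NaN wherever no measurement exists.
    """
    ids = sorted({r[0] for r in records})
    row = {cid: i for i, cid in enumerate(ids)}
    col = {t: j for j, t in enumerate(tasks)}
    y = np.full((len(ids), len(tasks)), np.nan)
    for cid, site, endpoint, value in records:
        y[row[cid], col[f"{site}:{endpoint}"]] = value
    return ids, y

def forward(x, params):
    """Shared ReLU layer followed by one linear output per task."""
    hidden = np.maximum(x @ params["w_shared"] + params["b_shared"], 0.0)
    return hidden @ params["w_heads"] + params["b_heads"]

def masked_mse(pred, y):
    """Mean squared error averaged over tasks, ignoring missing labels."""
    mask = ~np.isnan(y)
    sq_err = np.where(mask, (pred - np.where(mask, y, 0.0)) ** 2, 0.0)
    n_per_task = mask.sum(axis=0)
    per_task = sq_err.sum(axis=0) / np.maximum(n_per_task, 1)
    # Tasks with no labels in this batch do not contribute to the average.
    return per_task[n_per_task > 0].mean()
```

Averaging per task before averaging across tasks is one possible design choice; it keeps a heavily measured end point from dominating the loss. Whether the published models weight tasks this way is not stated in the abstract.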
About the journal:
Molecular Pharmaceutics publishes the results of original research that contributes significantly to the molecular mechanistic understanding of drug delivery and drug delivery systems. The journal encourages contributions describing research at the interface of drug discovery and drug development.
Scientific areas within the scope of the journal include physical and pharmaceutical chemistry, biochemistry and biophysics, molecular and cellular biology, and polymer and materials science as they relate to drug and drug delivery system efficacy. Mechanistic Drug Delivery and Drug Targeting research on modulating activity and efficacy of a drug or drug product is within the scope of Molecular Pharmaceutics. Theoretical and experimental peer-reviewed research articles, communications, reviews, and perspectives are welcomed.