AI In Action: Redefining Drug Discovery and Development
Authors: Anshul Kanakia, Mark Sale, Liang Zhao, Zhu Zhou
Journal: Clinical and Translational Science, vol. 18, no. 2 (published 2025-02-06)
DOI: 10.1111/cts.70149
URL: https://onlinelibrary.wiley.com/doi/10.1111/cts.70149
Citations: 0
Abstract
AI has revolutionized the drug discovery space in recent years, with applications ranging from highly accurate structure predictions of proteins [1] to the design and optimization of both small and large molecules [2]. Several large foundational models have been developed to encode functional information about proteins in support of the drug development pipeline [3, 4]. Figure 1 highlights the areas in the pipeline where AI now plays a significant role and is poised to disrupt traditional experimental techniques. The culmination of AI-driven discovery is de novo design, in which the entire preclinical pipeline is performed in silico. This could save billions of dollars in R&D costs, reduce the cost of medications, and raise clinical success rates by optimizing safer, more developable molecules that show strong efficacy against well-selected targets.
While de novo design is as yet unproven, the 21 AI-developed drugs that had completed Phase I trials as of December 2023 show a success rate of 80%–90%, significantly higher than the ~40% for traditional methods [5]. The number of AI-developed candidate drugs entering clinical stages also continues to rise, growing at a roughly exponential rate: from 3 in 2016 to 17 in 2020 and 67 in 2023 [5].
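One way to check whether these counts are consistent with exponential growth is to compute the compound annual growth rate (CAGR) implied by each interval; under exponential growth the rate should be roughly the same for every interval. The sketch below uses only the three counts quoted above from [5]. The helper name `cagr` is ours, and three data points are far too few to fit a growth model with any confidence.

```python
# Counts of AI-developed candidates entering clinical stages, as quoted above from [5].
counts = {2016: 3, 2020: 17, 2023: 67}


def cagr(start_year: int, end_year: int) -> float:
    """Compound annual growth rate between two years present in `counts`."""
    years = end_year - start_year
    return (counts[end_year] / counts[start_year]) ** (1 / years) - 1


for start, end in [(2016, 2020), (2020, 2023), (2016, 2023)]:
    print(f"{start}-{end}: {cagr(start, end):.0%} per year")
```

All three intervals give a rate in the mid-50% range per year, which is what a roughly constant exponential trend would look like.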
The convergence of high-quality data across life science modalities (imaging, multi-omics, DMRs, and very large protein repertoires) with recent advances in the scaling and architecture of large deep learning models has led to an explosion in AI applications for healthcare. While some of this data is publicly available, much of it is proprietary and under the control of large pharmaceutical companies, partly due to regulatory and privacy concerns. Conversely, innovation in AI for drug discovery is being led by academic and industry research laboratories, often resulting in well-funded spin-off ventures like Genentech, Recursion, Absci, and more recently, EvolutionaryScale. Such AI-first life sciences companies have found success in synergistic partnerships with large pharmaceutical companies, thereby gaining access to the large proprietary datasets upon which to apply their AI expertise. Some of these partnerships have led to acquisitions such as the 2009 purchase of Genentech by Roche for approximately $46.8 billion, highlighting the value that AI internalization brings to large pharmaceutical companies.
The use of AI is poised to cover the full life cycle of a drug product, including drug discovery, drug development, and application assessment in a regulatory setting. Recent research from the Food and Drug Administration (FDA) included two distinct case studies. The first case exemplifies the use of conventional machine learning (ML) approaches through a project aimed at decoding kinase–adverse event associations for small molecule kinase inhibitors (SMKIs). A multi-domain dataset was constructed from 4638 patients in registrational trials of 16 FDA-approved SMKIs, and ML models such as Random Survival Forests (RSF), Artificial Neural Networks (ANNs), and DeepHit were used to find potential associations between 442 kinases and 2145 adverse events. This information was made publicly accessible via an interactive web application, “Identification of Kinase-Specific Signal” (https://gongj.shinyapps.io/ml4ki). This platform aids experimentalists in identifying and verifying kinase-inhibitor adverse event pairs and serves as a precision-medicine tool to mitigate individual patient safety risks by forecasting clinical safety signals [6]. In general, the credibility of AI models in extrapolation and generalization heavily depends on the diversity and comprehensiveness of the training data. Future studies integrating richer datasets with detailed genomic, phenotypic, and demographic information could further improve the precision of such associations and help refine the applicability of these models to specific patient subgroups. Although Multi-Input Neural Networks were not employed in this study, they represent a promising architecture for future research, as they can integrate heterogeneous datasets, such as kinase activity, demographic data, and clinical outcomes, into a unified predictive framework. Additionally, hybrid approaches combining neural networks with Markov Chains could be explored to capture sequential dependencies in disease progression and improve the robustness of predictions across diverse patient cohorts.
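As a rough illustration of the Markov-chain component mentioned above, the sketch below propagates a patient-state distribution through a discrete-time Markov chain. The three states and every transition probability are hypothetical placeholders chosen for illustration, not values from the FDA study [6]. In a hybrid model, a neural network might supply patient-specific transition probabilities in place of the fixed matrix used here.

```python
# Hypothetical states and transition probabilities, for illustration only.
STATES = ["stable", "adverse_event", "discontinued"]
TRANSITIONS = [
    [0.90, 0.08, 0.02],  # from stable
    [0.30, 0.50, 0.20],  # from adverse_event
    [0.00, 0.00, 1.00],  # discontinued is an absorbing state
]


def step(dist: list[float]) -> list[float]:
    """Advance the distribution one time step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(STATES)
    return [sum(dist[i] * TRANSITIONS[i][j] for i in range(n)) for j in range(n)]


def propagate(dist: list[float], n_steps: int) -> list[float]:
    """Apply `step` n_steps times and return the resulting distribution."""
    for _ in range(n_steps):
        dist = step(dist)
    return dist


start = [1.0, 0.0, 0.0]  # every patient begins in the stable state
after_six = propagate(start, 6)
for name, prob in zip(STATES, after_six):
    print(f"{name}: {prob:.3f}")
```

Because each state distribution depends only on the previous one, the chain captures sequential dependence in a simple form; the absorbing state accumulates probability mass over time.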
The second case study showcases the application of generative AI methods through the development of PharmBERT, a domain-specific large language model (LLM) for drug labels [7]. Leveraging the foundational BERT architecture, PharmBERT was pre-trained on textual data extracted from 138,924 raw drug labels sourced from DailyMed. This pre-training on domain-specific text significantly improved the model's performance in extracting pharmacokinetic information from drug labeling. PharmBERT demonstrated superior performance in tasks such as adverse drug reaction (ADR) detection and ADME (absorption, distribution, metabolism, and excretion) classification, surpassing other models like ClinicalBERT and BioBERT. This advancement underscores the potential of LLMs to enhance the efficiency of text-related regulatory work and improve the extraction of critical information from complex drug labels.
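BERT-style pre-training typically relies on a masked language modeling objective, in which a fraction of the input tokens is hidden and the model learns to recover them. The sketch below shows only the token-corruption step, using the proportions from the original BERT recipe (15% of tokens selected; of those, 80% replaced by a mask token, 10% by a random token, and 10% left unchanged). Whether PharmBERT used exactly these settings is an assumption here, and the example sentence, toy vocabulary, and function name are illustrative rather than taken from [7].

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted, labels). labels[i] holds the original token when
    position i was selected for prediction, and None otherwise."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() >= select_prob:
            # Not selected: keep the token and do not score this position.
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK)  # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(tok)  # 10%: leave unchanged
    return corrupted, labels


# Toy, made-up label-style sentence; real pre-training would use subword tokens.
sentence = "the drug is primarily metabolized by CYP3A4 and excreted in urine".split()
vocab = sorted(set(sentence))
corrupted, labels = mask_tokens(sentence, vocab, random.Random(0))
print(corrupted)
print(labels)
```

Pre-training on domain text with an objective of this kind exposes the model to label-specific vocabulary and phrasing before it is fine-tuned on tasks such as ADR detection.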
Together, these case studies illustrate the transformative impact of AI on drug development and regulatory science. Traditional AI methods provide robust frameworks for specific, structured data analyses, while generative AI methods offer expansive capabilities for handling unstructured data and developing generalized intelligence. Both approaches are crucial for advancing personalized medicine and optimizing drug development processes.
Figure 2 summarizes the results of two survey questions posed during the “When AI Meets Drug Development” session at the 2024 American Society for Clinical Pharmacology and Therapeutics Annual Meeting. The first question evaluated views on AI's potential to bring significant change to drug R&D. Notably, 80% of participants recognized AI's significant impact, while 12% were unconvinced. No participants were unaware of AI's application in drug R&D, suggesting a high level of awareness within the clinical pharmacology community. A small minority (6%) were uncertain about AI's current capabilities, and 2% selected an unspecified option. Regarding AI's future impact in the next 5–10 years, 45% highlighted a preference for its application in molecule design and optimization, followed by clinical trials and development (28%), target discovery and validation (20%), and preclinical testing and screening (7%). The results highlight the current familiarity with, usage of, and perceptions of AI among the clinical pharmacology community, indicating strong interest in and optimism about AI's role in the future of drug development.
Looking ahead, the integration of AI in drug R&D is poised to accelerate, driven by advancements from leading tech companies. NVIDIA's powerful GPUs and AI frameworks are enabling faster and more efficient generative drug discovery processes. Google Health is leveraging its expertise in data analytics and ML to enhance predictive modeling and patient data analysis. Apple Health is contributing through its health data ecosystem, facilitating personalized medicine and real-time health monitoring. OpenAI's cutting-edge language models are revolutionizing the way researchers generate hypotheses and analyze scientific literature. These innovations collectively promise to streamline the drug development pipeline, reduce costs, and improve clinical outcomes, heralding a new era of precision medicine.
As global investment in AI for drug discovery accelerates, so does the expectation of improved outcomes for drug programs. As of 2024, there are no on-market medications that have been developed using an AI-first pipeline. Future drivers for AI, particularly in healthcare, need to show disruption to existing business processes and tangible financial gains. This could happen via the launch of the first AI-developed medication or AI-based clinical pipeline improvements that significantly reduce the lead time from first patient in to regulatory approval.
M.S. is an employee of Certara. A.K. is an employee of AstraZeneca. All other authors declared no competing interests for this work.
About the journal:
Clinical and Translational Science (CTS), an official journal of the American Society for Clinical Pharmacology and Therapeutics, highlights original translational medicine research that helps bridge laboratory discoveries with the diagnosis and treatment of human disease. Translational medicine is a multi-faceted discipline with a focus on translational therapeutics. In a broad sense, translational medicine bridges across the discovery, development, regulation, and utilization spectrum. Research may appear as Full Articles, Brief Reports, Commentaries, Phase Forwards (clinical trials), Reviews, or Tutorials. CTS also includes invited didactic content that covers the connections between clinical pharmacology and translational medicine. Best-in-class methodologies and best practices are also welcomed as Tutorials. These additional features provide context for research articles and facilitate understanding for a wide array of individuals interested in clinical and translational science. CTS welcomes high quality, scientifically sound, original manuscripts focused on clinical pharmacology and translational science, including animal, in vitro, in silico, and clinical studies supporting the breadth of drug discovery, development, regulation and clinical use of both traditional drugs and innovative modalities.