Thinking on the Use of Artificial Intelligence in Drug Discovery

IF 6.8 | Zone 1, Medicine | JCR Q1 | CHEMISTRY, MEDICINAL

Journal of Medicinal Chemistry | Pub Date: 2025-02-24 | DOI: 10.1021/acs.jmedchem.5c00373

Yuxi Wang, Zelin Hu, Junbiao Chang, Bin Yu

Abstract

Advances in machine learning algorithms and big data processing capabilities have propelled artificial intelligence (AI) to the forefront, and its applications in drug discovery are increasing rapidly. A report from McKinsey & Company indicates that AI can help pharmaceutical companies save 30 to 50% in drug development costs and increase pipeline speed by over 20%. (1) AI technology is becoming increasingly common from drug discovery through clinical trials. First, AI can quickly screen potential active compounds using machine learning and deep learning algorithms and guide hit-to-lead optimization to improve therapeutic efficacy and reduce toxicity, significantly enhancing the efficiency of drug discovery. Second, AI-driven predictive models can enhance the success rate of experiments, reduce research and development (R&D) costs, and even help change a drug’s behavior in the body, including its pharmacokinetics (PK) and pharmacodynamics (PD), to optimize dosage and dosing regimens. By integrating expertise from various disciplines, AI enables the derivation of novel conclusions, producing forward-looking results in traditionally empirical areas. Furthermore, AI has driven the development of personalized medicine, bringing significant transformations and opportunities for the future of healthcare.

Drug discovery and development is the most advanced application area of AI, with numerous breakthroughs already achieved (Figure 1A). For instance, Alex Zhavoronkov’s team developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small molecule design. (2) The model combines reinforcement learning, variational inference, and tensor decompositions for rapid generation and optimization of potentially pharmacologically active compounds. GENTRL uses three layers of self-organizing maps (SOMs) to generate innovative DDR1 inhibitors; these small molecules then underwent multiple rounds of screening and optimization, yielding six candidate compounds in just 21 days, four of which showed potent biological activity. Even more surprisingly, ISM001-055, a first-in-class small molecule inhibitor developed with generative AI technology, has shown positive results in a phase IIa clinical trial for treating idiopathic pulmonary fibrosis (IPF). (3) In September 2024, Insilico Medicine announced the results of the phase IIa clinical trial of ISM001-055. The trial, which enrolled 71 IPF patients across 21 clinical research centers in China, demonstrated that ISM001-055 had a favorable safety profile at all dose levels and exhibited a dose-dependent trend of efficacy in forced vital capacity (FVC), an important measure of lung function in IPF patients. The improvement in lung function in just 12 weeks brings great hope to IPF patients. The success of ISM001-055 showcases the huge potential of AI in drug discovery and development, paving the way for larger-scale trials in the future.

Fang’s team recently developed a robotic system for high-throughput chemical synthesis, online characterization, and large-scale screening of photocatalytic reaction conditions (10,000 reaction conditions per day). The system combines liquid-core waveguide, microfluidic liquid-handling, and AI techniques to significantly reduce the time and effort required for complex organic synthesis. (4) They have also developed an AI-assisted method for predicting absorbance that analyzes the factors influencing convective and molecular diffusion effects.

When a candidate compound is selected at the chemistry, manufacturing, and control (CMC) stage, crystal structure prediction (CSP) and structure-based stability evaluation can be used alongside advanced experimental screening techniques to minimize the risk of polymorphism problems and accelerate decision-making for formulation studies. (5,6) During the development of Paxlovid, Pfizer’s oral Covid-19 treatment, Pfizer and XtalPi elucidated the subtle differences in structure, dynamics, and stability of two enantiotropically related anhydrous polymorphs of the novel antiviral medicine nirmatrelvir by employing a series of highly orthogonal experimental and computational approaches. (7) This work ultimately selected Form 1 for formulation development because it is the stable form above the transition temperature of 17 °C, and it contributed to the unprecedented speed with which Paxlovid was brought to patients amid the pandemic.

Figure 1. (A) AI-driven drug discovery process, including de novo design, compound screening, interaction simulations, organic synthesis, and optimization of pharmacokinetic properties. (B) The AlphaFold model architecture predicts the initial protein structure, demonstrating near-experimental accuracy in protein structure prediction through recycling (PDB code: 6Y4F). Some elements were created with BioRender (https://biorender.com).

Antibody drugs, along with small molecule drugs, are widely used in treating cancer, autoimmune diseases, and infectious diseases because of their high specificity for targets. (8) AI can be used for structure prediction, design optimization, and humanization of antibody drugs, as well as for predicting the optimal drug-to-antibody ratio (DAR) for antibody-drug conjugates (ADCs). This helps shorten development timelines and increase success rates. AI’s superior generalization ability and high degree of automation make it highly effective in vaccine development, which usually takes over 10 years. For example, AI has played a crucial role in optimizing mRNA sequences. (9,10) The LinearDesign algorithm has greatly enhanced the stability, protein expression, and immunogenicity of mRNA vaccines by optimizing the structural stability and codon usage of mRNA sequences. The algorithm successfully solved the large-scale search problem in traditional mRNA design, finding the optimal mRNA sequence in less than 11 min. It showcases the potential of AI and computational linguistics technologies in mRNA design and was licensed in 2022 to Moderna, a leading global RNA technology company, to enhance the development of mRNA vaccines and therapeutics. (10)

AlphaFold is a recent breakthrough in protein structure prediction that uses a neural network-based machine learning method. It combines evolutionary data and geometric properties to accurately predict three-dimensional structures (Figure 1B). (9) After the amino acid sequence of a protein is entered, a multiple sequence alignment (MSA) and templates are retrieved from relevant evolutionary databases and passed to the Evoformer, which captures the relationships and covariation between sequences. The predicted structure is then fed back into the Evoformer and the structure module multiple times for iterative optimization. Each such iteration, known as “recycling,” refines the structure further to enhance accuracy.

Significant progress has been made in predicting protein structures, but this is only the beginning of exploration in the life sciences. AI is currently used mainly at the molecular level, but future research will integrate diverse high-quality data and advance to higher levels to discover new patterns and principles in biological systems. For example, the Peking University International Cancer Institute has partnered with XtalPi to investigate disease mechanisms and drug actions through multimodal data integration. (11) The Shu group at Peking University used high-throughput CRISPR technology for large-scale gene editing in cells, whereas XtalPi’s independently developed cell research platform, X-Map, allows for the collection of large-scale, high-content imaging and transcriptome data following cell perturbation. The X-Profiler algorithm developed by XtalPi is proficient at extracting relevant information for specific downstream tasks. It effectively addresses issues such as out-of-focus blurring in high-content imaging caused by variations in well plate edge height, and it reduces data noise and enhances the signal-to-noise ratio (SNR). Furthermore, X-Profiler can adaptively adjust data quality control strategies based on task requirements, significantly boosting model performance. By integrating experimental methods with AI algorithms, this approach enables precise observations at the cellular level using real-world multidimensional data, establishing correlations between physiological changes and gene or drug regulation. This method offers higher throughput and lower costs than animal models, allowing quicker generation of high-quality data for specific research systems and improving the efficiency and success rate of drug discovery and development.

Although AI is well established in many aspects of drug discovery and development, the traditional education system has not kept pace with the rapid development of AI and urgently needs reform and optimization. To address the educational challenges posed by the rapid development of AI, it is essential to have a clear understanding of its specific applications. This will help us pinpoint which areas of teaching content require additional support. Traditional teaching methods suffer from outdated curricula, a lack of practical experience, and limited interdisciplinary integration; they focus too much on foundational subjects and neglect crucial skills in data science and AI, which makes it difficult to meet the demands of the AI era.

Given AI’s increasing importance, teaching approaches should be updated. First and foremost, teachers should be trained in AI technologies to ensure they are prepared to teach AI courses. Second, interdisciplinary integration in education should be promoted to inspire insights from different disciplines. Computer science and data science should be integrated into the pharmaceutical curriculum to break down disciplinary silos. For example, the basic theory component of the drug molecular design course should include fundamental concepts of AI, machine learning, and deep learning, along with their applications in the pharmaceutical industry. This component will cover skills such as data preprocessing, model training and evaluation, and feature engineering. The core course section should comprehensively explore AI applications in drug target screening, molecular docking, virtual screening, molecular structure analysis, and AI modeling. Cutting-edge AI technologies in drug discovery, such as reinforcement learning and generative adversarial networks (GANs), should be introduced. GANs are a specialized type of deep learning model with two neural networks: one that generates samples such as images and one that discriminates generated samples from real ones. By learning the distribution of existing data to explore new chemical spaces, they can efficiently create diverse, high-quality molecules, showing unique advantages and potential. Third, new courses should be added, and professors from various fields should collaborate to edit textbooks that combine pharmaceutical science with AI for the updated curriculum. International academic conferences and online courses are also valuable educational resources for students learning AI. It is important to expand AI-related courses, including big data analysis, natural language processing, and bioinformatics, to enhance relevant skills.

However, unilaterally increasing reliance on AI learning will only add pressure on students, so the learning of new courses should be personalized to avoid overwhelming students with the broad scope of AI content. Reform should focus on innovation, professionalism, and personalization to better address individual learning needs. Students should choose courses based on their professional needs, focusing on relevant skills after gaining foundational knowledge in the early stages of education. AI technology should be used in real-world drug development scenarios to teach students how to solve practical problems, and AI-driven practical training and hands-on experience must be included. For example, a virtual lab platform could integrate AI models and data analysis tools to simulate real-world scenarios such as drug screening, molecular docking, and drug safety prediction. This platform should also have online collaboration features that allow students to work together in real time, mimicking a professional work environment that promotes interdisciplinary teamwork. Additionally, it is crucial to strengthen partnerships with pharmaceutical and AI technology companies to develop a platform that connects industry and academia. Project content should be tailored to the actual needs of industry partners to offer students practical solutions relevant to the field. By gaining a strong grasp of AI concepts, students will be better prepared for future roles in drug development.

As expected, AI technology has revolutionized, and will continue to profoundly revolutionize, the field of drug discovery. Educational policymakers and academic institutions globally should update their training programs to incorporate advanced AI technologies. Students need to learn the latest AI techniques, such as deep learning and natural language processing, to be better prepared for future challenges in drug development. Students without AI skills can be readily replaced by those who embrace AI advancement.

However, we have to keep in mind that AI may not be a panacea because it is limited by the quality of the data. High-quality, comprehensive experimental data sets are essential for generating reliable results, yet not all entities with an AI platform possess such data. The impact of AI and machine learning in drug discovery depends directly on the quality and depth of the training data used to develop the AI models. Regardless of how advanced the algorithms are, if the training data are biased or of poor quality, the AI predictions will be inaccurate, limiting their potential to drive new drug discovery. Algorithmic biases can also lead a model to predict therapeutic effects inaccurately for certain populations, which is especially critical for patient medication safety. Additionally, it is important not only to pay attention to the quality of the data but also to protect sensitive patient data. We need to establish strict data protection measures to ensure the security of the data during acquisition, storage, and processing, thereby safeguarding patient data privacy.

Y.W. and Z.H. contributed equally to this work.

This work was supported by the National Natural Science Foundation of China (Nos. 82473782, 22277110, and 82473761), the Sichuan Science and Technology Program (No. 2024ZYD0130), the Natural Science Foundation of Henan Province (No. 242301420005), and the Key Research Project for Basic Research in Henan Province Universities (No. 25ZX001). The authors acknowledge Professor Craig W. Lindsley (Editor-in-Chief of Journal of Medicinal Chemistry; Executive Director, Warren Center for Neuroscience Drug Discovery, Vanderbilt University) and Dr. Shuhao Wen (Chairman of XtalPi) for their insightful discussions.

This article references 11 other publications. This article has not yet been cited by other publications.
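
The "large-scale search problem" that the abstract mentions for mRNA design can be made concrete with a little arithmetic: most amino acids are encoded by several synonymous codons, so the number of mRNA sequences that encode one protein grows multiplicatively with its length. The sketch below is only an illustration of that count and is not the LinearDesign algorithm. The function names `count_synonymous_mrnas` and `log10_search_space` and the example peptide are hypothetical; the only data used are the codon counts of the standard genetic code.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; 61 sense codons in total).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(protein: str) -> int:
    """Return how many distinct coding sequences encode `protein`."""
    total = 1
    for residue in protein.upper():
        total *= CODON_DEGENERACY[residue]  # KeyError on non-standard letters
    return total


def log10_search_space(protein: str) -> float:
    """Return log10 of the same count, which is convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[r]) for r in protein.upper())


if __name__ == "__main__":
    peptide = "MKLV"  # hypothetical four-residue peptide, for illustration only
    print(count_synonymous_mrnas(peptide))      # 1 * 2 * 6 * 4 = 48
    print(log10_search_space(peptide * 100))    # exponent for 400 residues
```

Even this made-up 400-residue sequence has roughly 10^168 candidate coding sequences, which is why exhaustive enumeration is impractical and a dedicated optimization algorithm is needed.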

Citations: 0

Abstract

Thinking on the Use of Artificial Intelligence in Drug Discovery

Advances in machine learning algorithms and big data processing capabilities have propelled artificial intelligence (AI) to the forefront, and its applications in drug discovery are increasing rapidly. A report from McKinsey & Company indicates that AI can help pharmaceutical companies save 30 to 50% in drug development costs and increase pipeline speed by over 20%. (1) AI technology is becoming increasingly common at every stage from drug discovery to clinical trials. First, AI can quickly screen potential active compounds using machine learning and deep learning algorithms and guide hit-to-lead optimization to improve therapeutic efficacy and reduce toxicity, significantly enhancing the efficiency of drug discovery. Second, AI-driven predictive models can raise the success rate of experiments, reduce research and development (R&D) costs, and even guide changes to a drug's behavior in the body, including its pharmacokinetics (PK) and pharmacodynamics (PD), to optimize dosage and dosing regimens. By integrating expertise from various disciplines, AI enables the derivation of novel conclusions, producing forward-looking results in traditionally empirical areas. Furthermore, AI has driven the development of personalized medicine, bringing significant transformations and opportunities for the future of healthcare.

Drug discovery and development is among the most advanced areas of AI application, with numerous breakthroughs already achieved (Figure 1A). For instance, Alex Zhavoronkov's team developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small molecule design. (2) The model combines reinforcement learning, variational inference, and tensor decompositions for rapid generation and optimization of potentially pharmacologically active compounds. GENTRL uses three layers of self-organizing maps (SOMs) to generate innovative DDR1 inhibitors. These small molecules then undergo multiple rounds of screening and optimization, which yielded six candidate compounds in just 21 days, four of which showed potent biological activity. Even more strikingly, ISM001-055, a first-in-class small molecule inhibitor developed with generative AI technology, showed positive results in a phase IIa clinical trial for treating idiopathic pulmonary fibrosis (IPF). (3) In September 2024, Insilico Medicine announced the results of this phase IIa trial. The trial, which enrolled 71 IPF patients across 21 clinical research centers in China, demonstrated that ISM001-055 had a favorable safety profile at all dose levels and exhibited a dose-dependent trend of efficacy in forced vital capacity (FVC), an important measure of lung function in IPF patients. The improvement in lung function in just 12 weeks brings great hope to IPF patients. The success of ISM001-055 showcases the huge potential of AI in drug discovery and development, paving the way for larger-scale trials in the future.

Fang's team recently developed a robotic system for high-throughput chemical synthesis, online characterization, and large-scale photocatalytic reaction condition screening (10,000 reaction conditions per day). The system combines liquid-core waveguide, microfluidic liquid-handling, and artificial intelligence techniques to significantly reduce the time and effort required for complex organic synthesis. (4) The team has also developed an AI-assisted method for predicting absorbance that analyzes the factors influencing convective and molecular diffusion effects.

When a candidate compound is selected at the chemistry, manufacturing, and controls (CMC) stage, crystal structure prediction (CSP) and structure-based stability evaluation can be used alongside advanced experimental screening techniques to minimize the risk of polymorphism problems and accelerate decision-making for formulation studies. (5,6) During the development of Paxlovid, Pfizer's oral COVID-19 treatment, Pfizer and XtalPi elucidated the subtle differences in structure, dynamics, and stability of two enantiotropically related anhydrous polymorphs of the novel antiviral medicine nirmatrelvir by employing a series of highly orthogonal experimental and computational approaches. (7) This work ultimately led to the selection of Form 1 for formulation development, as it is the stable form above the transition temperature of 17 °C, and contributed to the unprecedented speed with which Paxlovid reached patients amid the pandemic.

Figure 1. (A) AI-driven drug discovery process, including de novo design, compound screening, interaction simulations, organic synthesis, and optimization of pharmacokinetic properties. (B) The AlphaFold model architecture predicts the initial protein structure, demonstrating near-experimental accuracy in protein structure prediction through recycling (PDB code: 6Y4F). Some elements were created with BioRender (https://biorender.com).

Antibody drugs, along with small molecule drugs, are widely used in treating cancer, autoimmune diseases, and infectious diseases because of their high specificity for targets. (8) AI can be used for the structure prediction, design optimization, and humanization of antibody drugs, and for predicting the optimal drug-to-antibody ratio (DAR) of antibody-drug conjugates (ADCs). This helps shorten development timelines and increase success rates. AI's superior generalization ability and high degree of automation also make it highly effective in vaccine development, which usually takes over 10 years.

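As a concrete reference point for the DAR mentioned above, the snippet below sketches how an average DAR is conventionally computed from a measured distribution of ADC species, as an abundance-weighted mean of the drug load. It is not the AI prediction method the article refers to; the function name `average_dar` and the example peak areas are hypothetical and chosen only for illustration.

```python
from typing import Mapping


def average_dar(species_abundance: Mapping[int, float]) -> float:
    """Return the abundance-weighted mean drug load of an ADC sample.

    species_abundance maps a drug load n (drugs per antibody) to the
    relative abundance of that species, e.g. a chromatographic peak area.
    """
    if any(abundance < 0 for abundance in species_abundance.values()):
        raise ValueError("abundances must be non-negative")
    total = sum(species_abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    weighted = sum(load * abundance for load, abundance in species_abundance.items())
    return weighted / total


# Hypothetical relative peak areas for species carrying 0, 2, 4, 6, and 8 drugs.
example_profile = {0: 5.0, 2: 25.0, 4: 40.0, 6: 20.0, 8: 10.0}
print(f"average DAR = {average_dar(example_profile):.2f}")  # average DAR = 4.10
```

A predictive model of the kind described in the text would sit upstream of this calculation, for example by proposing conjugation conditions expected to give a target average DAR, with a measured distribution like the one above used to check the result.
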
In vaccine development, for example, AI has played a crucial role in optimizing mRNA sequences. (9,10) The LinearDesign algorithm has greatly enhanced the stability, protein expression, and immunogenicity of mRNA vaccines by jointly optimizing mRNA structural stability and codon usage. The algorithm solved the large-scale search problem in traditional mRNA design, finding the optimal mRNA sequence in less than 11 minutes. It showcases the potential of AI and computational linguistics technologies in mRNA design and was licensed to Moderna, a leading global RNA technology company, in 2022 to enhance the development of mRNA vaccines and therapeutics. (10)

AlphaFold is a recent breakthrough in protein structure prediction that uses a neural network-based machine learning method. It combines evolutionary data and geometric properties to accurately predict three-dimensional structures (Figure 1B). (9) Once the amino acid sequence of a protein is entered, a multiple sequence alignment (MSA) and templates are generated from relevant evolutionary databases, and the Evoformer captures the relationships and covariation between sequences. The predicted structure is then fed back into the Evoformer and the structure module multiple times for iterative optimization. Each such iteration, known as "recycling," refines the structure further to enhance accuracy.

Significant progress has been made in predicting protein structures, but this is only the beginning of exploration in the life sciences. AI is currently used mainly at the molecular level, but future research will integrate diverse high-quality data and advance to higher levels of organization to discover new patterns and principles in biological systems. For example, the Peking University International Cancer Institute has partnered with XtalPi to investigate disease mechanisms and drug actions through multimodal data integration. (11) The Shu group at Peking University uses high-throughput CRISPR technology for large-scale gene editing in cells, whereas X-Map, XtalPi's independently developed cell research platform, allows for the collection of large-scale, high-content imaging and transcriptome data following cell perturbation. The X-Profiler algorithm developed by XtalPi extracts the information relevant to specific downstream tasks. It effectively addresses issues such as out-of-focus blurring in high-content imaging caused by variations in well plate edge height, reduces data noise, and enhances the signal-to-noise ratio (SNR). Furthermore, X-Profiler can adaptively adjust data quality control strategies based on task requirements, significantly boosting model performance. By integrating experimental methods with AI algorithms, this approach enables precise observations at the cellular level using real-world multidimensional data and establishes correlations between physiological changes and gene or drug regulation. Compared with animal models, the method offers higher throughput and lower costs, allowing quicker generation of high-quality data for specific research systems and improving the efficiency and success rate of drug discovery and development.

Although AI is well established in many aspects of drug discovery and development, the traditional education system has not kept pace with the rapid development of AI and urgently needs reform and optimization. To address the educational challenges this development poses, it is essential to have a clear understanding of AI's specific applications, which will help pinpoint the areas of teaching content that require additional support. Traditional teaching methods suffer from outdated curricula, a lack of practical experience, and limited interdisciplinary integration; they focus too much on foundational subjects and neglect crucial skills in data science and AI.

The lack of interdisciplinary integration and practical experience makes it difficult to meet the demands of the AI era. Given AI's increasing importance, teaching approaches should be updated. First and foremost, teachers should be trained in AI technologies to ensure they are prepared to teach AI courses. Second, interdisciplinary integration in education should be promoted to inspire insights from different disciplines. Computer science and data science should be integrated into the pharmaceutical curriculum to break down disciplinary silos. For example, the basic theory component of a drug molecular design course should include the fundamental concepts of AI, machine learning, and deep learning, along with their applications in the pharmaceutical industry. This would cover skills such as data preprocessing, model training and evaluation, and feature engineering. The core course section should comprehensively explore AI applications in drug target screening, molecular docking, virtual screening, molecular structure analysis, and AI modeling. Cutting-edge AI technologies in drug discovery, such as reinforcement learning and generative adversarial networks (GANs), should also be introduced. GANs are a specialized type of deep learning model built from two neural networks: a generator that produces candidate samples (images, in the original setting) and a discriminator that judges whether they are real. By learning the distribution of existing data, they can efficiently generate diverse, high-quality molecules and explore new chemical space. Third, new courses should be added, and professors from various fields should collaborate on textbooks that combine pharmaceutical science with AI for the updated curriculum. International academic conferences and online courses are also valuable resources for students learning AI. It is important to expand AI-related courses, including big data analysis, natural language processing, and bioinformatics, to build the relevant skills.

However, unilaterally increasing the AI course load will only add pressure on students, so new courses should be personalized to avoid overwhelming students with the broad scope of AI content. Reform should focus on innovation, professionalism, and personalization to better address individual learning needs. Students should choose courses based on their professional needs, focusing on relevant skills after gaining foundational knowledge in the early stages of their education. AI technology should be used in real-world drug development scenarios to teach students how to solve practical problems, and AI-driven practical training and hands-on experience must be included. For example, a virtual lab platform could integrate AI models and data analysis tools to simulate real-world scenarios such as drug screening, molecular docking, and drug safety prediction. This platform should also have online collaboration features that allow students to work together in real time, mimicking a professional work environment that promotes interdisciplinary teamwork. Additionally, it is crucial to strengthen partnerships with pharmaceutical and AI technology companies to develop a platform that connects industry and academia. Project content should be tailored to the actual needs of industry partners so that students work on practical solutions relevant to the field. With a strong grasp of AI concepts, students will be better prepared for future roles in drug development.

As expected, AI technology has revolutionized, and will continue to revolutionize, the field of drug discovery. Educational policymakers and academic institutions globally should update their training programs to incorporate advanced AI technologies. Students need to learn the latest AI techniques, such as deep learning and natural language processing, to be better prepared for future challenges in drug development. Students without AI skills may be readily replaced by those who embrace AI advances.

However, we must keep in mind that AI is not a panacea, because it is limited by the quality of the data. High-quality, comprehensive experimental data sets are essential for generating reliable results, yet not all entities with an AI platform possess such data. The impact of AI and machine learning in drug discovery depends directly on the quality and depth of the training data used to develop the models. Regardless of how advanced the algorithms are, if the training data are biased or of poor quality, the AI predictions will be inaccurate, limiting their potential to drive new drug discovery. Algorithmic biases can also lead to inaccurate predictions of therapeutic effects for certain populations, which is especially critical for medication safety. Additionally, it is important not only to pay attention to the quality of the data but also to protect sensitive patient data. Strict data protection measures need to be established to ensure the security of data during acquisition, storage, and processing, thereby safeguarding patient privacy.

Y.W. and Z.H. contributed equally to this work.

This work was supported by the National Natural Science Foundation of China (Nos. 82473782, 22277110, and 82473761), the Sichuan Science and Technology Program (No. 2024ZYD0130), the Natural Science Foundation of Henan Province (No. 242301420005), and the Key Research Project for Basic Research in Henan Province Universities (No. 25ZX001).

The authors acknowledge Professor Craig W. Lindsley (Editor-in-Chief of Journal of Medicinal Chemistry; Executive Director, Warren Center for Neuroscience Drug Discovery, Vanderbilt University) and Dr. Shuhao Wen (Chairman of XtalPi) for their insightful discussions.

This article references 11 other publications. This article has not yet been cited by other publications.

Source journal: Journal of Medicinal Chemistry (Medicine, Medicinal Chemistry)
CiteScore: 4.00
Self-citation rate: 11.00%
Articles published: 804
Review time: 1.9 months

About the journal: The Journal of Medicinal Chemistry is a prestigious biweekly peer-reviewed publication that focuses on the multifaceted field of medicinal chemistry. Since its inception in 1959 as the Journal of Medicinal and Pharmaceutical Chemistry, it has evolved to become a cornerstone in the dissemination of research findings related to the design, synthesis, and development of therapeutic agents. The Journal of Medicinal Chemistry is recognized for its significant impact in the scientific community, as evidenced by its 2022 impact factor of 7.3. This metric reflects the journal's influence and the importance of its content in shaping the future of drug discovery and development. The journal serves as a vital resource for chemists, pharmacologists, and other researchers interested in the molecular mechanisms of drug action and the optimization of therapeutic compounds.