Machine Learning Based Quantitative Structure-Dissolution Profile Relationship

Impact factor 5.3 | CAS Tier 2 (Chemistry) | JCR Q1, CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling, pp. 6273-6286. Pub Date: 2025-06-23; Epub Date: 2025-06-05. DOI: 10.1021/acs.jcim.5c00197

Lap Au-Yeung, Chih-Yuan Tseng, Yun K Tam, Peichun Amy Tsai

Citations: 0

Abstract

Accurately determining how drugs dissolve in the gastrointestinal tract is critical in drug discovery, because dissolution profiles provide essential information for estimating the bioavailability of orally administered drugs. While various methods have been developed to predict drug solubility from chemical structure, no reliable tools currently exist for predicting the dissolution rate constant. This study presents a novel two-stage machine learning approach, termed Machine Learning based Quantitative Structure-Dissolution Profile Relationship, which integrates physics-informed neural networks (PINNs) and deep neural networks (DNNs) to predict drug dissolution profiles in water with varying concentrations of the surfactant sodium lauryl sulfate. In the first stage, PINNs extract two key dissolution parameters, the dissolution rate constant (k) and the dissolved mass fraction at saturation (ϕ_s), from existing dissolution data. By incorporating the physical law that governs dissolution, PINNs aim to improve prediction performance and reduce data requirements. Assuming first-order dissolution kinetics as described by the Noyes-Whitney equation, PINNs with 8 hidden layers and 40 neurons per layer may outperform traditional nonlinear regression by effectively filtering noise and focusing on physically meaningful data. In the second stage, the extracted parameters (k and ϕ_s) are used to train a DNN that predicts dissolution profiles from the drug's chemical structure and the dissolution medium. Evaluated with the FDA-recommended difference and similarity factors (f₁ and f₂), the DNN, with 128 neurons in two hidden layers and a learning rate of 10^(-2.8) (≈ 1.6 × 10⁻³), achieved an average testing accuracy of 61.7% with an 80:20 train-to-test split. Although this accuracy is below the generally acceptable range of 70-80%, the approach shows significant potential as a low-cost, time-efficient tool for early-phase drug formulation. Future improvements are expected as data quality and diversity increase.
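Once a kinetic model is fixed, the two quantities passed between the stages, k and ϕ_s, fully determine a dissolution profile, and f₁ and f₂ then score a predicted profile against a measured one. The Python sketch below illustrates both steps under stated assumptions. It uses the common first-order form dϕ/dt = k(ϕ_s − ϕ) with ϕ(0) = 0 (the abstract does not give the paper's exact parameterization) and the standard FDA definitions of f₁ and f₂ on percent-dissolved data. The `is_similar` rule (f₁ ≤ 15, f₂ ≥ 50) is a hypothetical stand-in, since the abstract does not say how f₁ and f₂ are converted into an accuracy. This is not the authors' code, and it omits the PINN and DNN stages entirely; all parameter values are illustrative.

```python
import math
from collections.abc import Sequence


def first_order_profile(k: float, phi_s: float, times: Sequence[float]) -> list[float]:
    """Dissolved mass fraction phi(t) for a first-order (Noyes-Whitney-type) model.

    Assumption: d(phi)/dt = k * (phi_s - phi) with phi(0) = 0, which integrates to
    phi(t) = phi_s * (1 - exp(-k * t)). The units of k must match the units of t.
    """
    return [phi_s * (1.0 - math.exp(-k * t)) for t in times]


def _check(reference: Sequence[float], test: Sequence[float]) -> None:
    # Both profiles must be sampled at the same time points.
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("profiles must be non-empty and of equal length")


def f1_factor(reference: Sequence[float], test: Sequence[float]) -> float:
    """Difference factor: 100 * sum|R_t - T_t| / sum R_t (inputs in percent dissolved)."""
    _check(reference, test)
    diff = sum(abs(r - t) for r, t in zip(reference, test))
    return 100.0 * diff / sum(reference)


def f2_factor(reference: Sequence[float], test: Sequence[float]) -> float:
    """Similarity factor: 50 * log10(100 / sqrt(1 + mean squared difference))."""
    _check(reference, test)
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))


def is_similar(reference: Sequence[float], test: Sequence[float]) -> bool:
    """Hypothetical pass rule using the usual FDA ranges: f1 <= 15 and f2 >= 50."""
    return f1_factor(reference, test) <= 15.0 and f2_factor(reference, test) >= 50.0


if __name__ == "__main__":
    times_min = [5, 10, 15, 30, 45, 60]  # illustrative sampling times (minutes)
    # Illustrative parameter values only: k in 1/min, phi_s as a fraction.
    measured = [100 * p for p in first_order_profile(0.08, 0.95, times_min)]
    predicted = [100 * p for p in first_order_profile(0.06, 0.90, times_min)]
    f1, f2 = f1_factor(measured, predicted), f2_factor(measured, predicted)
    print(f"f1 = {f1:.1f}, f2 = {f2:.1f}, similar = {is_similar(measured, predicted)}")
```

In a real evaluation the reference profile would come from dissolution experiments, and only the test profile would be generated from predicted (k, ϕ_s) values; here both are simulated purely to show the mechanics.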

Source journal: Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary)

CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months

Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.