Exploring the solubility potential of anti-cancer and supportive agents in supercritical CO2 through advanced computational intelligence techniques

IF 8.4 | Zone 2, Engineering and Technology | Q1 CHEMISTRY, MULTIDISCIPLINARY
Reza Soleimani, Mandana Moradi Kouchi, Ziba Behtouei, Zahra Ghasemi, Alireza Baghban
Journal of CO2 Utilization, Vol. 102, Article 103227. Published 2025-09-19.
DOI: 10.1016/j.jcou.2025.103227
https://www.sciencedirect.com/science/article/pii/S2212982025002112
Citations: 0

Abstract

The accurate prediction of solid drug solubility in supercritical carbon dioxide (SC-CO₂) is critical for optimizing pharmaceutical processes, especially in environmentally sustainable drug formulation and purification. This study develops a machine learning (ML) framework for predicting solubility in SC-CO₂ using 744 experimental data points (520 training, 112 validation, 112 testing). Four features—melting point, molecular weight, pressure, and temperature—were used as model inputs. A comparative assessment was performed between conventional regression methods (Linear, Ridge, Lasso, Elastic Net) and advanced ML algorithms, including Support Vector Machine, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost, Gaussian Process Regression, Artificial Neural Networks, and Convolutional Neural Networks (CNN). The results show that tree-based ensembles and deep learning approaches significantly outperform linear models. Notably, the CNN model achieved the best test performance with R² = 0.9839 and MSE = 0.0800, followed by CatBoost (R² = 0.9795) and Gaussian Process Regression (R² = 0.9751). Feature importance analysis using SHAP revealed molecular weight as the most influential variable, followed by pressure, temperature, and melting point. Overall, this study highlights the potential of ML in improving solubility prediction and supports its application in early-stage drug development and green pharmaceutical processing.
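The evaluation protocol summarized in the abstract can be sketched as below. It partitions 744 points into 520/112/112 and scores models by R² and MSE. This is a minimal standard-library Python illustration under stated assumptions, not the authors' code. The split is assumed to be a seeded random shuffle, and the helper names `split_indices`, `mse` and `r2` are hypothetical. The abstract does not say how the data were actually partitioned. It also does not say on what scale solubility was expressed when the errors were computed.

```python
import random


def split_indices(n, n_train, n_val, seed=0):
    """Hypothetical helper: shuffle indices 0..n-1 and cut them into
    train / validation / test lists (test gets whatever remains).
    Assumes a plain random split; the paper's scheme is not stated."""
    if n_train + n_val > n:
        raise ValueError("train + validation sizes exceed the sample count")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # local RNG: reproducible, no global state
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


def _check(y_true, y_pred):
    # Both metrics need non-empty, equal-length sequences.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    train, val, test = split_indices(744, 520, 112)
    print(len(train), len(val), len(test))  # 520 112 112
    # Toy targets/predictions for illustration only, not data from the study.
    y, y_hat = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
    print(round(mse(y, y_hat), 4), round(r2(y, y_hat), 4))  # 0.025 0.98
```

The test set is defined as the remainder after the training and validation slices. As a result, the three subsets always cover all 744 indices exactly once.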
Source journal
Journal of CO2 Utilization
Subject categories: CHEMISTRY, MULTIDISCIPLINARY; ENGINEERING, CHEMICAL
CiteScore: 13.90
Self-citation rate: 10.40%
Articles published: 406
Review time: 2.8 months
Journal description: The Journal of CO2 Utilization offers a single, multi-disciplinary, scholarly platform for the exchange of novel research in the field of CO2 re-use for scientists and engineers in chemicals, fuels and materials. The emphasis is on the dissemination of leading-edge research from basic science to the development of new processes, technologies and applications. The Journal of CO2 Utilization publishes original peer-reviewed research papers, reviews, and short communications, including experimental and theoretical work, and analytical models and simulations.