Application of deep learning and machine learning models with enhanced feature extraction for the prediction of plant extraction yields using supercritical CO2: An optimization and comparative analysis

IF 4.4 | Zone 3, Engineering and Technology | JCR Q2, Chemistry, Physical
Ilhem Bouaziz, Mohamed Hentabli, Mohamed Kouider Amar, Maamar Laidi, Amel Bouzidi, Hakim Bouzemlal, Ahmed Chabane, Abdeltif Amrane, Salah Hanini
Journal of Supercritical Fluids, volume 227, Article 106755. Published 2025-08-18.
DOI: 10.1016/j.supflu.2025.106755
https://www.sciencedirect.com/science/article/pii/S0896844625002426
Citations: 0

Abstract

The efficient extraction of essential oils (EOs), particularly volatile compounds, from medicinal, aromatic, or oil-rich crop plants using supercritical carbon dioxide extraction (scCO₂) is crucial for industries such as pharmaceuticals, cosmetics, and food. However, optimizing this process presents challenges due to the intricate molecular diversity of the compounds and the complex interplay of scCO₂ parameters. To address these limitations, this study introduces a hybrid predictive framework that combines deep learning and machine learning, utilizing 694 scCO₂ experimental data points sourced from the literature across 21 plant species. Four major molecular compounds per plant were selected as input features, alongside key process parameters, including temperature, pressure, extraction time, co-solvent ratio, and CO₂ flow rate. Morgan fingerprints were computed for these compounds, and a convolutional neural network (CNN) was utilized to extract their high-level representations into compact vectors. These vectors were integrated with normalized process parameters and fed into a CNN-Multilayer Perceptron (CNN-MLP) hybrid architecture. Performance was compared with Support Vector Regression (SVR), Random Forest (RF), Gaussian Process Regression (GPR), and XGBoost, all optimized using OPTUNA. The CNN-MLP achieved the best performance, with an R² of 0.974 and a Root Mean Squared Error (RMSE) of 1.431 on the test set. A paired t-test (p = 0.810) and Bland–Altman analysis (mean difference: 9.35%) confirmed the model's robustness. To further assess generalizability, external validations were conducted using unseen experimental conditions. The CNN-MLP was tested on three extraction profiles and demonstrated strong predictive performance, with Pearson correlations ranging from 0.95 to 0.98.
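The evaluation statistics named in the abstract (R², RMSE, the paired t-test, Bland–Altman analysis, and Pearson correlation) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names are hypothetical, the yield values in the demo are made up for illustration, and the 1.96 multiplier for the Bland–Altman limits of agreement assumes approximately normal differences. Only the paired t statistic is computed; turning it into a p-value would need a t-distribution CDF, which is outside the standard library.

```python
import math
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bland_altman(y_true, y_pred):
    """Return (bias, lower limit, upper limit) of agreement.

    Bias is the mean of (predicted - measured); the limits are
    bias -/+ 1.96 sample standard deviations of the differences.
    """
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def paired_t_statistic(y_true, y_pred):
    """t statistic of a paired t-test on (predicted - measured)."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical extraction yields (%), NOT data from the paper.
    measured = [2.0, 4.0, 6.0, 8.0]
    predicted = [2.5, 3.5, 6.5, 7.5]
    print("R2:", r2_score(measured, predicted))
    print("RMSE:", rmse(measured, predicted))
    print("Pearson r:", pearson_r(measured, predicted))
    print("Bland-Altman (bias, lower, upper):", bland_altman(measured, predicted))
    print("Paired t statistic:", paired_t_statistic(measured, predicted))
```

A paired t statistic near zero and a Bland–Altman bias near zero both indicate that predictions are not systematically above or below the measurements, while R², RMSE, and Pearson correlation describe how closely the individual predictions track the measured values.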
Source journal
Journal of Supercritical Fluids (Engineering and Technology: Chemical Engineering)
CiteScore: 7.60
Self-citation rate: 10.30%
Articles published: 236
Review time: 56 days
About the journal: The Journal of Supercritical Fluids is an international journal devoted to the fundamental and applied aspects of supercritical fluids and processes. Its aim is to provide a focused platform for academic and industrial researchers to report their findings and to have ready access to the advances in this rapidly growing field. Its coverage is multidisciplinary and includes both basic and applied topics. Thermodynamics and phase equilibria, reaction kinetics and rate processes, thermal and transport properties, and all topics related to processing, such as separations (extraction, fractionation, purification, chromatography), nucleation, and impregnation, are within the scope. Accounts of specific engineering applications such as those encountered in the food, fuel, natural products, minerals, pharmaceuticals, and polymer industries are included. Topics related to high-pressure equipment design, analytical techniques, sensors, and process control methodologies are also within the scope of the journal.