Predicting CO2 Solubility in Diverse Ionic Liquids: A Data-Driven Approach Using Machine Learning Algorithms

IF 5.2 | CAS Zone 3 (Engineering & Technology) | JCR Q2, ENERGY & FUELS

Energy & Fuels. Pub Date: 2025-06-03. DOI: 10.1021/acs.energyfuels.5c01345

Zahra Bastami, Mohammad Amin Sobati*, and Mahdieh Amereh

Energy & Fuels, 2025, 39 (23), 11256–11278. https://pubs.acs.org/doi/10.1021/acs.energyfuels.5c01345

Citations: 0

Abstract

In this study, new machine-learning-based models have been developed to predict carbon dioxide (CO₂) solubility in different Ionic Liquids (ILs). An extensive data set comprising 16,480 experimental data points of CO₂ solubility in 296 ILs, made up of 103 distinct cation and 78 distinct anion structures, was used for this purpose. Quantitative Structure–Property Relationship (QSPR) models were developed on this data set using both linear and nonlinear methods. To capture the effect of cation and anion structure on CO₂ solubility, basic descriptors were calculated, including zero-dimensional, one-dimensional, and fingerprint descriptors (a category of two-dimensional descriptors). The most relevant variables were then identified by StepWise Regression (SWR), which selected 18 categories of cationic and anionic descriptors; these, together with temperature and pressure, served as inputs for nonlinear Machine Learning (ML) models such as MultiLayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RF), and Least-Squares Boosting (LSBoost). Internal and external validation indicated that the LSBoost model predicted CO₂ solubility most accurately and handled the complex data best. For this model, R² and MSE were 0.9962 and 0.0070 on the training set and 0.9243 and 0.1277 on the test set, respectively. Furthermore, comparison with models available in the literature showed that the LSBoost model outperforms them, making it a reliable tool for predicting CO₂ solubility in new ILs and thereby aiding the design and selection of ILs for CO₂ capture.
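The abstract identifies LSBoost (least-squares boosting) as the most accurate model and scores it with R² and MSE. As an illustration only, here is a minimal pure-Python sketch of generic least-squares boosting with depth-1 regression trees (stumps) as base learners, together with the two metrics. It is not the authors' implementation: their base learners, hyperparameters, and descriptor inputs are not given here, and the names `Stump`, `fit_stump`, `LSBoostModel`, `ls_boost`, `mse`, and `r_squared`, as well as the toy data, are hypothetical. The two toy features merely stand in for inputs such as temperature, pressure, and selected descriptors.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one feature, one threshold, two leaf values."""
    feature: int
    threshold: float
    left: float   # prediction when x[feature] <= threshold
    right: float  # prediction when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[Sequence[float]], r: List[float]) -> Stump:
    """Least-squares stump fit to residuals r, by exhaustive threshold search."""
    mean_r = sum(r) / len(r)
    best = Stump(0, float("inf"), mean_r, mean_r)  # fallback: constant fit
    best_sse = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not right:
                continue  # largest value: every sample falls on the left
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, ml, mr), sse
    return best


@dataclass
class LSBoostModel:
    f0: float             # initial constant prediction (mean of y)
    learning_rate: float  # shrinkage applied to every stump
    stumps: List[Stump] = field(default_factory=list)

    def predict(self, x: Sequence[float]) -> float:
        return self.f0 + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def ls_boost(X: List[Sequence[float]], y: List[float],
             n_rounds: int = 100, learning_rate: float = 0.1) -> LSBoostModel:
    """Each round fits a stump to the current residuals (the negative gradient
    of the squared loss) and adds a shrunken copy of it to the ensemble."""
    model = LSBoostModel(sum(y) / len(y), learning_rate)
    pred = [model.f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        model.stumps.append(stump)
        pred = [pi + learning_rate * stump.predict(x) for pi, x in zip(pred, X)]
    return model


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data (hypothetical): two numeric features on a 10 x 10 grid and an
    # additive target; it is not CO2 solubility data.
    X = [[a, b] for a in range(10) for b in range(10)]
    y = [0.5 * a + (1.0 if b > 4 else 0.0) for a, b in X]
    model = ls_boost(X, y, n_rounds=200, learning_rate=0.3)
    y_hat = [model.predict(x) for x in X]
    print(f"train R2 = {r_squared(y, y_hat):.4f}, train MSE = {mse(y, y_hat):.4f}")
```

Each boosting round fits a stump to the current residuals and adds it scaled by `learning_rate`; smaller values usually need more rounds. To mirror the internal and external validation described in the abstract, the same two metrics would be computed on a held-out test set as well as on the training data.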

Source Journal

Energy & Fuels (Engineering & Technology: Engineering, Chemical)

CiteScore: 9.20

Self-citation rate: 13.20%

Articles published: 1101

Review time: 2.1 months

Journal description: Energy & Fuels publishes reports of research in the technical area defined by the intersection of the disciplines of chemistry and chemical engineering and the application domain of non-nuclear energy and fuels. This includes research directed at the formation of, exploration for, and production of fossil fuels and biomass; the properties and structure or molecular composition of both raw fuels and refined products; the chemistry involved in the processing and utilization of fuels; fuel cells and their applications; and the analytical and instrumental techniques used in investigations of the foregoing areas.