Zahra Bastami, Mohammad Amin Sobati* and Mahdieh Amereh
Predicting CO2 Solubility in Diverse Ionic Liquids: A Data-Driven Approach Using Machine Learning Algorithms
Energy & Fuels, volume 39, issue 23, pages 11256–11278. Published 2025-06-03.
DOI: 10.1021/acs.energyfuels.5c01345
https://pubs.acs.org/doi/10.1021/acs.energyfuels.5c01345
Journal impact factor: 5.2 (JCR Q2, Energy & Fuels)
Citations: 0
Abstract
In this study, new machine-learning-based models were developed to predict carbon dioxide (CO2) solubility in ionic liquids (ILs). An extensive data set of 16,480 experimental CO2 solubility data points in 296 ILs, built from 103 different cation and 78 different anion structures, was used for this purpose. Quantitative Structure–Property Relationship (QSPR) models were developed on this data set using linear and nonlinear methods. To account for the effect of cation and anion structure on CO2 solubility, basic descriptors were calculated, including zero-dimensional, one-dimensional, and fingerprint descriptors (a category of two-dimensional descriptors). The most relevant variables were then identified by stepwise regression (SWR), which selected 18 categories of cationic and anionic descriptors. These, together with temperature and pressure, served as inputs to nonlinear machine learning (ML) models such as multilayer perceptron (MLP), radial basis function (RBF), random forest (RF), and least-squares boosting (LSBoost). Internal and external validation indicated that the LSBoost model predicted CO2 solubility most accurately and was best able to model complex data. Its R² and MSE values were 0.9962 and 0.0070 for the training set and 0.9243 and 0.1277 for the test set, respectively. Comparison with models available in the literature showed that the LSBoost model outperforms them and is reliable for predicting CO2 solubility in new ILs, which can aid the design and selection of ILs for CO2 capture.
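To make the least-squares boosting idea and the two reported metrics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the data are synthetic (three made-up inputs standing in for temperature, pressure, and one descriptor), the weak learner is assumed to be a one-split regression stump, and the function names, number of rounds, and learning rate are all illustrative choices. Each round fits a stump to the current residuals, which is the negative gradient of the squared-error loss, and adds a shrunken copy of it to the ensemble.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    mean = total / n
    best_gain, best = None, (0, float("inf"), mean, mean)  # fallback: constant stump
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Minimizing the split's squared error is equivalent to maximizing this gain.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best_gain is None or gain > best_gain:
                best_gain = gain
                best = (j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def stump_predict(stump, x):
    j, threshold, left, right = stump
    return left if x[j] <= threshold else right

def lsboost_fit(X, y, n_rounds=300, learning_rate=0.2):
    """Least-squares boosting: repeatedly fit a stump to the residuals y - prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return base, stumps, learning_rate

def lsboost_predict(model, X):
    base, stumps, lr = model
    return [base + lr * sum(stump_predict(s, x) for s in stumps) for x in X]

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): columns mimic scaled T, P, and one descriptor.
random.seed(0)
def make_data(n):
    X = [[random.random(), random.random(), random.random()] for _ in range(n)]
    y = [-0.5 * t + 0.8 * p + 0.3 * d + random.gauss(0, 0.02) for t, p, d in X]
    return X, y

X_train, y_train = make_data(200)
X_test, y_test = make_data(100)

model = lsboost_fit(X_train, y_train)
r2_train = r2(y_train, lsboost_predict(model, X_train))
r2_test = r2(y_test, lsboost_predict(model, X_test))
mse_train = mse(y_train, lsboost_predict(model, X_train))
mse_test = mse(y_test, lsboost_predict(model, X_test))
print(f"train: R2={r2_train:.4f} MSE={mse_train:.5f}")
print(f"test:  R2={r2_test:.4f} MSE={mse_test:.5f}")
```

Reporting both sets side by side, as the abstract does, is what exposes any gap between training and test performance. In practice a library implementation with deeper trees would replace the hand-written stump.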
About the journal:
Energy & Fuels publishes reports of research in the technical area defined by the intersection of the disciplines of chemistry and chemical engineering and the application domain of non-nuclear energy and fuels. This includes research directed at the formation of, exploration for, and production of fossil fuels and biomass; the properties and structure or molecular composition of both raw fuels and refined products; the chemistry involved in the processing and utilization of fuels; fuel cells and their applications; and the analytical and instrumental techniques used in investigations of the foregoing areas.