Predicting CO2 solubility in water and brines using advanced machine learning models
Kun Liu, Xiao-Qiang Bian, Jing Chen, Jian Li, Yu-Peng Wang
Geoenergy Science and Engineering, Volume 257, Article 214234. Published 2025-10-03.
DOI: 10.1016/j.geoen.2025.214234
https://www.sciencedirect.com/science/article/pii/S2949891025005925
Citations: 0
Abstract
Carbon capture and storage (CCS) is a crucial technology for reducing industrial emissions, and its effectiveness depends on accurate prediction of CO2 solubility. However, existing studies are typically limited to modeling specific water/brine systems and lack practical engineering validation. Therefore, this study proposes a CO2 solubility model applicable to pure water, single-salt solutions, and mixed-salt solutions. A database containing 3383 experimental data entries was constructed from the published literature. Input variables include temperature, pressure, and salt concentrations of NaCl, KCl, Na2SO4, MgCl2, and CaCl2, and the output variable is CO2 solubility. To improve the accuracy of CO2 solubility prediction, we apply three optimization algorithms, the Artificial Lemming Algorithm (ALA), the Black-winged Kite Algorithm (BKA), and the IVY Algorithm (IVYA), to fine-tune two machine learning (ML) models: Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost). The results were compared with the Cubic-Plus-Association (CPA) equation of state (EoS) combined with the MHV1 mixing rule (CPA-MHV1). Findings indicate that among all the models considered in this paper, BKA-LightGBM stands out for its high accuracy and low computational cost, with R² = 0.9930, RMSE = 0.0007, and AARD = 7.41%, outperforming the CPA-MHV1 model in prediction accuracy. In addition, SHapley Additive exPlanations (SHAP) indicated that pressure is the most influential input parameter for model output. The leverage approach based on the Williams plot verified the reliability of the data, with 90.63% of the samples distributed within a leverage threshold of 0.3, effectively minimizing the influence of outliers. Cross-validation and external validation demonstrated that the BKA-LightGBM model can be effectively applied to CCS engineering, for applications such as CO2 capture via dissolution.
These results suggest that the BKA-LightGBM model has strong potential to support the development of efficient and practical CCS technologies.
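
The three accuracy metrics quoted in the abstract (R², RMSE, and AARD) can be illustrated with a minimal sketch. The formulas below are the standard textbook definitions, which the abstract does not spell out, so treating them as the ones used in the paper is an assumption. The function names and the sample values are hypothetical and are not taken from the paper's database.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the solubility values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def aard_percent(y_true, y_pred):
    """Average absolute relative deviation, in percent.

    Assumes every experimental value in y_true is non-zero.
    """
    n = len(y_true)
    return 100.0 / n * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical solubility values, for illustration only.
    experimental = [0.010, 0.020, 0.030, 0.015]
    predicted = [0.011, 0.019, 0.030, 0.016]
    print("R2   =", r2_score(experimental, predicted))
    print("RMSE =", rmse(experimental, predicted))
    print("AARD =", aard_percent(experimental, predicted), "%")
```

RMSE is an absolute error and AARD a relative one, so the two can look very different in size when the target values are small. That would be the case if solubility is expressed on a small-valued scale such as mole fraction, which is an assumption here because the abstract does not state the unit.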