Predicting in-situ CO2 solubility in formation brines using Raman spectroscopy and machine learning: Implications for offshore geological carbon storage
Ying Teng , Yiqi Chen , Xiran Lin , Mingkun Bai , Senyou An , Shuyang Liu , Pengfei Wang , Tao Zhang , Songbai Han , Jinlong Zhu , Jianbo Zhu , Heping Xie
Gas Science and Engineering, Volume 145, Article 205794. Journal Article, published 2025-10-09. DOI: 10.1016/j.jgsce.2025.205794. Impact factor: 5.5. JCR category: Energy & Fuels. URL: https://www.sciencedirect.com/science/article/pii/S2949908925002584
Citations: 0
Abstract
Accurate estimation of in-situ CO2 solubility in brine is essential for predicting dissolution trapping efficiency and ensuring the long-term security of geological carbon storage, particularly in deep saline aquifers and offshore reservoirs. Existing experimental and thermodynamic approaches often suffer from limited applicability under high salinity, multi-ion conditions, and diverse reservoir environments, leading to substantial prediction uncertainties. To address this gap, we experimentally determined CO2 solubility using Raman spectroscopy in both formation brines and synthetic brines under reservoir-relevant conditions (313.15–363.15 K, 7.5–17 MPa) and compiled a comprehensive dataset of 2733 literature entries covering wide salinity and ionic composition ranges. Six machine learning algorithms—LightGBM, XGBoost, CatBoost, SVR, ELM, and KNN—were trained and benchmarked, with LightGBM achieving the highest predictive accuracy. SHAP analysis revealed that pressure, total salinity, and temperature were the dominant factors governing solubility. Model applicability and reliability were confirmed through leverage statistics and Williams plots. Compared with a thermodynamic model, LightGBM delivered superior performance, especially under high-salinity conditions where conventional models often underpredict solubility. The resulting data-driven framework can be readily integrated into reservoir simulation workflows to enable rapid, accurate solubility predictions, optimize injection strategies, and enhance risk assessment for CCS projects in complex geological settings.
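The abstract mentions leverage statistics and Williams plots as the applicability-domain check but does not give the formulas. The following is a minimal sketch of how such a check is commonly set up, not the authors' implementation. It assumes the usual conventions: leverage taken from the diagonal of the hat matrix, a warning leverage of h* = 3(p + 1)/n, and standardized residuals accepted within ±3. The function name `williams_flags` and the synthetic inputs in the usage note are hypothetical.

```python
import numpy as np


def williams_flags(X, y_true, y_pred, residual_limit=3.0):
    """Compute the quantities plotted in a Williams plot.

    X       : (n, p) matrix of model inputs (ideally standardized).
    y_true  : (n,) measured values, e.g. CO2 solubility.
    y_pred  : (n,) model predictions for the same rows.
    Returns leverages, standardized residuals, the warning leverage h*,
    and a boolean mask of points inside the applicability domain.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Hat-matrix diagonal: h_i = x_i^T (X^T X)^-1 x_i.
    # The pseudo-inverse keeps this usable if X^T X is near-singular.
    xtx_inv = np.linalg.pinv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # Residuals scaled by their sample standard deviation.
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    std_residuals = residuals / residuals.std(ddof=1)

    # Assumed convention for the warning leverage: h* = 3 (p + 1) / n.
    h_star = 3.0 * (p + 1) / n

    # A point is "in domain" if it is neither high-leverage nor an outlier.
    in_domain = (leverage <= h_star) & (np.abs(std_residuals) <= residual_limit)
    return leverage, std_residuals, h_star, in_domain
```

In use, `X` would hold the model inputs (for example standardized temperature, pressure, and salinity columns) and `y_pred` the predictions of the fitted model. Plotting `std_residuals` against `leverage`, with a vertical line at `h_star` and horizontal lines at ±3, gives the Williams plot. Points to the right of `h_star` are structurally unusual inputs, and points outside ±3 are poorly predicted.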