Sparse data-driven knowledge discovery for interpretable prediction of permeability in tight sandstones

IF 8.4 · Zone 1, Engineering Technology · JCR Q1, ENGINEERING, GEOLOGICAL
Lulu Xu, Zhengyang Du, Meifeng Cai, Shangxian Yin, Shuning Dong, Hung Vo Thanh, Kenneth C. Carroll, Mohamad Reza Soltanian, Zhenxue Dai
Engineering Geology, Volume 353, Article 108151. Published 2025-05-23.
DOI: 10.1016/j.enggeo.2025.108151
Full text: https://www.sciencedirect.com/science/article/pii/S0013795225002479
Citations: 0

Abstract

Permeability (k) is crucial for subsurface fluid flow, but predicting k-values in tight sandstones remains challenging due to their complex pore structure and heterogeneity. Although machine learning (ML) has shown promise, it faces significant challenges, including limited high-quality data, high computational costs, and unclear prediction mechanisms. This study proposes a sparse data-driven knowledge discovery framework aimed at enhancing the accuracy and interpretability of k-value predictions in tight sandstone formations. We integrate ML models with data augmentation (ML-DA), using Extreme Gradient Boosting (XGBoost-DA) and Least Squares Support Vector Regression (LSSVR-DA), optimized through genetic algorithms (GA), particle swarm optimization (PSO), and Bayesian optimization (BO). SHapley Additive Explanations (SHAP) are employed to elucidate the interactions between key factors influencing predictions. Monte Carlo simulations demonstrate the robust performance of our ML-DA models, even under data constraints. SHAP analysis identifies key predictors, including porosity, displacement pressure, median pore throat radius, median pressure, and carbonate content. Partial dependence plots (PDPs) reveal a significant interaction between porosity and carbonate content, as well as a decrease in model stability at low carbonate content. This study presents an interpretable ML framework with data augmentation, enabling improved predictions from sparse data while exploring the interactions between key factors. The framework can be adapted to other domains facing similar challenges, enhancing the accuracy and transparency of model predictions.
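The attribution idea behind SHAP can be illustrated with a small sketch. The code below computes exact Shapley values for a two-feature toy function by averaging each feature's marginal contribution over all feature orderings. Everything in it is an assumption made for illustration: `toy_log_k`, its coefficients, the sample, and the baseline are hypothetical and are not the fitted XGBoost-DA or LSSVR-DA models, nor data from the study. The SHAP library itself uses faster estimators rather than this brute-force enumeration.

```python
from itertools import permutations
from math import factorial


def toy_log_k(x):
    """Hypothetical stand-in for a trained model (not the paper's model).

    Combines additive porosity and carbonate terms with a
    porosity-carbonate interaction term.
    """
    porosity, carbonate = x["porosity"], x["carbonate"]
    return 0.3 * porosity - 0.2 * carbonate - 0.01 * porosity * carbonate


def shapley_values(model, x, baseline):
    """Exact Shapley values via enumeration of all feature orderings.

    Features not yet "switched on" take their baseline value. The cost
    grows factorially with the number of features, so this is only
    practical for a handful of inputs.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)       # start from the reference point
        previous = model(current)
        for name in order:
            current[name] = x[name]    # switch this feature to its actual value
            updated = model(current)
            phi[name] += updated - previous  # marginal contribution in this order
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}


# Hypothetical sample and reference point (illustrative values only).
sample = {"porosity": 10.0, "carbonate": 4.0}
baseline = {"porosity": 6.0, "carbonate": 2.0}

phi = shapley_values(toy_log_k, sample, baseline)
print(phi)

# Efficiency property: the attributions sum to the difference between
# the prediction at the sample and the prediction at the baseline.
gap = toy_log_k(sample) - toy_log_k(baseline)
print(abs(sum(phi.values()) - gap) < 1e-9)
```

Because the toy function contains an interaction term, the attribution for porosity depends on the carbonate value and vice versa, which is the kind of coupling the abstract reports between porosity and carbonate content.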
Source journal: Engineering Geology (Earth Science: Geosciences, Multidisciplinary)
CiteScore: 13.70
Self-citation rate: 12.20%
Articles published: 327
Review time: 5.6 months
Journal description: Engineering Geology, an international interdisciplinary journal, serves as a bridge between earth sciences and engineering, focusing on geological and geotechnical engineering. It welcomes studies with relevance to engineering, environmental concerns, and safety, catering to engineering geologists with backgrounds in geology or civil/mining engineering. Topics include applied geomorphology, structural geology, geophysics, geochemistry, environmental geology, hydrogeology, land use planning, natural hazards, remote sensing, soil and rock mechanics, and applied geotechnical engineering. The journal provides a platform for research at the intersection of geology and engineering disciplines.