Yuzhen Hong, Shaogui Deng, Zhijun Li, Yueqin Guo, Zhoutuo Wei
An explainable experience-driven hybrid model for TOC prediction in shale reservoirs based on data augmentation
Journal: Journal of Applied Geophysics, Volume 243, Article 105977
Published: 2025-10-02
DOI: 10.1016/j.jappgeo.2025.105977
URL: https://www.sciencedirect.com/science/article/pii/S0926985125003581
Citations: 0
Abstract
Total Organic Carbon (TOC) content measures the carbon bound in organic compounds and is a critical indicator for assessing unconventional shale resources. An accurate TOC prediction model can therefore help evaluate a reservoir's hydrocarbon potential at low cost and improve development efficiency. However, sparse experimental data and strong reservoir heterogeneity make TOC prediction difficult. This study combines data augmentation techniques with expert experience-driven machine learning models for accurate TOC prediction in complex shale reservoirs. First, we propose a set of data augmentation methods to address weak logging responses and insufficient experimental TOC data: we enrich the training dataset by introducing reconstruction curves to visualize the response and by designing Generative Adversarial Network (GAN) simulations to generate high-quality data. For the experience-driven model, we refined the traditional ΔlogR method by integrating expert knowledge with a detailed analysis of the physical properties of shale reservoirs, and we propose a density-gamma modified ΔlogR method as the core of the experience-driven approach. We then integrated this empirical formula into the fitness function of the Grey Wolf Optimizer (GWO) and combined the optimizer with a Support Vector Regression (SVR) model to build a hybrid model. The hybrid method was tested in the Dongying Depression. For wells A and B, the R² values were 0.95 and 0.97, the Root Mean Square Error (RMSE) values were 0.31 and 0.29, and the Mean Absolute Error (MAE) values were below 0.3. These predictions improved significantly on those of any single method. We also used the SHapley Additive exPlanations (SHAP) method to analyze the relationship between the well logging curves and the prediction results. By revealing the model's internal decision-making mechanism, this analysis confirmed that the experience-driven component is reasonable and enhanced the model's credibility.
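For orientation, the traditional ΔlogR method that the abstract takes as its starting point can be sketched in a few lines, together with the three error metrics it reports. This is a minimal sketch of the classical sonic/resistivity form (Passey et al., 1990), not the authors' density-gamma modified version, whose formula is not given here. All function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def delta_log_r(rt, dt, rt_base, dt_base):
    """Classical sonic/resistivity curve separation (dimensionless).

    rt, rt_base: resistivity and its baseline in ohm-m.
    dt, dt_base: sonic transit time and its baseline in microseconds per foot.
    The 0.02 factor reflects the overlay scaling of one resistivity decade
    per 50 us/ft of sonic transit time.
    """
    return math.log10(rt / rt_base) + 0.02 * (dt - dt_base)


def toc_from_delta_log_r(dlogr, lom):
    """TOC in wt% from the separation and the level of organic maturity (LOM)."""
    return dlogr * 10 ** (2.297 - 0.1688 * lom)


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired measured and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


if __name__ == "__main__":
    # Hypothetical log readings and baselines, for illustration only.
    separation = delta_log_r(rt=40.0, dt=95.0, rt_base=10.0, dt_base=80.0)
    print("Delta log R:", round(separation, 3))
    print("TOC (wt%):", round(toc_from_delta_log_r(separation, lom=9.0), 3))

    # Hypothetical core-measured versus predicted TOC values.
    measured = [1.2, 2.5, 3.1, 4.0]
    predicted = [1.0, 2.7, 3.0, 4.3]
    print(regression_metrics(measured, predicted))
```

In a hybrid workflow of the kind the abstract describes, an empirical estimate like this would enter the optimizer's fitness function alongside the data-driven error, and the metrics function would score the final predictions against core measurements.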
About the journal:
The Journal of Applied Geophysics, with its key objective of responding to pertinent and timely needs, places particular emphasis on methodological developments and innovative applications of geophysical techniques for addressing environmental, engineering, and hydrological problems. The journal also covers related topical research in exploration geophysics and in soil and rock physics.