Roufeida Bennani, Min Wang, Xin Wang, Tianyi Li
{"title":"Porosity estimation using machine learning approaches for shale reservoirs: A case study of the Lianggaoshan Formation, Sichuan Basin, Western China","authors":"Roufeida Bennani , Min Wang , Xin Wang , Tianyi Li","doi":"10.1016/j.jappgeo.2025.105702","DOIUrl":null,"url":null,"abstract":"<div><div>Shale porosity is a key petrophysical property that controls the production of hydrocarbons in shale reserves. Accurate determination of this parameter in such formations is challenging due to the complex pore structures, diverse mineral compositions, and high organic content, which complicate the establishment of a physical relationship between reservoir properties and logging data. This study addresses these challenges by developing machine learning models to estimate shale porosity logs using core and well-logging data. Three supervised machine learning algorithms were employed: support vector regressor, multilayer perceptron, and random forest with different ranges of data proportions. These models were evaluated using the correlation coefficient and root mean square error (RMSE) scores for both training and testing datasets. Among these, the random forest model demonstrated its effectiveness by combining predictions from multiple decision trees and handling nonlinear relationships within the input data. It required minimal preprocessing and parameter tuning, enabling accurate shale porosity predictions, with a high data correlation of 93.8 % and a low RMSE of 0.206. These results confirmed the model's suitability for managing limited and complex datasets. In contrast, the multilayer perceptron and support vector regressor were more sensitive to hyperparameter configurations and prone to overfitting. 
These limitations resulted in reduced accuracy and weaker correlation values compared to the random forest model.</div><div>In addition, a randomization process was introduced during the training phase with an accurate data proportion, to assess the model's reliability and minimize overfitting. The results indicated that this process had no significant impact on data performance, confirming its effectiveness in ensuring data accuracy.</div></div>","PeriodicalId":54882,"journal":{"name":"Journal of Applied Geophysics","volume":"237 ","pages":"Article 105702"},"PeriodicalIF":2.2000,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Applied Geophysics","FirstCategoryId":"89","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0926985125000837","RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"GEOSCIENCES, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0
Abstract
Shale porosity is a key petrophysical property that controls hydrocarbon production from shale reservoirs. Determining it accurately is challenging because complex pore structures, diverse mineral compositions, and high organic content make it difficult to establish a physical relationship between reservoir properties and logging data. This study addresses these challenges by developing machine learning models that estimate shale porosity logs from core and well-logging data. Three supervised machine learning algorithms were employed (a support vector regressor, a multilayer perceptron, and a random forest), each evaluated over a range of data proportions. The models were assessed using the correlation coefficient and the root mean square error (RMSE) on both the training and testing datasets. Among these, the random forest model performed best: it combines predictions from multiple decision trees and handles nonlinear relationships within the input data. It required minimal preprocessing and parameter tuning and yielded accurate shale porosity predictions, with a correlation of 93.8% and an RMSE of 0.206. These results confirmed the model's suitability for limited and complex datasets. In contrast, the multilayer perceptron and the support vector regressor were more sensitive to hyperparameter configurations and prone to overfitting, which reduced their accuracy and weakened their correlation values relative to the random forest model.
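The two evaluation scores named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the abstract does not state which correlation definition was used, so the Pearson coefficient is assumed here, and the function names and porosity values are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(measured, predicted):
    """Pearson correlation coefficient (assumed definition, see lead-in)."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


# Hypothetical core porosity values and model predictions, for illustration only.
core_porosity = [3.1, 4.0, 2.6, 5.2, 3.8]
predicted_porosity = [3.0, 4.3, 2.9, 5.0, 3.6]

print(f"RMSE = {rmse(core_porosity, predicted_porosity):.3f}")
print(f"R    = {pearson_r(core_porosity, predicted_porosity):.3f}")
```

Computing both scores on the training set and on the testing set, as the study does, helps expose overfitting: a model that scores well in training but noticeably worse in testing has memorized rather than generalized.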
In addition, a randomization process was introduced during the training phase, using a suitable data proportion, to assess the model's reliability and minimize overfitting. The results indicated that this process had no significant impact on model performance, supporting the reliability of the predictions.
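One common way to run such a check is to repeat the train/test split with different random shuffles and see how much the test error moves. The sketch below illustrates that idea only; the abstract does not describe the exact procedure, so the repeated-shuffle scheme, the 80% training fraction, the synthetic data, and the least-squares line used as a lightweight stand-in for the random forest are all assumptions.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in model, not a random forest."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_rmse(samples, train_fraction, seed):
    """Shuffle with the given seed, split, fit on train, return test RMSE."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    train, test = shuffled[:n_train], shuffled[n_train:]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
    return math.sqrt(sum(errors) / len(errors))


def randomized_rmse(samples, train_fraction=0.8, n_repeats=20):
    """Mean and standard deviation of test RMSE over repeated random splits."""
    scores = [split_rmse(samples, train_fraction, seed) for seed in range(n_repeats)]
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic (log reading, porosity) pairs with Gaussian noise; illustration only.
data_rng = random.Random(0)
samples = [(i * 0.1, 0.5 * (i * 0.1) + 1.0 + data_rng.gauss(0, 0.2)) for i in range(100)]

mean_rmse, spread = randomized_rmse(samples)
print(f"test RMSE over 20 random splits: mean = {mean_rmse:.3f}, std = {spread:.3f}")
```

A small standard deviation relative to the mean suggests that the reported score does not depend on one favorable split, which is the kind of conclusion the abstract draws.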
About the journal:
The Journal of Applied Geophysics, with its key objective of responding to pertinent and timely needs, places particular emphasis on methodological developments and innovative applications of geophysical techniques for addressing environmental, engineering, and hydrological problems. The journal also covers related topical research in exploration geophysics and in soil and rock physics.