Enhancing reservoir parameter prediction workflows via advanced core data augmentation
Xin Luo, Xinghua Ci, Jianmeng Sun, Chengyu Dan, Peng Chi, Ruikang Cui
Marine and Petroleum Geology, Volume 182, Article 107605. Published 2025-09-11.
DOI: 10.1016/j.marpetgeo.2025.107605
URL: https://www.sciencedirect.com/science/article/pii/S0264817225003228
Journal impact factor: 3.6 (JCR Q1, Geosciences, Multidisciplinary)
Citations: 0
Abstract
Machine learning models that use core data as training labels have become a mainstream approach to predicting reservoir parameters. However, core data are expensive to acquire and sparsely sampled in space, which often leaves these models with weak nonlinear representation, poor generalization, and a tendency to overfit. To address the challenges of limited core data, we propose a reliability analysis-driven workflow that optimally selects among multiple core data augmentation (CDA) methods to enhance reservoir parameter prediction. The workflow makes two main advances. First, it mitigates data scarcity by treating core data as a minority class and applying diverse tabular data augmentation techniques to generate synthetic data, which are then rigorously evaluated for reliability; this effectively expands the usable core dataset. Second, using the augmented data, the workflow integrates machine learning with pre-trained language models (PLMs) to develop and apply multiple combinations of augmentation and prediction models for both lithology classification and physical property parameter prediction. Field data applications demonstrate that, within CDA, the combination of the Tabular Denoising Diffusion Probabilistic Model (TabDDPM) and the Tabular Prior Data Fitting Network (TabPFN) achieves outstanding performance in evaluation metrics and case studies for lithology classification and petrophysical parameter prediction. This study provides a reproducible framework for enhancing small-sample reservoir parameter prediction in oil and gas exploration, showing that synthetic data augmentation can effectively mitigate data scarcity and open new pathways for geophysical data analysis.
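The augment-then-evaluate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' workflow and does not use TabDDPM or TabPFN: it substitutes a simple SMOTE-style interpolation as the augmentation step and a two-sample Kolmogorov-Smirnov statistic as a stand-in reliability check. The function names, the porosity and permeability columns, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def smote_like_augment(X, n_new, k=3, seed=0):
    """Generate n_new synthetic rows by interpolating between a sampled row
    and one of its k nearest neighbours (SMOTE-style, single class)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances; a row is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[picked] - X[base])


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between ECDFs)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Hypothetical core measurements (made-up values, not field data):
# porosity as a fraction and log10 permeability in mD.
rng = np.random.default_rng(42)
porosity = rng.uniform(0.05, 0.25, size=30)
log_perm = 12.0 * porosity - 1.0 + rng.normal(0.0, 0.2, size=30)
core = np.column_stack([porosity, log_perm])

# Augmentation step: 30 real rows plus 120 synthetic rows.
synthetic = smote_like_augment(core, n_new=120, k=3, seed=1)
augmented = np.vstack([core, synthetic])

# Stand-in reliability check: per-feature distance between the real and
# synthetic marginal distributions (0 means identical ECDFs, 1 means disjoint).
ks_per_feature = [
    ks_statistic(core[:, j], synthetic[:, j]) for j in range(core.shape[1])
]
print(augmented.shape, ks_per_feature)
```

Interpolation keeps every synthetic row on a segment between two real samples, so it cannot produce values outside the observed range. A generative model such as the diffusion-based approach named in the abstract would need a stronger reliability analysis than this per-feature check.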
About the journal:
Marine and Petroleum Geology is the pre-eminent international forum for the exchange of multidisciplinary concepts, interpretations and techniques for all concerned with marine and petroleum geology in industry, government and academia. Rapid bimonthly publication allows early communication of papers and short communications to the geoscience community.
Marine and Petroleum Geology is essential reading for geologists, geophysicists and explorationists in industry, government and academia working in the following areas: marine geology; basin analysis and evaluation; organic geochemistry; reserve/resource estimation; seismic stratigraphy; thermal models of basin evolution; sedimentary geology; continental margins; geophysical interpretation; structural geology/tectonics; formation evaluation techniques; well logging.