Digitalization of Legacy Datasets and Machine Learning Regression Yields Insights for Reservoir Property Prediction and Submarine-Fan Evolution: A Subsurface Example From the Lewis Shale, Wyoming

The Sedimentary Record. Pub Date: 2022-07-13. DOI: 10.2110/001c.36638

T. Martin, Jared Tadla, Z. Jobe



Abstract

Machine-learning algorithms have long aided geologic property prediction from well-log data, but they are primarily used to classify lithology, facies, formation, and rock type. More detailed properties (e.g., porosity, grain size) that are important for evaluating hydrocarbon exploration and development, as well as subsurface geothermal, CO2 sequestration, and hydrological studies, have not been a focus of machine-learning predictions. This study focuses on improving machine-learning regression-based workflows for quantitative geological property prediction (porosity, grain size, XRF geochemistry), using a robust dataset from the Dad Sandstone Member of the Lewis Shale in the Green River Basin, Wyoming. Twelve slabbed cores collected from wells targeting turbiditic sandstones and mudstones of the Dad Sandstone Member provide 1212.2 ft of well-log and core data to test the efficacy of five machine-learning models, ranging in complexity from multivariate linear regression to deep neural networks. Our results demonstrate that gradient-boosted decision-tree models (e.g., CatBoost, XGBoost) are flexible in terms of input data completeness, do not require scaled data, and are reliably accurate, with the lowest or second-lowest root mean squared error (RMSE) in every test. Deep neural networks, although commonly used for these applications, never achieved the lowest error in any test. We also use newly collected XRF geochemistry and grain-size data to constrain spatiotemporal sediment routing, sand-mud partitioning, and paleo-oceanographic redox conditions in the Green River Basin. Test-train dataset splitting traditionally uses randomized inter-well data, but a blind-well testing strategy is more applicable to most geoscience applications, which aim to predict properties at new, unseen well locations. We find that models trained on randomized inter-well datasets give overly optimistic results compared with their performance on blind wells, with a median difference of 0.58 RMSE when predicting grain size in phi units. Using these data and results, we establish a baseline workflow for applying machine-learning regression algorithms to core-based reservoir properties from well-log and core-image data. We hope that our findings, together with the open-source code and datasets released with this paper, will serve as a baseline for further research to improve geological property prediction for sustainable earth-resource modeling.
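The difference between a randomized inter-well split and a blind-well split can be illustrated with a minimal sketch. It is not the code released with the paper; the record layout, the well names (`W1` to `W4`), and the helper functions `rmse`, `random_split`, and `blind_well_split` are hypothetical and use only the Python standard library.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def random_split(samples, test_fraction=0.25, seed=0):
    """Randomized split: samples from one well can land in both sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def blind_well_split(samples, blind_wells):
    """Blind-well split: all samples from held-out wells go to the test set."""
    blind = set(blind_wells)
    train = [s for s in samples if s["well"] not in blind]
    test = [s for s in samples if s["well"] in blind]
    return train, test


# Hypothetical toy records (assumed layout): well id, depth, grain size in phi.
samples = [
    {"well": w, "depth_ft": 100.0 + i, "grain_size_phi": 2.0 + 0.1 * i}
    for w in ("W1", "W2", "W3", "W4")
    for i in range(5)
]

train, test = blind_well_split(samples, blind_wells=["W4"])
train_wells = {s["well"] for s in train}
test_wells = {s["well"] for s in test}
```

With the blind-well split, `train_wells` and `test_wells` never overlap, so the test error reflects prediction at an unseen location. With `random_split`, neighboring depth samples from the same well usually appear on both sides of the split, which is one plausible reason randomized testing looks more optimistic.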
