Optimized feature selection assists lithofacies machine learning with sparse well log data combined with calculated attributes in a gradational fluvial sequence

David A. Wood
Journal: Artificial Intelligence in Geosciences, Volume 3, Pages 132-147
Publication date: 2022-12-01
DOI: 10.1016/j.aiig.2022.11.003
Article URL: https://www.sciencedirect.com/science/article/pii/S2666544122000326
Citations: 2

Abstract

Using machine learning (ML) to predict lithofacies from sparse suites of well-log data is difficult in the laterally and vertically heterogeneous reservoir formations of oil and gas fields. Meandering, braided fluviatile depositional environments tend to form clastic sequences with laterally discontinuous layers because relatively narrow sandstone channels shift continuously. Three cored wellbores drilled through such a reservoir in a large oil field, with just four recorded well logs available, are used to classify four lithofacies with ML models. To augment the well-log data, six derivative and volatility attributes were calculated from the recorded gamma ray and density logs, providing sixteen log features for the ML models to select from. A novel, multiple-optimizer feature selection technique was developed to identify high-performing feature combinations, with which seven ML models were used to predict lithofacies, assisted by multi-k-fold cross-validation. Feature combinations with just seven to nine selected log features achieved an overall ML lithofacies accuracy of 0.87 for the two wells used for training and validation. When the trained ML models were applied to a third well for testing, lithofacies prediction accuracy declined to 0.65 for the best-performing model, extreme gradient boosting with seven features. However, that model achieved an accuracy of ∼0.76 in predicting the presence of the pay-bearing sandstone and siltstone lithofacies in the test well. A model using only the four recorded well logs predicted the pay-bearing lithofacies with just ∼0.6 accuracy. Annotated confusion matrices and feature importance analysis provide additional insight into ML model performance and identify the log attributes that are most influential in enhancing lithofacies prediction.
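
The abstract does not define the derivative and volatility attributes, so the following is only a minimal sketch of the general idea, under stated assumptions: a derivative attribute is taken here as a simple finite difference with depth, and a volatility attribute as a trailing-window standard deviation. The function names, the window length, and the sample gamma ray values are all hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def first_derivative(values, depths):
    """Finite-difference rate of change of a log curve with depth.

    Assumption: the first sample has no predecessor, so it is set to 0.0.
    """
    deriv = [0.0]
    for i in range(1, len(values)):
        deriv.append((values[i] - values[i - 1]) / (depths[i] - depths[i - 1]))
    return deriv


def rolling_volatility(values, window=3):
    """Population standard deviation over a trailing window of samples.

    Assumption: windows are truncated at the top of the logged interval.
    """
    vol = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        vol.append(pstdev(values[start:i + 1]))
    return vol


# Hypothetical gamma ray readings (API units) at 0.5 m depth spacing.
depths = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0]
gamma_ray = [60.0, 62.0, 70.0, 70.0, 64.0]

gr_derivative = first_derivative(gamma_ray, depths)
gr_volatility = rolling_volatility(gamma_ray, window=3)

# Each depth sample now carries the recorded value plus two derived features.
features = list(zip(gamma_ray, gr_derivative, gr_volatility))
```

Attributes of this kind describe how a curve is changing around each depth sample rather than only its value there, which is one plausible reason they could help a classifier in a gradational sequence.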
