Rock Type Classification Models Interpretability Using Shapley Values

Day 3 Wed, November 17, 2021. Pub Date: 2021-12-09. DOI: 10.2118/207707-ms

A. Voskresenskiy, N. Bukhanov, M. A. Kuntsevich, O. Popova, Alexey S. Goncharov


Citations: 1

Abstract

We propose a methodology to improve rock type classification using machine learning (ML) techniques and to reveal causal relationships between reservoir quality and well log measurements. Rock type classification is an essential step in accurate reservoir modeling and forecasting. Machine learning approaches make it possible to automate rock type classification based on different well logs and core data. To choose the best model, one that does not propagate uncertainty further into the workflow, it is important to interpret machine learning results. Feature importance and feature selection methods are usually employed for that purpose. We propose an extension to existing approaches: a model-agnostic sensitivity algorithm based on Shapley values.

The paper describes a full workflow for rock type prediction using well log data, from data preparation, model building, and feature selection to causal inference analysis. We built ML models that classify rock types using well logs (sonic, gamma ray, density, photoelectric, and resistivity) from 21 wells as predictors, and we conducted a causal inference analysis between reservoir quality and well log responses using Shapley values (a concept from game theory). As a result of feature selection, we obtained predictors that are statistically significant and at the same time relevant in the context of causal relations.

The macro F1-scores of the best models obtained for the two cases are 0.79 and 0.85, respectively. We found that the ML models can infer domain knowledge, which allows us to confirm the adequacy of the built ML models for rock type prediction. Our insight was to recognize the need to properly account for the underlying causal structure between the features and rock types in order to derive meaningful and relevant predictors that carry a significant amount of information contributing to the final outcome. We also demonstrate the robustness of the revealed patterns by applying the Shapley values methodology to a number of ML models and showing that the order of the most important predictors is consistent.

Our analysis shows that machine learning classifiers that achieve high accuracy tend to mimic the physical principles behind different logging tools. In particular, the longer the travel time of an acoustic wave, the higher the probability that the medium is reservoir rock, and vice versa. Conversely, lower values of natural radioactivity and rock density indicate the presence of a reservoir.

The article presents a causal inference analysis of ML classification models using Shapley values on two real-world reservoirs. The rock class labels from core data are used to train a supervised machine learning algorithm to predict classes from well log responses. The aim of supervised learning here is to label a small portion of a dataset and let the algorithm label the rest automatically. Such data-driven analysis may help optimize well logging, coring, and core analysis programs. The algorithm can be extended to other reservoirs to improve rock type prediction.

The novelty of the paper is that such analysis reveals the nature of the decisions made by the ML model and makes it possible to apply robust, reliable, petrophysics-consistent ML models for rock type classification.
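To illustrate the kind of model-agnostic Shapley attribution the abstract refers to, the following is a minimal sketch, not the authors' implementation. It computes exact Shapley values for a single sample by enumerating all feature coalitions, replacing features outside a coalition with a baseline value. The function names (`shapley_values`, `reservoir_probability`), the three-log toy classifier, its coefficients, and the sample and baseline values are all assumptions invented for illustration; the signs of the coefficients were chosen only to mirror the trends stated in the abstract.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i when added to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def reservoir_probability(features):
    """Hypothetical toy classifier: P(reservoir) from three well logs.

    The coefficients are invented for illustration, not fitted to any data.
    """
    sonic, gamma, density = features
    score = 0.05 * (sonic - 80.0) - 0.03 * (gamma - 75.0) - 4.0 * (density - 2.45)
    return 1.0 / (1.0 + exp(-score))


# Assumed reference point and one assumed sample: [sonic, gamma ray, density]
baseline = [80.0, 75.0, 2.45]
sample = [95.0, 40.0, 2.30]

phi = shapley_values(reservoir_probability, sample, baseline)
for name, value in zip(["sonic", "gamma ray", "density"], phi):
    print(f"{name}: {value:+.4f}")
```

By construction, the attributions sum to the difference between the model output for the sample and for the baseline, which is what makes them comparable across different classifiers. For models with many features, exact enumeration becomes infeasible and sampling-based approximations are normally used instead.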


