Decision Tree Ensembles for Automatic Identification of Lithology

Day 1 Tue, January 17, 2023. Pub Date: 2023-01-19. DOI: 10.2118/214460-ms

M. Desouky, A. Alqubalee, Ahmed Gowida


Abstract

Lithology identification is one of the processes geoscientists rely on to understand subsurface formations and to better evaluate the quality of reservoirs and aquifers. However, direct lithological identification usually requires considerable effort and time. Researchers have therefore developed several machine learning models based on well-logging data to avoid the challenges of direct identification and to increase identification accuracy. Nevertheless, high uncertainty and low accuracy are commonly encountered because of the heterogeneous nature of lithology types. This work employs decision tree ensemble techniques to predict lithologies more accurately, in a time-saving and cost-efficient manner, while accounting for the uncertainty.

This study investigated a real-world well-log dataset from the public Athabasca Oil Sands Database to identify and extract the relevant features. We then used grid search to optimize the hyperparameters of the ensemble decision tree models. Two ensemble techniques were evaluated: random forest (RF) and extreme gradient boosting (XGB). We assessed the developed models' performance using metrics such as accuracy, precision, and recall under 5-fold cross-validation. Finally, we performed a chi-squared test of the hypothesis that the developed models perform identically.

The XGB and RF models have 94% and 93% accuracy, respectively. Also, the XGB model's weighted-average recall and precision of 93% and 93% are only 5% and 4% higher than those of the RF model. In addition, the chi-squared test resulted in a p-value as low as 0.013, suggesting a low probability of a difference in the two models' performance. Sand and coal formations are more straightforward to classify than sandy shale and cemented sand. The low representation of sandy shale and cemented sand in the dataset may explain their prediction errors.

The developed models can classify the studied field's lithologies with an overall accuracy of 94%. In addition, there is no statistically significant evidence of a difference in prediction performance between extreme gradient boosting and random forest.
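The evaluation steps named in the abstract (support-weighted precision and recall, and a chi-squared comparison of two classifiers) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption for illustration: the confusion-matrix counts and the sample size are made up and are not taken from the paper, the function names are hypothetical, and the abstract does not say how the chi-squared test was set up. The sketch assumes a Pearson test with one degree of freedom, without continuity correction, on a 2x2 table of correct versus incorrect predictions.

```python
import math


def weighted_metrics(confusion):
    """Accuracy plus support-weighted precision and recall.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision_w = 0.0
    recall_w = 0.0
    for k in range(n):
        support = sum(confusion[k])                          # true samples of class k
        predicted = sum(confusion[i][k] for i in range(n))   # samples predicted as k
        tp = confusion[k][k]
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        weight = support / total
        precision_w += weight * precision
        recall_w += weight * recall
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    return accuracy, precision_w, recall_w


def chi2_two_models(correct_a, correct_b, n_samples):
    """Pearson chi-squared test on a 2x2 correct/incorrect table.

    Assumption: both models are scored on n_samples predictions each,
    and the test has 1 degree of freedom with no continuity correction.
    """
    table = [
        [correct_a, n_samples - correct_a],
        [correct_b, n_samples - correct_b],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only (rows/columns: sand, coal, sandy shale, cemented sand).
example_confusion = [
    [480, 2, 15, 3],
    [3, 290, 5, 2],
    [20, 4, 120, 6],
    [8, 1, 6, 35],
]
acc, prec_w, rec_w = weighted_metrics(example_confusion)
chi2, p_value = chi2_two_models(correct_a=940, correct_b=930, n_samples=1000)
print(f"accuracy={acc:.3f} weighted precision={prec_w:.3f} weighted recall={rec_w:.3f}")
print(f"chi2={chi2:.4f} p={p_value:.3f}")
```

In this sketch the support-weighted recall equals overall accuracy by construction, because each class's recall is weighted by its share of the samples. A p-value above the chosen significance level means the test does not reject the hypothesis of identical performance.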


