Effects of tuning decision trees in random forest regression on predicting porosity of a hydrocarbon reservoir. A case study: Volve oil field, North Sea

IF 3.2 Q2 CHEMISTRY, PHYSICAL
Energy advances Pub Date : 2024-08-08 DOI:10.1039/D4YA00313F
Kushan Sandunil, Ziad Bennour, Hisham Ben Mahmud and Ausama Giwelli
Energy advances, vol. 9, pp. 2335-2347. Journal Article. https://pubs.rsc.org/en/content/articlelanding/2024/ya/d4ya00313f
Citations: 0

Abstract

Machine learning (ML) has emerged as a powerful tool in petroleum engineering for automatically interpreting well logs and characterizing reservoir properties such as porosity. As a result, researchers are trying to improve the performance of ML models further to widen their real-world applicability. Random forest regression (RFR) is one such widely used ML technique, built by combining multiple decision trees. To improve its performance, one of its hyperparameters, the number of trees in the forest (n_estimators), is tuned during model optimization. However, although n_estimators is one of the most influential hyperparameters that can be tuned to optimize the RFR algorithm, the existing literature lacks in-depth studies of its influence on RFR models used for porosity prediction. This study investigated the effects of n_estimators on the RFR model in porosity prediction. It also explored the interactions of n_estimators with two other key hyperparameters: the number of features considered for the best split (max_features) and the minimum number of samples required at a leaf node (min_samples_leaf). The RFR models were developed using 4 input features from the Volve oil field in the North Sea, namely the resistivity log (RES), neutron porosity log (NPHI), gamma ray log (GR), and the corresponding depths, with calculated porosity as the target. The methodology consisted of 4 approaches. In the first approach, only n_estimators was varied; in the second, n_estimators was varied along with max_features; in the third, n_estimators was varied along with min_samples_leaf; and in the final approach, all three hyperparameters were tuned. Altogether, 24 RFR models were developed and evaluated using adjusted R² (adj. R²), root mean squared error (RMSE), and computational time.
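
The abstract names adjusted R² and RMSE as the evaluation metrics but does not write them out. Their standard textbook definitions are given below as an assumption, not as the paper's own notation: n is the number of test samples, p the number of input features (p = 4 in this study), y_i the calculated porosity, \hat{y}_i the predicted porosity, and \bar{y} the mean of the y_i.

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2_{\mathrm{adj}} &= 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\end{align}
```

Because (n - 1)/(n - p - 1) is greater than 1 whenever p > 0, the adjusted value never exceeds the plain R², and the gap shrinks as n grows relative to p.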
The results showed that the highest performance, an adj. R² of 0.8505, was achieved when n_estimators was 81, max_features was 2, and min_samples_leaf was 1. In approach 2, increasing the upper limit of n_estimators from 10 to 100 improved test model performance by more than 1.60%, whereas increasing it from 100 to 1000 reduced performance by around 0.4%. Models developed by tuning n_estimators from 1 to 100 in intervals of 10 had good test adj. R² values and lower computational times, making this the best n_estimators range and interval for predicting the porosity of the Volve oil field when both performance and computational time are considered. Thus, it was concluded that tuning only n_estimators and max_features can significantly increase the performance of RFR models.
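
The sweep described above (n_estimators from 1 to 100 in intervals of 10, scored by adj. R², RMSE, and computational time) could be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn's RandomForestRegressor, whose parameter names match those in the abstract, although the abstract does not name the library. The data are synthetic stand-ins for the RES, NPHI, GR, and depth inputs, and the 75/25 split, the random seeds, and the helper names (adjusted_r2, rmse, sweep_n_estimators) are all hypothetical. The grid is read as range(1, 101, 10), i.e. 1, 11, ..., 91, an interpretation that happens to include the reported best value of 81.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 for n samples and n_features predictors (standard definition)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y_true)
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def sweep_n_estimators(X, y, grid, max_features=2, min_samples_leaf=1, seed=0):
    """Fit one forest per n_estimators value; record test metrics and fit time."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed  # split ratio is an assumption
    )
    rows = []
    for n in grid:
        model = RandomForestRegressor(
            n_estimators=n,
            max_features=max_features,
            min_samples_leaf=min_samples_leaf,
            random_state=seed,
        )
        start = time.perf_counter()
        model.fit(X_tr, y_tr)
        elapsed = time.perf_counter() - start
        pred = model.predict(X_te)
        rows.append({
            "n_estimators": n,
            "adj_r2": adjusted_r2(y_te, pred, X.shape[1]),
            "rmse": rmse(y_te, pred),
            "fit_seconds": elapsed,
        })
    return rows


# Synthetic stand-in for the four inputs (RES, NPHI, GR, depth); NOT Volve data.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 0.2 + 0.05 * X[:, 1] - 0.03 * X[:, 2] + 0.01 * rng.normal(size=400)

grid = list(range(1, 101, 10))  # 1, 11, ..., 91
results = sweep_n_estimators(X, y, grid)
best = max(results, key=lambda r: r["adj_r2"])
print(best)
```

On real well-log data, the same loop could be nested over max_features and min_samples_leaf to mirror the second, third, and fourth approaches; the numbers produced on the synthetic data here carry no meaning for the Volve field.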

Source journal: CiteScore 1.80; self-citation rate 0.00%; publication count 0