Clastic facies classification using machine learning-based algorithms: A case study from Rawat Basin, Sudan

IF 3.6

Energy Geoscience Pub Date : 2024-10-26 DOI:10.1016/j.engeos.2024.100353

Anas Mohamed Abaker Babai , Olugbenga Ajayi Ehinola , Omer.I.M. Fadul Abul Gebbayin , Mohammed Abdalla Elsharif Ibrahim


Citations: 0

Abstract

Machine learning techniques were applied to a dataset of five wells from the Rawat oilfield in Sudan, containing 93,925 samples per feature (seven well logs and one facies log), to classify four facies. Data pre-processing and preparation involved two steps: data cleaning and feature scaling. Several machine learning algorithms, including Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting (GB) for classification, were tested using different iterations and various combinations of features and parameters. The SVM with a radial kernel achieved a training accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree (DT) model with hyperparameter optimization showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of 67.89%. This model was validated using the blind-well approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. The Gradient Boosting classifier achieved an accuracy of 71.57% during training and 69.89% during testing. The Gradient Boosting model with grid search achieved a higher accuracy of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy, with only 66.13% for training and 66.12% for testing. For validation, the Gradient Boosting (GB) classifier achieved an accuracy of 75.41% on the blind-well test, while Gradient Boosting with Grid Search achieved 71.36%. The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating potential overfitting. Random Forest (RF) and Gradient Boosting (GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and the nature of the data.
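The blind-well validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the function names (`make_synthetic_wells`, `split_blind_well`, `blind_well_accuracy`) are hypothetical, and a simple nearest-centroid classifier stands in for the SVM, tree, and ensemble models of the study. The sketch only shows the general pattern of holding out one whole well and fitting the feature scaling on the training wells alone.

```python
import random
from statistics import mean, pstdev


def make_synthetic_wells(n_wells=5, n_per_well=40, n_logs=3, n_facies=4, seed=0):
    """Hypothetical stand-in data: (well_id, log_values, facies_label) rows."""
    rng = random.Random(seed)
    rows = []
    for well in range(n_wells):
        for _ in range(n_per_well):
            facies = rng.randrange(n_facies)
            # Assumption: each facies shifts every log response by a fixed amount.
            logs = [rng.gauss(2.0 * facies + k, 0.5) for k in range(n_logs)]
            rows.append((well, logs, facies))
    return rows


def split_blind_well(rows, blind_well):
    """Hold out every sample of one well; train on all the others."""
    train = [r for r in rows if r[0] != blind_well]
    blind = [r for r in rows if r[0] == blind_well]
    return train, blind


def fit_scaler(samples):
    """Per-feature (mean, std), estimated from the training samples only."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*samples)]


def scale(sample, scaler):
    """Standardize one sample with a previously fitted scaler."""
    return [(x - m) / s for x, (m, s) in zip(sample, scaler)]


def fit_centroids(X, y):
    """Stand-in classifier: one mean feature vector per facies."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {label: [mean(col) for col in zip(*pts)] for label, pts in groups.items()}


def predict(centroids, x):
    """Return the facies whose centroid is closest to x."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def blind_well_accuracy(rows, blind_well):
    """Train on all wells but one, then score on the held-out well."""
    train, blind = split_blind_well(rows, blind_well)
    scaler = fit_scaler([logs for _, logs, _ in train])
    X_train = [scale(logs, scaler) for _, logs, _ in train]
    y_train = [facies for _, _, facies in train]
    centroids = fit_centroids(X_train, y_train)
    hits = sum(
        predict(centroids, scale(logs, scaler)) == facies for _, logs, facies in blind
    )
    return hits / len(blind)


rows = make_synthetic_wells()
scores = {well: blind_well_accuracy(rows, well) for well in range(5)}
for well, acc in scores.items():
    print(f"blind well {well}: accuracy {acc:.2%}")
```

Splitting by well rather than by individual sample keeps depth-adjacent samples from the same borehole out of both the training and validation sets, which is presumably why a blind-well score is reported separately from the testing score. In a library such as scikit-learn, the same grouping is available through `LeaveOneGroupOut`, with the well identifier passed as the group label.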


Source journal

Energy Geoscience

CiteScore

8.20

Self-citation rate

0.00%

Number of publications