The Impact of Data Splitting Strategy on Drilling Rate Prediction in the Rumaila Oil Field

Impact factor 1.3; Zone 4 (Engineering and Technology); JCR Q3 (Chemistry, Organic)

Petroleum Chemistry. Pub Date: 2024-09-26. DOI: 10.1134/S0965544124050025

Ameen Kareem Salih, Ali Khaleel Faraj, Mohammed A. Ahmed, Ali Nahi Abed Al-Hasnawi

Volume 64, Issue 7, pages 781-786. Full text: https://link.springer.com/article/10.1134/S0965544124050025

Citations: 0

Abstract

Supervised machine learning is an important tool that has helped solve many problems, especially ones that humans cannot solve on their own. Building an accurate model depends on several factors: the collected data, the choice of model, the data splitting method used to train and evaluate the model, and the choice of hyperparameters. Data splitting is among the most important of these, both for obtaining a high-accuracy model and for avoiding overfitting, in which a model reaches high training accuracy but fails in testing and prediction. This paper investigates how different data splitting strategies (hold-out with different test-set sizes, K-Fold, and shuffle split) affect the performance of a supervised machine learning model for predicting drilling rate in the Rumaila oil field in southern Iraq, and selects the optimal strategy. The highest testing accuracy obtained was 0.827, using the shuffle split strategy.
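
The three splitting strategies named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation, which the abstract does not describe: the function names, default parameters, and seeds below are hypothetical and chosen only for illustration. In practice, scikit-learn's `train_test_split`, `KFold`, and `ShuffleSplit` offer comparable functionality.

```python
import random


def holdout_split(n_samples, test_size=0.2, seed=0):
    """Single shuffled train/test split; test_size is the test fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


def kfold_split(n_samples, n_splits=5, seed=0):
    """Yield (train, test) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def shuffle_split(n_samples, n_splits=5, test_size=0.2, seed=0):
    """Yield n_splits independent random hold-out splits (test sets may overlap)."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_size))
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def mean_score(splits, fit_and_score):
    """Average a caller-supplied fit_and_score(train, test) over all splits."""
    scores = [fit_and_score(train, test) for train, test in splits]
    return sum(scores) / len(scores)
```

The structural difference is that K-Fold tests every sample exactly once, whereas shuffle split draws each test set independently, so a sample may appear in several test sets or in none. A caller would pass a function that fits a model on the training indices and returns its test score, for example `mean_score(kfold_split(len(X)), fit_and_score)`, where `X` and `fit_and_score` are the caller's own data and evaluation routine.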

Source journal

Petroleum Chemistry (Engineering and Technology: Chemical Engineering)

CiteScore: 2.50
Self-citation rate: 21.40%
Articles published: 102
Review time: 6-12 weeks

Journal description: Petroleum Chemistry (Neftekhimiya), founded in 1961, offers original papers on and reviews of theoretical and experimental studies concerned with current problems of petroleum chemistry and processing, such as chemical composition of crude oils and natural gas liquids; petroleum refining (cracking, hydrocracking, and catalytic reforming); catalysts for petrochemical processes (hydrogenation, isomerization, oxidation, hydroformylation, etc.); activation and catalytic transformation of hydrocarbons and other components of petroleum, natural gas, and other complex organic mixtures; new petrochemicals including lubricants and additives; environmental problems; and information on scientific meetings relevant to these areas. Petroleum Chemistry publishes articles on these topics from members of the scientific community of the former Soviet Union.