The Impact of Data Splitting Strategy on Drilling Rate Prediction in the Rumaila Oil Field
Ameen Kareem Salih, Ali Khaleel Faraj, Mohammed A. Ahmed, Ali Nahi Abed Al-Hasnawi
Petroleum Chemistry, vol. 64, no. 7, pp. 781-786. Published 2024-09-26. Journal Article.
DOI: 10.1134/S0965544124050025
https://link.springer.com/article/10.1134/S0965544124050025
Journal impact factor: 1.3 (JCR Q3, Chemistry, Organic)
Citations: 0
Abstract
Supervised machine learning is an important tool that has helped solve many problems, especially problems that humans cannot solve on their own. Building an accurate model depends on several factors: the collected data, the choice of model, the data splitting method used to train and evaluate the model, and the choice of hyperparameters. Data splitting is among the most important of these, both for obtaining a high-accuracy model and for avoiding overfitting, in which a model achieves high training accuracy but fails in testing and prediction. This paper investigates how different data splitting strategies (hold-out with different test sizes, K-Fold, and shuffle split) affect the effectiveness of a supervised machine learning model for predicting drilling rate in the Rumaila oil field in southern Iraq, and selects the optimal data splitting strategy. The highest testing accuracy obtained was 0.827, achieved with the shuffle split strategy.
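The three splitting strategies named in the abstract differ only in how sample indices are divided between training and testing. The following is a minimal sketch of that difference using the Python standard library only; it is not the authors' code, and the function names, the default fold count, and the default test sizes are assumptions chosen for illustration.

```python
import random


def holdout_split(n_samples, test_size=0.2, seed=0):
    """Hold-out: one shuffled train/test partition of the indices 0..n_samples-1."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


def kfold_splits(n_samples, n_splits=5, seed=0):
    """K-Fold: K train/test pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_splits folds of near-equal size.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def shuffle_splits(n_samples, n_splits=5, test_size=0.2, seed=0):
    """Shuffle split: repeated independent hold-outs; test sets may overlap."""
    for repeat in range(n_splits):
        # A different seed per repeat gives a fresh random partition each time.
        yield holdout_split(n_samples, test_size, seed=seed + repeat)
```

In use, a model would be fitted on the training indices of each pair and scored on the test indices, with the scores averaged for K-Fold and shuffle split. Unlike K-Fold, shuffle split lets the test size and the number of repeats be chosen independently.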
About the journal:
Petroleum Chemistry (Neftekhimiya), founded in 1961, offers original papers on and reviews of theoretical and experimental studies concerned with current problems of petroleum chemistry and processing such as chemical composition of crude oils and natural gas liquids; petroleum refining (cracking, hydrocracking, and catalytic reforming); catalysts for petrochemical processes (hydrogenation, isomerization, oxidation, hydroformylation, etc.); activation and catalytic transformation of hydrocarbons and other components of petroleum, natural gas, and other complex organic mixtures; new petrochemicals including lubricants and additives; environmental problems; and information on scientific meetings relevant to these areas.
Petroleum Chemistry publishes articles on these topics from members of the scientific community of the former Soviet Union.