Chan Plot Signature Identification as a Practical Machine Learning Classification Problem

Day 3 Thu, March 28, 2019. Pub Date: 2019-03-22. DOI: 10.2523/IPTC-19143-MS

C. A. Garcia, A. Mukhanov, Henry Torres


Citations: 3



Chan Plot Signature Identification as a Practical Machine Learning Classification Problem

Creating Chan water control diagnostic plots is a common well surveillance activity used to search for signatures that distinguish and explain the mechanisms behind excessive water production in oil wells. In this technique, an engineer visually classifies patterns, or signatures, associated with a water production mechanism. This study shows how Chan plot signature identification can be approached as a machine learning (ML) classification problem in which each well is characterized by the slopes of its water-oil ratio (WOR) and WOR time derivative (WOR’) curves, and a model predicts the pattern category to which the well belongs. ML models that predict whether a well exhibits a specific Chan plot signature, or pattern, would be valuable as a well surveillance tool, especially in high-well-count fields.

Our previous work used the shape of the Chan plot as features for a radial basis function (RBF) support vector machine (SVM) model. In this study, we examine how the features used to identify Chan plot signatures can be simplified and how different ML models compare in accuracy. The ML models used were nearest neighbor, SVM, decision tree, random forest, logistic regression, and Naive Bayes, with the slopes of WOR and WOR’ as features. As a result, we observed an increase in the accuracy of the ML models we used. A quality check of the dataset after selecting slopes as features revealed several incorrectly labeled examples, which we corrected before training the ML models. Comparing the models’ metrics on the test set, the model with the highest F1-score was nearest neighbor at 0.93, whereas the RBF SVM model achieved 0.90. We also compared the models’ decision boundaries to see how they differ.
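The abstract does not say exactly how the slopes are computed. As a minimal sketch, assuming the slopes are least-squares fits on the log-log axes conventionally used for Chan plots, taken over the whole production history, with WOR’ estimated by finite differences, the two features for one well could be computed as below. The function names `loglog_slope` and `chan_slope_features` and the input layout (lists of time, oil rate, and water rate) are hypothetical, not the authors' code.

```python
import math


def loglog_slope(x, y):
    """Least-squares slope of log10(y) against log10(x).

    Points where x or y is non-positive are skipped, because they cannot be
    shown on log-log axes (e.g. intervals where WOR is falling, so WOR' < 0).
    """
    pts = [(math.log10(a), math.log10(b)) for a, b in zip(x, y) if a > 0 and b > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points to fit a slope")
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    sxx = sum((p - mx) ** 2 for p, _ in pts)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((p - mx) * (q - my) for p, q in pts)
    return sxy / sxx


def chan_slope_features(t, q_oil, q_water):
    """Return (WOR slope, WOR' slope) for one well; t must be strictly increasing."""
    # Hypothetical input layout: parallel lists of time, oil rate, water rate.
    rows = [(ti, qw / qo) for ti, qo, qw in zip(t, q_oil, q_water) if qo > 0]
    times = [ti for ti, _ in rows]
    wor = [w for _, w in rows]
    # Finite-difference WOR' = dWOR/dt, placed at the midpoint of each time step.
    t_mid = [(a + b) / 2 for a, b in zip(times, times[1:])]
    dwor = [(w1 - w0) / (b - a)
            for a, b, w0, w1 in zip(times, times[1:], wor, wor[1:])]
    return loglog_slope(times, wor), loglog_slope(t_mid, dwor)


if __name__ == "__main__":
    # Synthetic check: WOR = t**2 / 100, so WOR' = t / 50 and the slopes are 2 and 1.
    t = list(range(1, 11))
    print(chan_slope_features(t, [100.0] * 10, [float(ti ** 2) for ti in t]))
```

The synthetic check should print values very close to 2 and 1. A real workflow might instead fit only a recent window of the history or smooth the rates first; the abstract does not specify either choice.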

We improved ML model accuracy by simplifying the features and raising the quality of the data used in the Chan plot signature identification problem. These ML models could be used to automatically classify whether a well exhibits a specific Chan plot signature and flag it for review within a broader petroleum engineering decision framework.
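To illustrate the classification, scoring, and flagging steps, here is a minimal nearest-neighbor sketch on (WOR slope, WOR’ slope) feature pairs, with a macro-averaged F1-score and a simple flag-for-review rule. The training and test points, the class names `channeling` and `coning`, the choice of k = 1, and the macro averaging are all assumptions for illustration: the abstract does not state how its F1-score was averaged, and these toy numbers are not data from the study.

```python
import math
from collections import Counter

# Synthetic, illustrative (WOR slope, WOR' slope) pairs; NOT data from the study.
TRAIN_X = [(1.5, 1.2), (2.0, 1.8), (1.2, 0.9), (0.8, -0.5), (1.0, -1.0), (0.6, -0.3)]
TRAIN_Y = ["channeling"] * 3 + ["coning"] * 3


def knn_predict(train_x, train_y, x, k=1):
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Denominator is > 0 because class c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    test_x = [(1.7, 1.5), (0.9, -0.8), (1.1, 0.3), (0.9, 0.0)]
    test_y = ["channeling", "coning", "channeling", "channeling"]
    pred = [knn_predict(TRAIN_X, TRAIN_Y, x) for x in test_x]
    print(pred)
    print(round(macro_f1(test_y, pred), 3))
    # Hypothetical review rule: flag wells predicted to show the target signature.
    print([i for i, p in enumerate(pred) if p == "channeling"])
```

On these invented points the fourth test well lies nearer a `coning` example and is misclassified, so the macro F1 is about 0.733; this says nothing about the 0.93 reported in the study. In practice, scikit-learn's `KNeighborsClassifier` and `sklearn.metrics.f1_score(..., average="macro")` cover the same steps, and features on different scales would usually be standardized before using a distance-based model.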


Source journal

Day 3 Thu, March 28, 2019

Self-citation rate

0.00%

Articles published