A novel class-imbalance learning framework for fluid recognition: Application to Qingshimao-Gaoshawo tight-sand gas reservoirs in the Ordos Basin, China
IF 4.4 | Zone 2, Earth Sciences | Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Jun Yi , ZhongLi Qi , XiangChengZhen Li , Fuqiang Lai , Wei Zhou
Computers & Geosciences, Volume 205, Article 105993. Published 2025-07-14. DOI: 10.1016/j.cageo.2025.105993
https://www.sciencedirect.com/science/article/pii/S0098300425001438
Abstract
Mathematical model-based methods developed for conventional oil and gas resources often perform poorly in fluid recognition for tight-sand reservoirs, because factors such as reservoir lithology and pore structure interfere with one another. Rapidly advancing artificial intelligence technologies and accumulating logging data provide a solid foundation for applying machine learning methods as new tools for fluid identification. However, the collected well logging data often show a serious class imbalance in the proportions of the categories, which makes it hard to achieve good classification results. Consequently, this issue has become a major challenge for the academic and industrial communities. To address this, a novel class-imbalance learning framework for fluid recognition (CILF) is proposed for the tight-sand gas reservoirs of the Qingshimao-Gaoshawo area of the Ordos Basin, China. Specifically, at the data level, an improved label propagation algorithm based on semi-supervised learning (SS-LPA) is designed; by assigning high-confidence labels to unlabeled samples, it reduces the imbalance rate of the raw data to some extent. At the model level, a Q-network, an effective reinforcement learning approach, is introduced into an ensemble learning framework (QNEL), which enhances the multi-class accuracy of fluid identification by training multiple baseline models that are given different weights for feedback on imbalanced data. Experimental results from 35 tight-sand wells in the Qingshimao-Gaoshawo area of the Ordos Basin validate the effectiveness of the proposed framework. Specifically, CILF performs best on all three typical evaluation metrics, and it outperforms the other methods in 12 out of a total of 18 categories. Averaged over the six categories, the precision, recall, and F1 score of the proposed framework reach 0.988, 0.984, and 0.985, respectively.
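The data-level idea in the abstract, accepting only high-confidence labels for unlabeled samples so that the class proportions become less skewed, can be illustrated with a minimal Python sketch. This is not the authors' SS-LPA: the propagated class probabilities are assumed to be given, and the function names, the 0.9 threshold, the toy class names, and the definition of the imbalance ratio (majority count divided by minority count) are all hypothetical choices made for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def add_confident_pseudo_labels(labels, unlabeled_probs, threshold=0.9):
    """Extend `labels` with pseudo-labels whose top probability is >= threshold.

    unlabeled_probs: list of dicts mapping class name -> propagated probability.
    Samples whose best probability falls below the threshold stay unlabeled.
    """
    extended = list(labels)
    for probs in unlabeled_probs:
        best_class, best_prob = max(probs.items(), key=lambda kv: kv[1])
        if best_prob >= threshold:
            extended.append(best_class)
    return extended


# Hypothetical toy data: 8 "dry" layers versus 2 "gas" layers.
labeled = ["dry"] * 8 + ["gas"] * 2
unlabeled_probs = [
    {"dry": 0.05, "gas": 0.95},  # accepted as "gas"
    {"dry": 0.08, "gas": 0.92},  # accepted as "gas"
    {"dry": 0.55, "gas": 0.45},  # too uncertain, left unlabeled
    {"dry": 0.97, "gas": 0.03},  # accepted as "dry"
]

extended = add_confident_pseudo_labels(labeled, unlabeled_probs)
print(imbalance_ratio(labeled), imbalance_ratio(extended))  # prints: 4.0 2.25
```

In this toy case the ratio drops from 4.0 to 2.25 because two of the three accepted pseudo-labels fall in the minority class; whether real logging data behave this way depends on how the unlabeled samples are distributed.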
About the journal:
Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.