A mixture learning strategy for predicting aquifer permeability coefficient K

IF 4.2 · Zone 2 (Earth Sciences) · JCR Q1 (Computer Science, Interdisciplinary Applications)

Computers & Geosciences · Pub Date: 2025-02-01 · DOI: 10.1016/j.cageo.2024.105819

Kouao Laurent Kouadio, Jianxin Liu, Wenxiang Liu, Rong Liu

Volume 196, Article 105819. Full text: https://www.sciencedirect.com/science/article/pii/S0098300424003029

Citations: 0

Abstract

The aquifer permeability coefficient (K) is critical for understanding, managing, and protecting groundwater resources. However, obtaining reliable K values directly from pumping tests is costly and time-consuming, and the tests often yield suboptimal results that lead to significant financial losses. Recent advances in machine learning offer an alternative, cost-effective approach for estimating K. Yet the primary challenge lies in the substantial proportion of missing K data, because K can only be measured in aquifer layers. Such sparse and incomplete data severely limit the effectiveness of classical supervised learning methods. To address this challenge, we propose a mixture learning strategy (MXS) that combines unsupervised and supervised techniques to improve K prediction. First, K-Means clustering is applied to delineate a naïve group of aquifers (NGA), effectively generating proxy labels for layers where direct K measurements are unavailable. Next, these NGA labels are integrated with existing K values to form enhanced input features for supervised prediction. We then apply support vector machines (SVMs) and extreme gradient boosting (XGB) to predict K more accurately. Experimental results show that both SVMs and XGB achieve prediction accuracies exceeding 80% when evaluated using confusion matrices and micro- and macro-averaged precision-recall metrics. Testing the MXS approach on an independent borehole dataset confirms its robustness and effectiveness. By enabling accurate K predictions in the presence of significant data gaps, MXS supports more informed decision-making, reduces the likelihood of unsuccessful pumping tests, and aids in the sustainable planning and management of groundwater resources.
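The proxy-labelling step described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the idea, not the authors' implementation: the two-column layer features, the deterministic seeding of K-Means with the first k points, the function names `kmeans` and `combine_labels`, and the `k…`/`nga…` label scheme are all assumptions made for illustration. The abstract does not say exactly how the combined labels enter the SVM or XGB models, so the sketch stops once the labels are merged.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm. The first k points seed the centroids
    (an assumption chosen here to keep the sketch deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(fmean(col) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def combine_labels(k_classes, cluster_labels):
    """Keep the measured K class where one exists; otherwise fall back to
    the cluster-derived proxy label (hypothetical naming scheme)."""
    return [f"k{kc}" if kc is not None else f"nga{cl}"
            for kc, cl in zip(k_classes, cluster_labels)]


# Hypothetical, normalised features for six borehole layers.
features = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25),
            (0.85, 0.90), (0.20, 0.10), (0.80, 0.85)]
# Measured K classes; None marks layers without a pumping-test value.
k_classes = [1, None, None, 2, None, 2]

cluster_labels = kmeans(features, k=2)
mixture = combine_labels(k_classes, cluster_labels)
print(cluster_labels)
print(mixture)
```

Every layer now carries a label, either measured or proxy, which is what allows a supervised learner to be trained on boreholes where most K values are missing.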

Source journal

Computers & Geosciences (Geosciences, Multidisciplinary)

CiteScore: 9.30

Self-citation rate: 6.80%

Articles published: 164

Review time: 3.4 months

About the journal: Computers & Geosciences publishes high-impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.