Application of bagging and boosting ensemble machine learning techniques for groundwater potential mapping in a drought-prone agriculture region of eastern India

Impact factor: 6.0 | Zone 3 (Environmental Science and Ecology) | JCR Q1 (Environmental Sciences)
Krishnagopal Halder, Amit Kumar Srivastava, Anitabha Ghosh, Ranajit Nabik, Subrata Pan, Uday Chatterjee, Dipak Bisai, Subodh Chandra Pal, Wenzhi Zeng, Frank Ewert, Thomas Gaiser, Chaitanya Baliram Pande, Abu Reza Md. Towfiqul Islam, Edris Alam, Md Kamrul Islam
Environmental Sciences Europe, published 2024-09-02. DOI: 10.1186/s12302-024-00981-y
https://link.springer.com/article/10.1186/s12302-024-00981-y

Abstract

Groundwater is a primary source of drinking water for billions of people worldwide. It plays a crucial role in irrigation, domestic, and industrial use and contributes significantly to drought resilience in many regions. However, excessive groundwater extraction has left many areas vulnerable to potable water shortages. Assessing groundwater potential zones (GWPZ) is therefore essential for implementing sustainable management practices that keep groundwater available for present and future generations. This study aims to delineate areas with high groundwater potential in the Bankura district of West Bengal using four machine learning methods: Random Forest (RF), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and Voting Ensemble (VE). The models were trained on 161 data points, comprising 70% of the dataset, to learn patterns that distinguish the presence and absence of groundwater in the region. Among the methods, RF and XGBoost proved to be the most effective in mapping groundwater potential, suggesting their applicability in other regions with similar hydrogeological conditions. RF performed very well, with a precision of 0.919, recall of 0.971, F1-score of 0.944, and accuracy of 0.943, indicating that it predicts groundwater zones with few false positives and false negatives. AdaBoost matched RF on every metric (precision: 0.919, recall: 0.971, F1-score: 0.944, accuracy: 0.943), whereas XGBoost slightly outperformed both, with higher precision (0.944), F1-score (0.958), and accuracy (0.957) at the same recall (0.971). The VE approach mirrored XGBoost's metrics (precision: 0.944, recall: 0.971, F1-score: 0.958, accuracy: 0.957), indicating that combining the individual models matched the best single model and improved on RF and AdaBoost. The groundwater potential zoning across the Bankura district varied significantly, with very low potential areas accounting for 41.81% and very high potential areas for 24.35%. The prediction uncertainty ranged from 0.0 to 0.75 across the study area, reflecting the variability in groundwater availability and the need for targeted management strategies.
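The four metrics quoted in the abstract all derive from a binary confusion matrix. The sketch below shows the arithmetic. The abstract does not report a confusion matrix or a test-set size, so the counts used here are hypothetical: they assume a 70-sample test set with 35 groundwater-present points, chosen only because such counts happen to reproduce the reported figures. The names `Confusion`, `scores`, `rf_like`, and `xgb_like` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (positive = groundwater present)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def scores(c: Confusion) -> dict:
    """Return precision, recall, F1 and accuracy, rounded to three decimals."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    raw = {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
    return {name: round(value, 3) for name, value in raw.items()}


# Hypothetical counts, NOT taken from the paper: 70 test samples in total.
rf_like = Confusion(tp=34, fp=3, fn=1, tn=32)   # reproduces the RF / AdaBoost figures
xgb_like = Confusion(tp=34, fp=2, fn=1, tn=33)  # reproduces the XGBoost / VE figures

for label, matrix in (("RF-like", rf_like), ("XGBoost-like", xgb_like)):
    print(label, scores(matrix))
```

Under these assumed counts, the two groups of models would differ by a single false positive, which is worth keeping in mind when reading the ranking of the models.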

In summary, this study highlights the critical need for assessing and managing groundwater resources effectively using advanced machine learning techniques. The findings provide a foundation for better groundwater management practices, ensuring sustainable use and conservation in Bankura district and beyond.
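A voting ensemble such as the VE named in the abstract combines the outputs of several base classifiers. The abstract does not say whether hard or soft voting was used, so the following is only a minimal sketch of soft voting, assuming each base model outputs a probability that groundwater is present. The site names, the probabilities, and the 0.5 threshold are all hypothetical.

```python
from statistics import mean


def soft_vote(model_probs: dict, threshold: float = 0.5) -> tuple:
    """Average the per-model positive-class probabilities and threshold the mean.

    Returns (mean probability, predicted label), where label 1 means
    "groundwater present" and 0 means "groundwater absent".
    """
    avg = mean(list(model_probs.values()))
    return avg, int(avg >= threshold)


# Hypothetical probabilities from three base models for three locations.
sites = {
    "site_a": {"rf": 0.875, "adaboost": 0.750, "xgboost": 0.625},
    "site_b": {"rf": 0.125, "adaboost": 0.250, "xgboost": 0.375},
    "site_c": {"rf": 0.750, "adaboost": 0.250, "xgboost": 0.500},
}

results = {name: soft_vote(probs) for name, probs in sites.items()}
for name, (avg, label) in results.items():
    print(f"{name}: mean probability {avg:.3f}, predicted label {label}")
```

Averaging probabilities lets a confident model outweigh a hesitant one, whereas hard voting would count each model's label equally. Which variant the study used is not stated in the abstract.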

Source journal: Environmental Sciences Europe (Environmental Science, Pollution)
CiteScore: 11.20
Self-citation rate: 1.70%
Articles published: 110
Review time: 13 weeks
About the journal: ESEU is an international journal, focusing primarily on Europe, with a broad scope covering all aspects of environmental sciences, including the main topic regulation. ESEU will discuss the entanglement between environmental sciences and regulation because, in recent years, there have been misunderstandings and even disagreement between stakeholders in these two areas. ESEU will help to improve the comprehension of issues between environmental sciences and regulation. ESEU will be an outlet from the German-speaking (DACH) countries to Europe and an inlet from Europe to the DACH countries regarding environmental sciences and regulation. Moreover, ESEU will facilitate the exchange of ideas and interaction between Europe and the DACH countries regarding environmental regulatory issues. Although Europe is at the center of ESEU, the journal will not exclude the rest of the world, because regulatory issues pertaining to environmental sciences can be fully seen only from a global perspective.