Development of a rapid and cost-effective groundwater quality assessment model based on hybrid ensemble learning

IF 7.0 | Zone 2 (Environmental Science and Ecology) | JCR Q1 (Environmental Sciences)
Xiaoyun Wang, Jing Su, Yue Liu, Yao Ji, Qiuling Dang, Yuanyuan Sun, Quanli Liu
Journal: Ecological Indicators, volume 178, Article 113894
Publication date: 2025-07-19
Publication type: Journal Article
DOI: 10.1016/j.ecolind.2025.113894
URL: https://www.sciencedirect.com/science/article/pii/S1470160X25008246

Abstract

The use of machine learning to assess groundwater quality and health risks is attracting widespread attention. However, assessment accuracy and cost-effectiveness are key factors in determining whether a model can be implemented. The main purpose of this study is therefore to develop a convenient, low-cost, and accurate hybrid ensemble model to predict the water quality index (WQI) and hazard index (HI). First, the Pearson correlation matrix and SHAP values were compared to select the optimal feature combination. Second, base learners were selected from 12 candidate machine learning models. eXtreme Gradient Boosting (XGB) was then chosen as the meta-learner to construct the stacking and blending ensemble models, while the averaging ensemble model was obtained by averaging the predictions of the base learners. Finally, evaluation metrics (R² and RMSE), the t-test, and probabilistic forecasting were integrated to assess model performance. The results show that TDS, HCO₃⁻, Mg²⁺, and SO₄²⁻ form the best feature combination for WQI prediction, and that Na⁺, Ca²⁺, Mg²⁺, and HCO₃⁻ form the best feature combination for HI prediction. SHAP values performed better than the Pearson correlation matrix in reducing the number of input variables and improving model accuracy. The accuracy of the stacking ensemble model on the test/validation sets (average R² = 0.966/0.921 for WQI and 0.835/0.714 for HI) was significantly (p < 0.05) higher than that of the other models. The stacking ensemble model developed in this study can support governments in assessing groundwater quality and formulating rational policies. The integration of evaluation metrics and statistical analysis also offers new ideas for model evaluation in the environmental field.
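The averaging ensemble and the two evaluation metrics named in the abstract (R² and RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the WQI values, and the base-learner predictions below are all hypothetical and chosen only to show the arithmetic. The stacking and blending models with an XGB meta-learner are not reproduced here.

```python
import math
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def averaging_ensemble(base_predictions):
    """Average the per-sample predictions of all base learners."""
    return [fmean(sample) for sample in zip(*base_predictions)]


# Hypothetical observed WQI values (illustrative only, not data from the study).
y_true = [40.0, 55.0, 70.0, 85.0]

# Hypothetical predictions from three base learners, one list per learner.
base_predictions = [
    [42.0, 53.0, 73.0, 82.0],
    [38.0, 57.0, 67.0, 88.0],
    [43.0, 52.0, 70.0, 85.0],
]

ensemble = averaging_ensemble(base_predictions)

for i, preds in enumerate(base_predictions, start=1):
    print(f"base learner {i}: R2={r2_score(y_true, preds):.4f}, "
          f"RMSE={rmse(y_true, preds):.4f}")
print(f"averaging ensemble: R2={r2_score(y_true, ensemble):.4f}, "
      f"RMSE={rmse(y_true, ensemble):.4f}")
```

In this toy data the base learners' errors partly cancel, so the averaged prediction scores better than any single learner. That outcome is a property of the made-up numbers, not a general guarantee, and it says nothing about the results reported in the study.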

Source journal
Ecological Indicators (Environmental Sciences)
CiteScore: 11.80
Self-citation rate: 8.70%
Articles published: 1163
Review time: 78 days
Journal description: The ultimate aim of Ecological Indicators is to integrate the monitoring and assessment of ecological and environmental indicators with management practices. The journal provides a forum for the discussion of the applied scientific development and review of traditional indicator approaches as well as for theoretical, modelling and quantitative applications such as index development. Research into the following areas will be published.
• All aspects of ecological and environmental indicators and indices.
• New indicators, and new approaches and methods for indicator development, testing and use.
• Development and modelling of indices, e.g. application of indicator suites across multiple scales and resources.
• Analysis and research of resource-, system- and scale-specific indicators.
• Methods for integration of social and other valuation metrics for the production of scientifically rigorous and politically relevant assessments using indicator-based monitoring and assessment programs.
• How research indicators can be transformed into direct application for management purposes.
• Broader assessment objectives and methods, e.g. biodiversity, biological integrity, and sustainability, through the use of indicators.
• Resource-specific indicators such as landscape, agroecosystems, forests, wetlands, etc.