Machine learning prediction of health risk and spatial dependence of geogenic contaminated groundwater from the Hetao Basin, China

IF 3.4 | Zone 2, Earth Sciences | Q1, GEOCHEMISTRY & GEOPHYSICS
Peng Xia, Yifu Zhao, Xianjun Xie, Junxia Li, Kun Qian, Haoyu You, Jingxian Zhang, Weili Ge, Hongjie Pan, Yanxin Wang
Journal of Geochemical Exploration | Journal Article | Published 2024-05-07
DOI: 10.1016/j.gexplo.2024.107497
https://www.sciencedirect.com/science/article/pii/S0375674224001134

Abstract

Geogenic contaminated groundwater (GCG), characterized by elevated arsenic, fluoride, and iodine levels, presents a significant challenge to public health and government management. Conventional survey-based approaches, which involve collecting groundwater samples, conducting physicochemical tests, and performing spatial interpolation to obtain regional maps of groundwater chemical components, are inefficient and costly. More importantly, they do not take into account the actual hydrogeological conditions or the characteristics of pollutant transport and enrichment. To address this issue, we used Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost) to analyze the likelihood of occurrence of arsenic, fluoride, and iodine, as well as their spatial distribution, in shallow groundwater from the Hetao Basin. Our study incorporated 20 indicators related to meteorology, soil physicochemical properties, and groundwater conditions, along with 1505 labeled samples consisting of groundwater arsenic, fluoride, and iodine concentrations and their corresponding coordinates. The study then analyzed the meteorological, soil physicochemical, and groundwater conditions automatically by constructing machine learning models from the available data. To optimize and select the best prediction model, this paper presents a quantitative evaluation of the prediction performance of the machine learning models: accuracy (AC), area under the curve (AUC), and mean squared error (MSE) were calculated for each model's predictions of the spatial distribution of GCG, and the best-performing model was then selected. The results showed that the XGBoost algorithm provided the best predictions for groundwater with arsenic concentrations above 10 μg/L and fluoride concentrations above 1.5 mg/L, whereas the RF model provided the best predictions for groundwater with arsenic concentrations above 50 μg/L and iodine concentrations above 100 μg/L. Groundwater health risk zones were then delineated based on the optimal prediction models, and demographic analysis was conducted in both the direct and potential groundwater risk zones. Model predictions indicated that hundreds of thousands of people in the Hetao Basin face a public health crisis caused by high concentrations of arsenic, fluoride, and iodine in groundwater. These findings underscore the significant health challenge in the study area. Considering the agricultural development and increasing groundwater use in the area, our findings can guide local governments in managing the extent of groundwater development, establishing control zones, and enhancing protection measures for populations at risk from groundwater contamination.
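The model-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the sample concentrations, the two candidate models (`model_a`, `model_b`), and their predicted probabilities are made up for illustration. It also assumes that MSE is taken between the predicted exceedance probability and the 0/1 label, and that the final choice is made by AUC, neither of which the abstract specifies.

```python
from typing import Dict, List, Sequence

# One of the limits named in the abstract: arsenic above 10 ug/L.
ARSENIC_LIMIT_UG_L = 10.0


def label_exceedance(concentrations: Sequence[float], limit: float) -> List[int]:
    """Label a sample 1 if its concentration exceeds the limit, else 0."""
    return [1 if c > limit else 0 for c in concentrations]


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. This is the pairwise definition of ROC AUC.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical measured arsenic concentrations (ug/L) at six wells.
measured_as = [2.0, 35.0, 8.0, 120.0, 15.0, 4.0]
y_true = label_exceedance(measured_as, ARSENIC_LIMIT_UG_L)

# Hypothetical exceedance probabilities from two candidate models.
candidates: Dict[str, List[float]] = {
    "model_a": [0.1, 0.8, 0.6, 0.9, 0.4, 0.2],
    "model_b": [0.3, 0.7, 0.2, 0.95, 0.6, 0.1],
}

scores = {
    name: {
        "AC": accuracy(y_true, probs),
        "AUC": roc_auc(y_true, probs),
        "MSE": mean_squared_error(y_true, probs),
    }
    for name, probs in candidates.items()
}

# Assumption: pick the candidate with the highest AUC.
best = max(scores, key=lambda name: scores[name]["AUC"])

for name, s in scores.items():
    print(f"{name}: AC={s['AC']:.3f} AUC={s['AUC']:.3f} MSE={s['MSE']:.3f}")
print("selected:", best)
```

In practice the probabilities would come from fitted SVM, RF, AdaBoost, and XGBoost classifiers evaluated on held-out samples, with one such comparison per contaminant threshold.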


Source journal
Journal of Geochemical Exploration (Geosciences: Geochemistry & Geophysics)
CiteScore: 7.40
Self-citation rate: 7.70%
Articles published: 148
Review time: 8.1 months
About the journal: Journal of Geochemical Exploration is mostly dedicated to the publication of original studies in exploration and environmental geochemistry and related topics. Contributions of particular interest to the journal include research based on the application of innovative methods to: define the genesis and evolution of mineral deposits, including the transfer of elements in large-scale mineralized areas; analyze complex systems at the boundaries between biogeochemistry, metal transport, and mineral accumulation; evaluate the effects of historical mining activities on the surface environment; trace pollutant sources and define their fate and transport models in near-surface and surface environments involving solid, fluid, and aerial matrices; assess and quantify natural and technogenic radioactivity in the environment; determine geochemical anomalies and set baseline reference values using compositional data analysis, multivariate statistics, and geospatial analysis; and assess the impacts of anthropogenic contamination on ecosystems and human health at local and regional scales to prioritize and classify risks through deterministic and stochastic approaches. Papers presenting newly developed methods in analytical geochemistry for use in the field or in the laboratory are also within the journal's scope.