Using unsupervised machine learning and positive matrix factorization models to drive groundwater chemistry and associated health risks in a coal-mining rural region
Yuting Yan, Yunhui Zhang, Zhanxue Sun, Zhan Xie, Rongwen Yao, Si Chen, Md Galal Uddin, Yujun Pu, Chang Yang, Ying Wang, Yangshuang Wang
Journal of Hydrology, Volume 661, Article 133691. Published 2025-06-23. DOI: 10.1016/j.jhydrol.2025.133691. Source: https://www.sciencedirect.com/science/article/pii/S0022169425010297
Abstract
Identifying and quantifying the geogenic and anthropogenic sources of heavy metals and nitrate in groundwater is essential for securing the groundwater environment in rural mining areas. However, integrated approaches for clarifying groundwater chemistry, specific pollutant sources, and the associated probabilistic health risks in such areas have yet to be developed. In this study, unsupervised machine learning, compositional data analysis (CoDa) based on principal component analysis (PCA), positive matrix factorization (PMF), and Monte Carlo simulation of health risks were used to quantify pollutant sources and assess groundwater drinking suitability in a coal-mining region of northeastern Chongqing, SW China. Unsupervised machine learning identified three groups of groundwater samples. Group A was of the Ca-HCO₃ type. Group B was dominated by Ca-SO₄ and mixed Ca-Na-HCO₃ types. Group C consisted of Ca-HCO₃ and Ca-SO₄ types. Groups A and C were controlled by carbonate rock and silicate dissolution, while Group B was dominated by the dissolution of silicate rocks, pyrite, and heavy-metal oxides. Positive cation exchange was identified in all groundwater types. Agricultural activity and mining sewage discharge were the primary sources of nitrate contamination. CoDa-PCA and the PMF model identified three primary hydrochemical processes and five factors for all hydrochemical components, respectively. CoDa-PCA corroborated the preceding analyses based on hydrochemical diagrams and mineral saturation indices. Following the PMF analysis for all hydrochemical components, natural background levels (NBLs) and a five-factor PMF for heavy metals indicated that Fe and Mn originated from the dissolution of Fe and Mn oxides in red beds (25.91 %). Co, Ni, Ba, Zn, Cu, and Hg were derived from the dissolution of oxides (22.35 %), barite (17.87 %), sphalerite (17.85 %), and chalcopyrite and cinnabar (16.02 %). The combined weighted water quality index (CWQI) and heavy metal pollution index (HPI) values of all groundwater samples were within the permissible limits for drinking water, indicating that groundwater in the study area was suitable for drinking. The hazard index (HI) values indicated a probability of approximately 6.01 % that groundwater posed health risks above the acceptable limit (HI > 1) to children. The factors to which the health risk was most sensitive were the frequency of exposure to contaminated water and the NO₃⁻ concentration. Our study is expected to provide a reliable and robust basis for sustainable groundwater management in rural mining regions.
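The probabilistic hazard index mentioned in the abstract can be illustrated with a minimal Monte Carlo sketch. It assumes the widely used oral-ingestion form HQ = (C × IR × EF × ED) / (BW × AT × RfD), with HI as the sum of HQ over contaminants. The abstract does not give the exposure model, distributions, or parameter values, so the function name `simulate_hazard_index`, the distribution shapes, and every number below are hypothetical placeholders rather than the authors' settings.

```python
import numpy as np


def simulate_hazard_index(conc_mg_l, rfd_mg_kg_day, n_draws=100_000, seed=0):
    """Return Monte Carlo draws of the oral hazard index HI = sum_i HQ_i.

    conc_mg_l:      {name: (arithmetic mean, standard deviation)} in mg/L
    rfd_mg_kg_day:  {name: oral reference dose} in mg/kg/day
    All exposure distributions are illustrative assumptions for a child.
    """
    rng = np.random.default_rng(seed)

    # Assumed exposure parameters (placeholders, not taken from the paper).
    ir = rng.triangular(0.5, 0.8, 1.5, n_draws)        # ingestion rate, L/day
    ef = rng.triangular(180.0, 350.0, 365.0, n_draws)  # exposure frequency, days/year
    bw = np.maximum(rng.normal(20.0, 3.0, n_draws), 8.0)  # body weight, kg
    ed = 6.0                                           # exposure duration, years
    at = ed * 365.0                                    # averaging time, days

    hi = np.zeros(n_draws)
    for name, (mean, sd) in conc_mg_l.items():
        # Lognormal concentration matched to the given arithmetic mean and sd.
        sigma2 = np.log(1.0 + (sd / mean) ** 2)
        mu = np.log(mean) - sigma2 / 2.0
        conc = rng.lognormal(mu, np.sqrt(sigma2), n_draws)

        # Chronic daily intake (mg/kg/day) and hazard quotient for this contaminant.
        cdi = conc * ir * ef * ed / (bw * at)
        hi += cdi / rfd_mg_kg_day[name]
    return hi


if __name__ == "__main__":
    # Hypothetical inputs: replace with measured concentrations and vetted RfDs.
    conc = {"nitrate": (15.0, 10.0), "Mn": (0.05, 0.04)}
    rfd = {"nitrate": 1.6, "Mn": 0.14}
    hi = simulate_hazard_index(conc, rfd)
    print(f"P(HI > 1) = {np.mean(hi > 1.0):.4f}")
```

The fraction of draws with HI > 1 plays the role of the exceedance probability reported in the abstract. A sensitivity ranking like the one described there could be approximated by rank-correlating each sampled input with the resulting HI, which this sketch leaves out.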
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.