Integrated machine learning based groundwater quality prediction through groundwater quality index for drinking purposes in a semi-arid river basin of south India.
D Karunanidhi, M Rhishi Hari Raj, Priyadarsi D Roy, T Subramani
Environmental Geochemistry and Health 47(4): 119. Published 14 March 2025. DOI: 10.1007/s10653-025-02425-9
Abstract
The main objective of this study is to predict and monitor groundwater quality using modern machine learning (ML) techniques, with the aim of forecasting future trends in groundwater quality. Five ML models were used to predict water quality: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost). The models used physical parameters, namely electrical conductivity (EC), pH (hydrogen ion concentration), and total dissolved solids (TDS). They also used chemical parameters, namely sodium (Na⁺), magnesium (Mg²⁺), calcium (Ca²⁺), potassium (K⁺), bicarbonate (HCO₃⁻), fluoride (F⁻), sulphate (SO₄²⁻), chloride (Cl⁻), and nitrate (NO₃⁻). These were measured in 94 dug and bore wells in the semi-arid Arjunanadi river basin in Tamil Nadu, India. The samples are alkaline in nature. The Gibbs diagram indicates that rock-water interaction dominates the hydrochemistry, with a minor influence of evaporation and crystallization. According to the water quality index (WQI), 599.75 km² (53%) of the study area has good-quality groundwater and 536.75 km² (47%) has poor-quality groundwater. The WQI values formed the baseline data for the prediction models and served as the dependent variable, and the physicochemical parameters served as the independent variables. Model efficacy was assessed using the Relative Squared Residual (RSR) error, Nash-Sutcliffe efficiency (NSE), mean absolute percentage error (MAPE), coefficient of determination (R²), and final accuracy. The LR model gave the lowest error (RSR = 0.22, NSE = 0.95, MAPE = 1.3) and an accuracy of 95% in predicting water quality. The other models followed in the order SVM > AdaBoost > XGBoost > RF. The study can help lawmakers and administrators raise public awareness of modern techniques for predicting and monitoring groundwater quality. It also supports progress toward Sustainable Development Goals 3 (Good Health and Well-being) and 6 (Clean Water and Sanitation) for a clean and healthy community.
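The abstract does not state which WQI formulation produced the good/poor split. A common choice in groundwater studies is the weighted arithmetic (relative weight) index:

WQI = Σ W_i q_i, with W_i = w_i / Σ w_i and q_i = 100 C_i / S_i

Here C_i is the measured value of parameter i and S_i is its drinking-water limit. The sketch below assumes that formulation. The parameter subset, the weights w_i, the limits S_i and the class boundaries are illustrative placeholders, not the values used in the study.

```python
# Minimal sketch of a weighted-arithmetic (relative weight) WQI.
# ASSUMPTION: weights, limits and class boundaries below are illustrative
# placeholders, not the values used in the study.

WEIGHTS = {"pH": 4, "TDS": 5, "Cl": 3, "NO3": 5, "F": 5}  # hypothetical weights w_i (1-5 scale)
LIMITS = {"pH": 8.5, "TDS": 500.0, "Cl": 250.0, "NO3": 45.0, "F": 1.0}  # hypothetical limits S_i


def wqi(sample: dict[str, float]) -> float:
    """Return WQI = sum_i W_i * q_i, with W_i = w_i / sum(w) and q_i = 100 * C_i / S_i."""
    total_weight = sum(WEIGHTS.values())
    index = 0.0
    for param, w in WEIGHTS.items():
        relative_weight = w / total_weight                      # W_i
        quality_rating = 100.0 * sample[param] / LIMITS[param]  # q_i
        index += relative_weight * quality_rating               # sub-index SI_i
    return index


def wqi_class(value: float) -> str:
    """Map a WQI value to a commonly used class scheme (assumed, not confirmed for this study)."""
    if value < 50:
        return "excellent"
    if value < 100:
        return "good"
    if value < 200:
        return "poor"
    if value < 300:
        return "very poor"
    return "unsuitable"


if __name__ == "__main__":
    # Hypothetical well sample (pH units; other parameters in mg/L).
    sample = {"pH": 7.8, "TDS": 620.0, "Cl": 180.0, "NO3": 30.0, "F": 1.2}
    value = wqi(sample)
    print(f"WQI = {value:.1f} ({wqi_class(value)})")
```

Consider a sample whose every parameter sits exactly at its limit. Then q_i = 100 for all i, so the WQI is about 100, which is the boundary between the "good" and "poor" classes in this assumed scheme.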
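The abstract names the evaluation statistics but does not define them. Below is a minimal sketch of common definitions in standard-library Python. It makes two assumptions:

- RSR is taken as the ratio of RMSE to the standard deviation of the observations, as used in hydrological model evaluation.
- R² is taken as the squared Pearson correlation.

The paper may define either differently. With the population standard deviation, RSR = sqrt(1 - NSE). On that reading, the reported RSR = 0.22 and NSE = 0.95 are mutually consistent, since sqrt(0.05) ≈ 0.22.

```python
import math
from typing import Sequence


def _sse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Sum of squared residuals between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


def _sst(obs: Sequence[float]) -> float:
    """Total sum of squares of the observations around their mean."""
    mean_obs = sum(obs) / len(obs)
    return sum((o - mean_obs) ** 2 for o in obs)


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / SST (1.0 is a perfect fit)."""
    return 1.0 - _sse(obs, pred) / _sst(obs)


def rsr(obs: Sequence[float], pred: Sequence[float]) -> float:
    """ASSUMED definition: RMSE divided by the population standard deviation of the observations."""
    n = len(obs)
    rmse = math.sqrt(_sse(obs, pred) / n)
    sd_obs = math.sqrt(_sst(obs) / n)
    return rmse / sd_obs


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """ASSUMED definition: coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def accuracy(true_labels: Sequence[str], pred_labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted class (e.g. 'good'/'poor') matches the true class."""
    hits = sum(t == p for t, p in zip(true_labels, pred_labels))
    return hits / len(true_labels)


if __name__ == "__main__":
    observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical WQI values
    predicted = [3.0, 4.0, 5.0, 8.0]  # hypothetical model output
    print(f"NSE  = {nse(observed, predicted):.3f}")   # NSE  = 0.900
    print(f"RSR  = {rsr(observed, predicted):.3f}")   # RSR  = 0.316
    print(f"MAPE = {mape(observed, predicted):.2f}")  # MAPE = 16.67
    print(f"R2   = {r2(observed, predicted):.3f}")    # R2   = 0.914
```

Under the R² assumption above, R² and NSE differ: NSE penalizes bias, while squared correlation does not. Reporting both, as the study does, can therefore reveal systematic offsets.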
About the journal:
Environmental Geochemistry and Health publishes original research papers and review papers across the broad field of environmental geochemistry. Environmental geochemistry and health establishes and explains links between the natural or disturbed chemical composition of the earth’s surface and the health of plants, animals and people.
Beneficial elements regulate or promote enzymatic and hormonal activity whereas other elements may be toxic. Bedrock geochemistry controls the composition of soil and hence that of water and vegetation. Environmental issues, such as pollution, arising from the extraction and use of mineral resources, are discussed. The effects of contaminants introduced into the earth’s geochemical systems are examined. Geochemical surveys of soil, water and plants show how major and trace elements are distributed geographically. Associated epidemiological studies reveal the possibility of causal links between the natural or disturbed geochemical environment and disease. Experimental research illuminates the nature or consequences of natural or disturbed geochemical processes.
The journal particularly welcomes novel research linking environmental geochemistry and health issues on such topics as: heavy metals (including mercury), persistent organic pollutants (POPs), and mixed chemicals emitted through human activities, such as uncontrolled recycling of electronic waste; waste recycling; surface-atmospheric interaction processes (natural and anthropogenic emissions, vertical transport, deposition, and physical-chemical interaction) of gases and aerosols; phytoremediation/restoration of contaminated sites; food contamination and safety; environmental effects of medicines; effects and toxicity of mixed pollutants; speciation of heavy metals/metalloids; effects of mining; disturbed geochemistry from human behavior, natural or man-made hazards; particle and nanoparticle toxicology; risk and the vulnerability of populations, etc.