Title: Enhancing groundwater quality index prediction in data-scarce regions: Application of advanced artificial intelligence models in Nagaland, India
Authors: Subhrajyoti Deb
Journal: Dynamics of Atmospheres and Oceans, Volume 111, Article 101579
DOI: 10.1016/j.dynatmoce.2025.101579
Publication date: 2025-07-11
Publication type: Journal Article
Impact factor: 1.9 (JCR Q2, Geochemistry & Geophysics)
URL: https://www.sciencedirect.com/science/article/pii/S0377026525000545
Citations: 0
Abstract
The Groundwater Quality Index (GQI) serves as a critical benchmark for assessing the long-term impacts of anthropogenic activities and natural processes on groundwater quality. However, calculating GQI from irregular datasets containing multiple parameters is prone to errors. Despite growing interest in machine learning for water quality assessment, very few studies have explored groundwater quality prediction in data-scarce, topographically complex regions. Moreover, few efforts have been made to compare a wide range of Artificial Intelligence (AI) models under variable input scenarios using actual field data. To address this research gap, this study employs eight advanced AI models to predict GQI in Nagaland, a data-scarce hilly region in northeastern India: Artificial Neural Network (ANN), Autoregressive Model (AR), Locally-weighted Linear Regression (LLR), M5P tree, Multiple Linear Regression (MLR), Random Forest (RF), Random Subspace (RS), and Support Vector Machine (SVM). The research focuses on identifying an optimal subset regression for two scenarios: one optimizing GQI computation time by incorporating all water quality parameters, and the other exploring variations using only the most sensitive parameters. Key findings reveal strong linear relationships between hydro-chemical parameters and GQI, with significant correlations such as Na⁺ with total dissolved solids (TDS) (0.936) and Mg²⁺ with GQI (0.922). Sensitivity analysis identifies TDS and total hardness (TH) as the primary determinants of GQI. Among the models, MLR achieves the highest accuracy in the first scenario, with performance metrics of R (correlation coefficient) = 0.9999, MAE (Mean Absolute Error) = 0.0001, and RMSE (Root Mean Square Error) = 0.0002 %. In contrast, ANN performs best in the second scenario, with MAE = 2.4718, R = 0.9977, and RAE (Relative Absolute Error) = 3.5463 %. These results highlight the efficacy of advanced AI models in enhancing GQI prediction accuracy, particularly in data-scarce regions like Nagaland.
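The abstract reports model performance using R, MAE, RMSE, and RAE but does not define them. The following is a minimal sketch of how these four metrics are commonly computed, assuming the standard textbook definitions (RAE taken as the total absolute error relative to that of a predictor that always outputs the observed mean, expressed in percent). The function name `evaluation_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R, MAE, RMSE and RAE (%) for paired observed/predicted values.

    Assumes the standard definitions; the paper may use variants.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    sq_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]

    # Mean Absolute Error and Root Mean Square Error
    mae = sum(abs_errors) / n
    rmse = math.sqrt(sum(sq_errors) / n)

    # Relative Absolute Error: error relative to a mean-only baseline, in percent
    baseline = sum(abs(o - mean_obs) for o in observed)
    rae_percent = 100.0 * sum(abs_errors) / baseline

    # Pearson correlation coefficient between observed and predicted values
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(var_obs * var_pred)

    return {"R": r, "MAE": mae, "RMSE": rmse, "RAE_percent": rae_percent}


if __name__ == "__main__":
    # Hypothetical GQI values, for illustration only (not data from the study)
    observed_gqi = [42.0, 55.5, 61.2, 70.8, 88.1]
    predicted_gqi = [43.1, 54.0, 62.0, 69.5, 90.0]
    for name, value in evaluation_metrics(observed_gqi, predicted_gqi).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, MAE and RMSE carry the units of GQI, while R is dimensionless and RAE is a percentage, which is worth keeping in mind when comparing values across the two scenarios.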
About the journal:
Dynamics of Atmospheres and Oceans is an international journal for research related to the dynamical and physical processes governing atmospheres, oceans and climate.
Authors are invited to submit articles, short contributions or scholarly reviews in the following areas:
• Dynamic meteorology
• Physical oceanography
• Geophysical fluid dynamics
• Climate variability and climate change
• Atmosphere-ocean-biosphere-cryosphere interactions
• Prediction and predictability
• Scale interactions
Papers of theoretical, computational, experimental and observational investigations are invited, particularly those that explore the fundamental nature - or bring together the interdisciplinary and multidisciplinary aspects - of dynamical and physical processes at all scales. Papers that explore air-sea interactions and the coupling between atmospheres, oceans, and other components of the climate system are particularly welcome.