Enhancing groundwater quality index prediction in data-scarce regions: Application of advanced artificial intelligence models in Nagaland, India

IF 1.9 · Zone 4 (Earth Sciences) · Q2 Geochemistry & Geophysics

Dynamics of Atmospheres and Oceans · Pub Date: 2025-07-11 · DOI: 10.1016/j.dynatmoce.2025.101579

Subhrajyoti Deb

Volume 111, Article 101579. Full text: https://www.sciencedirect.com/science/article/pii/S0377026525000545

Citations: 0

Abstract

The Groundwater Quality Index (GQI) serves as a critical benchmark for assessing the long-term impacts of anthropogenic activities and natural processes on groundwater quality. However, calculating GQI from irregular datasets containing multiple parameters is often prone to errors. Despite growing interest in machine learning for water quality assessment, very few studies have explored groundwater quality prediction in data-scarce, topographically complex regions. Moreover, limited efforts have been made to compare a wide range of Artificial Intelligence (AI) models under variable input scenarios using actual field data. To address this research gap, this study employs eight advanced AI models, namely Artificial Neural Network (ANN), Autoregressive Model (AR), Locally-weighted Linear Regression (LLR), M5P tree, Multiple Linear Regression (MLR), Random Forest (RF), Random Subspace (RS), and Support Vector Machine (SVM), to predict GQI in Nagaland, a data-scarce hilly region in northeastern India. The research focuses on identifying an optimal subset regression for two scenarios: one optimizing GQI computation time by incorporating all water quality parameters, and the other exploring variations using the most sensitive parameters. Key findings reveal strong linear relationships between hydro-chemical parameters and GQI, with significant correlations such as Na⁺ with TDS (0.936) and Mg²⁺ with GQI (0.922). Sensitivity analysis identifies TDS and TH as primary determinants of GQI. Among the models, MLR achieves higher accuracy in the first scenario, with performance metrics of R (correlation coefficient) = 0.9999, MAE (Mean Absolute Error) = 0.0001, and RMSE (Root Mean Square Error) = 0.0002%. In contrast, ANN performs better in the second scenario, with MAE = 2.4718, R = 0.9977, and RAE (Relative Absolute Error) = 3.5463%. These results highlight the efficacy of advanced AI models in enhancing GQI prediction accuracy, particularly in data-scarce regions like Nagaland.
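The abstract reports model skill through R, MAE, RMSE, and RAE. The following is a minimal sketch of how these four metrics are commonly computed from paired observed and predicted values; it is not the study's code. The function name `regression_metrics` and the sample GQI values are hypothetical, and the RAE definition used here (total absolute error divided by the total absolute deviation of the observations from their mean) is the conventional one, assumed rather than confirmed by the source.

```python
import math


def regression_metrics(observed, predicted):
    """Return R, MAE, RMSE and RAE (%) for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Pearson correlation coefficient R between observations and predictions.
    # Note: this divides by zero if either series is constant.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(var_obs * var_pred)

    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = sum(abs_err) / n                               # Mean Absolute Error
    rmse = math.sqrt(sum(e ** 2 for e in abs_err) / n)   # Root Mean Square Error

    # Relative Absolute Error (assumed conventional definition): total absolute
    # error relative to always predicting the mean of the observations.
    rae = 100.0 * sum(abs_err) / sum(abs(o - mean_obs) for o in observed)

    return {"R": r, "MAE": mae, "RMSE": rmse, "RAE_percent": rae}


# Made-up GQI values purely for illustration; not data from the study.
observed = [40.0, 55.0, 70.0, 85.0, 100.0]
predicted = [42.0, 53.0, 71.0, 84.0, 102.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE carry the units of GQI, whereas RAE is a percentage relative to a mean-only baseline, so the two kinds of figure are not directly comparable across scenarios.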


Source journal

Dynamics of Atmospheres and Oceans (Geosciences: Geochemistry & Geophysics)

CiteScore: 3.10

Self-citation rate: 5.90%

Review time: >12 weeks

About the journal: Dynamics of Atmospheres and Oceans is an international journal for research related to the dynamical and physical processes governing atmospheres, oceans and climate. Authors are invited to submit articles, short contributions or scholarly reviews in the following areas:

• Dynamic meteorology
• Physical oceanography
• Geophysical fluid dynamics
• Climate variability and climate change
• Atmosphere-ocean-biosphere-cryosphere interactions
• Prediction and predictability
• Scale interactions

Papers of theoretical, computational, experimental and observational investigations are invited, particularly those that explore the fundamental nature of dynamical and physical processes at all scales, or that bring together their interdisciplinary and multidisciplinary aspects. Papers that explore air-sea interactions and the coupling between atmospheres, oceans, and other components of the climate system are particularly welcome.