Enhancing groundwater quality index prediction in data-scarce regions: Application of advanced artificial intelligence models in Nagaland, India

IF 1.9 · Zone 4 (Earth Sciences) · JCR Q2 (Geochemistry & Geophysics)
Subhrajyoti Deb
DOI: 10.1016/j.dynatmoce.2025.101579
Journal: Dynamics of Atmospheres and Oceans, volume 111, Article 101579
Publication date: 2025-07-11
Publication type: Journal Article
Open access: No
URL: https://www.sciencedirect.com/science/article/pii/S0377026525000545
Citations: 0

Abstract

The Groundwater Quality Index (GQI) serves as a critical benchmark for assessing the long-term impacts of anthropogenic activities and natural processes on groundwater quality. However, calculating GQI from irregular datasets containing multiple parameters is often prone to errors. Despite growing interest in machine learning for water quality assessment, very few studies have explored groundwater quality prediction in data-scarce, topographically complex regions. Moreover, limited efforts have been made to compare a wide range of Artificial Intelligence (AI) models under variable input scenarios using actual field data. To address this research gap, this study employs eight advanced AI models to predict GQI in Nagaland, a data-scarce hilly region in northeastern India: Artificial Neural Network (ANN), Autoregressive Model (AR), Locally-weighted Linear Regression (LLR), M5P tree, Multiple Linear Regression (MLR), Random Forest (RF), Random Subspace (RS), and Support Vector Machine (SVM). The research focuses on identifying an optimal subset regression for two scenarios: the first incorporates all water quality parameters to optimize GQI computation time, and the second explores variations using only the most sensitive parameters. Key findings reveal strong linear relationships between hydro-chemical parameters and GQI, with significant correlations such as Na⁺ with total dissolved solids (TDS) (0.936) and Mg²⁺ with GQI (0.922). Sensitivity analysis identifies TDS and total hardness (TH) as the primary determinants of GQI. Among the models, MLR achieves the highest accuracy in the first scenario, with R (correlation coefficient) = 0.9999, MAE (Mean Absolute Error) = 0.0001, and RMSE (Root Mean Square Error) = 0.0002%. In contrast, ANN performs best in the second scenario, with R = 0.9977, MAE = 2.4718, and RAE (Relative Absolute Error) = 3.5463%. These results highlight the efficacy of advanced AI models in enhancing GQI prediction accuracy, particularly in data-scarce regions like Nagaland.
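The abstract reports R, MAE, RMSE, and RAE but does not give their formulas. As a rough illustration only, the sketch below computes these four metrics using their common textbook definitions; the RAE definition (total absolute error divided by the total absolute deviation of the observations from their mean) is an assumption, not confirmed by the paper. The function names and the observed/predicted values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def rae_percent(observed, predicted):
    """Relative absolute error in percent (assumed definition): total absolute
    error relative to that of a predictor that always outputs the observed mean."""
    mean_o = sum(observed) / len(observed)
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    baseline = sum(abs(o - mean_o) for o in observed)
    return 100.0 * abs_err / baseline

# Hypothetical GQI values, made up for illustration (not data from the study).
observed = [40.0, 55.0, 70.0, 85.0, 100.0]
predicted = [42.0, 53.0, 71.0, 84.0, 102.0]

if __name__ == "__main__":
    print("R    =", round(pearson_r(observed, predicted), 4))
    print("MAE  =", round(mae(observed, predicted), 4))
    print("RMSE =", round(rmse(observed, predicted), 4))
    print("RAE% =", round(rae_percent(observed, predicted), 4))
```

Under this definition, an RAE below 100% means the model beats a trivial predictor that always returns the mean observed GQI, which is one way to read the percentage figures quoted above.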
Source journal
Dynamics of Atmospheres and Oceans (Earth Sciences: Geochemistry & Geophysics)
CiteScore: 3.10
Self-citation rate: 5.90%
Articles published: 43
Review time: >12 weeks
Journal description: Dynamics of Atmospheres and Oceans is an international journal for research related to the dynamical and physical processes governing atmospheres, oceans and climate. Authors are invited to submit articles, short contributions or scholarly reviews in the following areas:
• Dynamic meteorology
• Physical oceanography
• Geophysical fluid dynamics
• Climate variability and climate change
• Atmosphere-ocean-biosphere-cryosphere interactions
• Prediction and predictability
• Scale interactions
Papers of theoretical, computational, experimental and observational investigations are invited, particularly those that explore the fundamental nature, or bring together the interdisciplinary and multidisciplinary aspects, of dynamical and physical processes at all scales. Papers that explore air-sea interactions and the coupling between atmospheres, oceans, and other components of the climate system are particularly welcome.