Abu Reza Md Towfiqul Islam, Md Abdullah-Al Mamun, Mehedi Hasan, Mst Nazneen Aktar, Md Nashir Uddin, Md Abu Bakar Siddique, Mohaiminul Haider Chowdhury, Md Saiful Islam, A B M Mainul Bari, Abubakr M Idris, Venkatramanan Senapathi
Journal of Contaminant Hydrology, vol. 269, article 104480. Published 2024-12-10. DOI: 10.1016/j.jconhyd.2024.104480
Citations: 0
Optimizing coastal groundwater quality predictions: A novel data mining framework with cross-validation, bootstrapping, and entropy analysis
Abstract
Investigating the potential of novel data mining algorithms (DMAs) for modeling groundwater quality in coastal areas is an important requirement for groundwater resource management, especially in the coastal region of Bangladesh, where groundwater is highly contaminated. In this work, the applicability of three DMAs, namely Gaussian Process Regression (GPR), Bayesian Ridge Regression (BRR), and Artificial Neural Network (ANN), for predicting groundwater quality in coastal areas was investigated. Optuna-based hyperparameter optimization is proposed to improve model accuracy, with Optuna-GPR and Optuna-BRR serving as benchmark models. Cross-validation (CV) and bootstrapping (B) were combined with these algorithms to build six predictive models. The entropy-based coastal groundwater quality index (ECWQI) was converted into a normalized index (ECWQIn), which was divided into five classes ranging from very poor to excellent. A self-organizing map (SOM), spatial autocorrelation analysis, and a fuzzy logic model were used to identify spatial patterns of groundwater quality based on 12 physicochemical variables collected from 67 groundwater wells. The SOM analysis identified four distinct spatial patterns: EC-TDS-Cl⁻, Mg-pH, Ca²⁺-K⁺-NO₃⁻, and HCO₃⁻-SO₄²⁻-Na⁺-F⁻. In the test phase, both the ANN (CV) model (RMSE = 0.041, MAE = 0.026, R² = 0.971, RAE = 0.15 = 21, CC = 0.986) and the ANN (B) model (RMSE = 0.041, MAE = 0.025, R² = 0.969, RAE = 0.119, CC = 0.975) outperformed the other, Optuna-based models. SO₄²⁻, Cl⁻, and F⁻ played an important role in prediction accuracy. F⁻ and SO₄²⁻ showed higher spatial autocorrelation, which influenced the degradation of groundwater quality. In addition, the errors of the ANN (CV) and ANN (B) models followed a Gaussian distribution with a small standard error (<1%), indicating model stability.
These results demonstrate the effectiveness of the ANN model for predicting groundwater quality in coastal areas, which could help regional water managers with real-time monitoring and the sustainable management of groundwater resources.
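The abstract does not give the ECWQI formula. As a minimal sketch, assuming the commonly used entropy-weight water quality index (weights from the Shannon entropy of min-max scaled data, quality ratings of concentration over a standard limit × 100), the index, a min-max normalization to ECWQIn, and a five-class split might be computed as below. The parameters, standard limits, well data, intermediate class names, and equal-width class thresholds are hypothetical illustrations, not values from the study.

```python
import math

def entropy_weights(data):
    """Entropy weight of each parameter (column); rows are wells. Needs >= 2 wells."""
    m = len(data)
    raw = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:  # a constant parameter carries no information
            raw.append(0.0)
            continue
        scaled = [(x - lo) / (hi - lo) for x in col]
        total = sum(scaled)
        # Normalized Shannon entropy of the column's proportions, with 0*ln(0) taken as 0.
        e = -sum((s / total) * math.log(s / total) for s in scaled if s > 0) / math.log(m)
        raw.append(1.0 - e)  # degree of diversification
    denom = sum(raw)
    return [r / denom for r in raw]

def ewqi(data, standards):
    """Entropy-weighted index per well: sum_j w_j * (C_ij / S_j) * 100."""
    w = entropy_weights(data)
    return [sum(wj * c / s * 100 for wj, c, s in zip(w, row, standards)) for row in data]

def normalize(values):
    """Min-max scale to [0, 1]; assumed here as the route from ECWQI to ECWQIn."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical equal-width classes; a higher index means poorer water.
CLASSES = ["excellent", "good", "medium", "poor", "very poor"]

def classify(v):
    return CLASSES[min(int(v * 5), 4)]

# Illustrative data: columns are Cl-, SO4 2-, F- (mg/L) for four hypothetical wells.
wells = [
    [120.0, 40.0, 0.6],
    [850.0, 210.0, 1.9],
    [300.0, 95.0, 1.1],
    [60.0, 20.0, 0.3],
]
standards = [250.0, 250.0, 1.5]  # assumed limits, for illustration only

weights = entropy_weights(wells)
scores = normalize(ewqi(wells, standards))
labels = [classify(v) for v in scores]
```

Parameters whose values are distributed more unevenly across wells have lower entropy and therefore receive larger weights. In the study itself, the inputs would be the 12 physicochemical variables measured in the 67 wells.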
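The test-phase scores reported in the abstract use five standard regression metrics. A sketch of common definitions follows. The abstract does not say whether R² is the coefficient of determination or the squared Pearson correlation, so the former is assumed here, and RAE is taken as the total absolute error relative to that of a predictor that always returns the observed mean. The function name and toy data are illustrative.

```python
import math

def regression_metrics(obs, pred):
    """RMSE, MAE, R2 (coefficient of determination), RAE and CC (Pearson r)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    errors = [o - p for o, p in zip(obs, pred)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    abs_err = sum(abs(e) for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = abs_err / n
    r2 = 1.0 - ss_res / ss_tot
    # RAE: absolute error relative to always predicting the observed mean.
    rae = abs_err / sum(abs(o - mean_obs) for o in obs)
    # CC: Pearson correlation between observed and predicted values.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    cc = cov / math.sqrt(ss_tot * sum((p - mean_pred) ** 2 for p in pred))
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "RAE": rae, "CC": cc}

# Toy normalized-index values (hypothetical), on the same 0-1 scale as ECWQIn.
obs = [0.0, 0.5, 1.0]
pred = [0.1, 0.5, 0.9]
metrics = regression_metrics(obs, pred)
```

Because R² and CC are defined differently, a model can reach CC = 1 while R² is still below 1, as in this toy case where predictions are perfectly correlated with the observations but shrunk toward the mean.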
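Each algorithm was paired with either cross-validation (CV) or bootstrapping (B), but the abstract does not state the number of folds or bootstrap rounds. The following is a minimal sketch of the two resampling schemes over the 67 wells. It assumes 5-fold CV and out-of-bag evaluation for the bootstrap, and the function names are hypothetical.

```python
import random

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def bootstrap_indices(n, rounds, seed=0):
    """Yield (train, out_of_bag) lists; train is drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        oob = [j for j in range(n) if j not in in_bag]  # wells never drawn
        yield train, oob

folds = list(kfold_indices(67, 5))
boots = list(bootstrap_indices(67, 10))
```

In CV, each well is held out exactly once. In the bootstrap, each round trains on a same-size sample drawn with replacement. This leaves roughly 37% of wells out of bag (about (1 - 1/n)^n for n wells) to serve as test data.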
About the journal:
The Journal of Contaminant Hydrology is an international journal publishing scientific articles pertaining to the contamination of subsurface water resources. Emphasis is placed on investigations of the physical, chemical, and biological processes influencing the behavior and fate of organic and inorganic contaminants in the unsaturated (vadose) and saturated (groundwater) zones, as well as at groundwater-surface water interfaces. The ecological impacts of contaminants transported both from and to aquifers are of interest. Articles on contamination of surface water only, without a link to groundwater, are out of scope. Broad latitude is allowed in identifying contaminants of interest, which include legacy and emerging pollutants, nutrients, nanoparticles, pathogenic microorganisms (e.g., bacteria, viruses, protozoa), microplastics, and various constituents associated with energy production (e.g., methane, carbon dioxide, hydrogen sulfide).
The journal's scope embraces a wide range of topics, including: experimental investigations of contaminant sorption, diffusion, transformation, volatilization and transport in the surface and subsurface; characterization of soil and aquifer properties only as they influence contaminant behavior; development and testing of mathematical models of contaminant behavior; innovative techniques for restoration of contaminated sites; development of new tools or techniques for monitoring the extent of soil and groundwater contamination; transformation of contaminants in the hyporheic zone; effects of contaminants traversing the hyporheic zone on surface water and groundwater ecosystems; subsurface carbon sequestration and/or turnover; and migration of fluids associated with energy production into groundwater.