Ashifur Rahman, M. M. Mahbubul Syeed, Md. Rajaul Karim, Kaniz Fatema, Razib Hayat Khan, Mohammad Faisal Uddin
Applied Water Science 15(5), published 2025-04-29. DOI: 10.1007/s13201-025-02450-0
Citations: 0
An optimized ensemble ML-WQI model for reliable water quality prediction by minimizing the eclipsing and ambiguity issues
Monitoring water quality is essential for sustaining ecosystems and the many forms of life on Earth. Water quality index (WQI) models are a widely adopted approach to water quality monitoring. However, they have been widely criticized for producing unreliable and inconsistent results, often caused by eclipsing and ambiguity issues. To address these issues, data-driven approaches that integrate machine learning or deep learning (ML/DL) techniques have recently been applied to develop improved WQI models. Although these models outperform conventional ones, recent studies report that they often produce inconsistent results because of data variability and outliers. The purpose of this research is to define a robust and reliable ensemble ML-WQI model that is optimized to attenuate the effects of data variability, eclipsing, and ambiguity, enabling accurate water quality prediction. To build the ensemble, eight prominent regression ML models are evaluated to select the best-performing base estimators and meta-learner. The Irish WQI dataset used in the study contains 29,159 samples spanning 15 years. Each sample records 11 water quality parameters together with the corresponding WQI value and classification, calculated using three traditional WQI models: CCME, Brown, and SRDD. Performance is evaluated using mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and R-squared (\(R^2\)), together with fivefold cross-validation and a comparison with existing ML models. In addition, resilience to eclipsing, ambiguity, and outliers is quantitatively assessed using the WQI classification data.
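The evaluation metrics and fivefold cross-validation named above can be illustrated with a minimal, dependency-free sketch. The function names `regression_metrics` and `kfold_indices` are hypothetical, and the abstract does not state how the folds were formed, so contiguous, unshuffled folds are an assumption here. Note that RMSE is by definition the square root of MSE, and RMSE is never smaller than MAE.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return MSE, MAE, RMSE and R^2 for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE by definition
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R^2 = 1 - SS_res / SS_tot; undefined when all observed values are equal
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


def kfold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, test_idx) pairs for contiguous k-fold splits (no shuffling).

    Assumption: the first n_samples % k folds receive one extra sample each.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

For example, `regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])` gives an MSE of about 0.025, an MAE of about 0.15, and an \(R^2\) of about 0.98; in practice each fold from `kfold_indices` would be scored this way and the fold scores averaged.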
The findings show that the ensemble ML-WQI model with linear regression (LR), random forest (RF), and extreme gradient boosting (XGB) as base estimators, and a decision tree (DT) as the meta-learner, achieves high accuracy, with MAE, MSE, RMSE, and \(R^2\) scores of 0.01, 0.001, 0.0034, and 1.00, respectively. This performance exceeds that of existing regression-based ML-WQI models. The model also shows greater resilience to outliers, classifying all WQIs close to the general trend of water quality. Its eclipsing effect is very low (23.9%) compared with CCME (50.50%), Brown (32.20%), and SRDD (77.20%). With respect to ambiguity, the model is more stable than traditional WQI models. These results indicate that the proposed ensemble model is robust to the inherent variability of water quality data and predicts reliable WQI classifications. This data-driven, autonomous, cost-effective, and easy-to-understand ML-WQI model should provide strong support to researchers building comprehensive water quality monitoring and management systems.
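The ensemble described above follows a stacking pattern: the base estimators' predictions become the inputs of a meta-learner. The following is a minimal sketch using scikit-learn's `StackingRegressor`, not the authors' implementation, under several assumptions: synthetic data from `make_regression` replaces the Irish WQI dataset (11 features mirror the 11 parameters), `GradientBoostingRegressor` is swapped in for XGBoost so that only scikit-learn is required, `cv=5` for training the meta-learner and all hyperparameters are illustrative, and `build_stacked_wqi_model` is a hypothetical name.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor


def build_stacked_wqi_model(random_state: int = 0) -> StackingRegressor:
    """Stack LR, RF and a boosted-tree regressor under a decision-tree meta-learner.

    GradientBoostingRegressor is a stand-in for XGBoost (assumption).
    """
    base_estimators = [
        ("lr", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=random_state)),
        ("gb", GradientBoostingRegressor(random_state=random_state)),
    ]
    return StackingRegressor(
        estimators=base_estimators,
        final_estimator=DecisionTreeRegressor(random_state=random_state),
        cv=5,  # out-of-fold base predictions are used to train the meta-learner
    )


if __name__ == "__main__":
    # Synthetic stand-in for 11 water quality parameters mapped to a WQI score.
    X, y = make_regression(n_samples=500, n_features=11, noise=5.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = build_stacked_wqi_model()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    mse = mean_squared_error(y_test, pred)
    print(
        f"MAE={mean_absolute_error(y_test, pred):.3f} MSE={mse:.3f} "
        f"RMSE={mse ** 0.5:.3f} R2={r2_score(y_test, pred):.3f}"
    )

    # Fivefold cross-validated R^2 of the whole stacked model.
    scores = cross_val_score(build_stacked_wqi_model(), X, y, cv=5, scoring="r2")
    print("5-fold R2:", scores.round(3))
```

To match the base-estimator set reported in the abstract more closely, the third tuple can be replaced with `("xgb", xgboost.XGBRegressor(random_state=random_state))`, which requires the separate `xgboost` package.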