A novel predictive framework for water quality assessment based on socio-economic indicators and water leaving reflectance
Authors: Hao Chen, Ali P. Yunus
Journal: Groundwater for Sustainable Development, volume 28, Article 101405
Publication date: 2025-02-01
Publication type: Journal Article
DOI: 10.1016/j.gsd.2025.101405
URL: https://www.sciencedirect.com/science/article/pii/S2352801X25000025
Impact factor: 4.9 (JCR Q2, Engineering, Environmental)
Open access: no
Citation count: 0
Abstract
Accurate prediction of water quality at both spatial and temporal scales for large water bodies remains a daunting task with significant implications for human well-being and sustainable development (aligned with SDG 6, clean water and sanitation). Traditional data-driven models for water quality prediction rely to some degree on field sampling, which is neither cost-effective nor time-efficient. Socio-economic indicators have also been used as predictor variables for water quality; however, such datasets are typically available only at coarse temporal resolutions, limiting their applicability for time-sensitive analyses. In this study, we integrated machine learning (ML) models with socio-economic indicators and remote sensing reflectance (R_RS) to address the challenge of temporality in predicting Biochemical Oxygen Demand (BOD) and Total Coliform Bacteria (TCB) levels across 228 lake systems in the Indian subcontinent. Pearson correlation analysis revealed only limited direct correlations (<0.5) between BOD, TCB, and the input variables. However, a stepwise omission and commission analysis demonstrated that incorporating R_RS into the socio-economic model significantly enhanced the predictive performance of the ML models. This approach achieved high classification accuracy for BOD and TCB, with Area Under the Curve (AUC) scores of 0.84 and 0.96, respectively, highlighting strong potential for temporal water quality assessment. Among the supervised learning methods tested, the random forest model outperformed all others in terms of accuracy and robustness. This study presents an integrated framework capable of predicting BOD and TCB at both high temporal and high spatial resolution, and offers valuable insights for the effective and efficient management of aquatic ecosystems, enabling timely interventions and informed decision-making.
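The abstract reports two kinds of numbers: Pearson correlation coefficients between the inputs and the targets, and AUC scores for the classifiers. The authors' code is not part of this record, so the following is only a minimal sketch of how those two metrics can be computed, using the Python standard library. The function names, labels, and scores are hypothetical toy values chosen for illustration; they are not data from the study, and the random forest model itself is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = water quality class "poor", 0 = "acceptable".
labels = [0, 0, 1, 0, 1, 1, 0, 1]
# Assumed classifier scores from a socio-economic-only model ...
scores_socio = [0.2, 0.6, 0.5, 0.4, 0.7, 0.3, 0.5, 0.8]
# ... and from a model that also uses reflectance features.
scores_combined = [0.1, 0.55, 0.6, 0.3, 0.8, 0.5, 0.2, 0.9]

auc_socio = roc_auc(labels, scores_socio)
auc_combined = roc_auc(labels, scores_combined)
print(f"AUC, socio-economic only: {auc_socio:.3f}")
print(f"AUC, with reflectance:    {auc_combined:.3f}")
```

Comparing the two AUC values for the same labels mirrors, in a very reduced form, the kind of before-and-after comparison the abstract describes when reflectance is added to the socio-economic inputs.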
About the journal:
Groundwater for Sustainable Development is directed to different stakeholders and professionals, including government and non-governmental organizations, international funding agencies, universities, public water institutions, public health and other public/private sector professionals, and other relevant institutions. It is aimed at professionals, academics and students in disciplines such as groundwater and its connection to surface hydrology and environment, soil sciences, engineering, ecology, microbiology, atmospheric sciences, analytical chemistry, hydro-engineering, water technology, environmental ethics, economics, public health, policy, as well as social sciences, legal disciplines, or any other area connected with water issues. The objectives of this journal are to facilitate:
• The improvement of effective and sustainable management of water resources across the globe.
• The improvement of human access to groundwater resources in adequate quantity and good quality.
• The meeting of the increasing demand for drinking and irrigation water needed for food security, to contribute to a socially and economically sound human development.
• The creation of a global inter- and multidisciplinary platform and forum to improve our understanding of groundwater resources and to advocate their effective and sustainable management and protection against contamination.
• Interdisciplinary information exchange and the stimulation of scientific research in the fields of groundwater-related sciences and social and health sciences required to achieve the United Nations Millennium Development Goals for sustainable development.