Evaluating the reliability of data interpolation and machine learning methods for water quality management: a SWAT model comparison

IF 2.8 · Zone 4 (Environmental Science and Ecology) · JCR Q3 (ENVIRONMENTAL SCIENCES)
Shubo Fang, Matthew J. Deitch, Tesfay G. Gebremicael
Journal: Environmental Earth Sciences, 84 (10)
Publication date: 2025-05-13
DOI: 10.1007/s12665-025-12313-1
URL: https://link.springer.com/article/10.1007/s12665-025-12313-1
Citations: 0

Abstract

Data scarcity and the time-consuming nature of process-based modeling often limit the application of SWAT. This study evaluates the reliability of simple spatial interpolation of monitoring data, combined with advanced machine learning techniques, including Self-Organizing Maps (SOM), Generalized Additive Models (GAMs), Receiver Operating Characteristic (ROC) analysis, and K-means clustering, for identifying critical source areas (CSAs), key stressors, and thresholds for water quality management. Consistent with SWAT-based analyses, the study found that forest cover and human-modified land use significantly affect total nitrogen (TN) and total phosphorus (TP) levels; it also revealed population density as an additional influential factor. GAMs showed that human-disturbed land use drives TN pollution and that population density is key to TP enrichment. ROC analysis identified thresholds of 40.91% for forest cover (close to SWAT results) and 10.21% for human-disturbed areas (lower than SWAT-based estimates). A population threshold of 239 significantly impacted TP, a factor not identified by SWAT modeling. K-means clustering highlighted clusters 1, 4, and 5 as high-priority areas, and SWAT modeling indicated that managing these clusters, which cover 47.39% of the watershed, could mitigate 42.66% of TN and 41.34% of TP. Although this approach cannot fully replace SWAT modeling, it is simple and time-saving, and it proves helpful for identifying CSAs and informing water quality management strategies.
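
The abstract does not state how the ROC thresholds were selected. One common convention is to take the cutoff that maximizes Youden's J (sensitivity + specificity - 1); the sketch below assumes that convention and may differ from the authors' procedure. The function name `youden_threshold`, the `higher_is_worse` flag, and the sample values are hypothetical illustrations, not data or code from the study.

```python
from typing import Sequence, Tuple


def youden_threshold(
    values: Sequence[float],
    impaired: Sequence[bool],
    higher_is_worse: bool = True,
) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A site is predicted impaired when value >= cutoff (higher_is_worse=True)
    or value <= cutoff (higher_is_worse=False, e.g. forest cover).
    """
    if len(values) != len(impaired):
        raise ValueError("values and impaired must have the same length")
    n_pos = sum(1 for flag in impaired if flag)
    n_neg = len(impaired) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one impaired and one unimpaired site")

    best_cutoff, best_j = float("nan"), float("-inf")
    # Every observed value is a candidate cutoff.
    for cutoff in sorted(set(values)):
        tp = tn = 0
        for v, flag in zip(values, impaired):
            predicted = v >= cutoff if higher_is_worse else v <= cutoff
            if predicted and flag:
                tp += 1  # impaired site correctly flagged
            elif not predicted and not flag:
                tn += 1  # unimpaired site correctly passed
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the first cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Made-up illustration data (NOT from the study): percent human-disturbed
# land use per subbasin, and whether TN exceeded a hypothetical standard.
disturbed_pct = [2.0, 5.0, 8.0, 9.5, 11.0, 14.0, 20.0, 35.0]
tn_impaired = [False, False, False, False, True, True, True, True]

cutoff, j = youden_threshold(disturbed_pct, tn_impaired, higher_is_worse=True)
print(cutoff, j)  # prints "11.0 1.0" for this perfectly separable toy data
```

For a protective stressor such as forest cover, where lower values are expected to accompany impairment, the same function would be called with `higher_is_worse=False`. Real monitoring data will rarely separate this cleanly, so J will usually be well below 1.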

Source journal: Environmental Earth Sciences (Environmental Sciences; Geosciences, Multidisciplinary)
CiteScore: 5.10
Self-citation rate: 3.60%
Articles published: 494
Review time: 8.3 months
About the journal: Environmental Earth Sciences is an international multidisciplinary journal concerned with all aspects of interaction between humans, natural resources, ecosystems, special climates or unique geographic zones, and the earth. Topics include: water and soil contamination caused by waste management and disposal practices; environmental problems associated with transportation by land, air, or water; geological processes that may impact biosystems or humans; man-made or naturally occurring geological or hydrological hazards; environmental problems associated with the recovery of materials from the earth; environmental problems caused by extraction of minerals, coal, and ores, as well as oil and gas, water and alternative energy sources; environmental impacts of exploration and recultivation; environmental impacts of hazardous materials; management of environmental data and information in data banks and information systems; and dissemination of knowledge on techniques, methods, approaches and experiences to improve and remediate the environment. In pursuit of these topics, the geoscientific disciplines are invited to contribute their knowledge and experience. Major disciplines include hydrogeology, hydrochemistry, geochemistry, geophysics, engineering geology, remediation science, natural resources management, environmental climatology and biota, environmental geography, soil science and geomicrobiology.