Shubo Fang, Matthew J. Deitch, Tesfay G. Gebremicael
Environmental Earth Sciences, vol. 84, no. 10. Published 2025-05-13. DOI: 10.1007/s12665-025-12313-1. https://link.springer.com/article/10.1007/s12665-025-12313-1
Evaluating the reliability of data interpolation and machine learning methods for water quality management: a SWAT model comparison
Because data are scarce and process-based modeling is time-consuming, the Soil and Water Assessment Tool (SWAT) is often difficult to apply. This study evaluates the reliability of simple spatial interpolation of monitoring data, combined with advanced machine learning techniques, including Self-Organizing Maps (SOM), Generalized Additive Models (GAMs), Receiver Operating Characteristic (ROC) analysis, and K-means clustering, for identifying critical source areas (CSAs), key stressors, and thresholds for water quality management. Consistent with SWAT-based analyses, the study found that forest cover and human-modified land use significantly affect total nitrogen (TN) and total phosphorus (TP) levels; it also revealed population density as an additional influential factor. GAMs showed that human-disturbed land use drives TN pollution and that population density is key to TP enrichment. ROC analysis identified a threshold of 40.91% for forest cover, close to the SWAT result, and a threshold of 10.21% for human-disturbed areas, lower than the SWAT-based estimate. A population threshold of 239 was found to significantly affect TP; SWAT modeling did not identify this factor. K-means clustering highlighted clusters 1, 4, and 5 as high-priority areas, and SWAT modeling indicated that managing these clusters, which cover 47.39% of the watershed, could mitigate 42.66% of TN and 41.34% of TP. Although this approach cannot fully replace SWAT modeling, it is simple and time-saving, and it proves helpful for identifying CSAs and informing water quality management strategies.
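The abstract reports ROC-derived thresholds but does not say how the cut-off was chosen from the ROC curve. As a minimal sketch of one common choice, the example below picks the cut-off that maximizes Youden's J (true positive rate minus false positive rate). This is an assumption, not the authors' stated procedure. The function name `youden_threshold`, the toy forest-cover values, and the impairment labels are all hypothetical and do not reproduce the 40.91% figure.

```python
from typing import Sequence, Tuple


def youden_threshold(
    values: Sequence[float],
    impaired: Sequence[bool],
    low_is_risky: bool = True,
) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A site is predicted impaired when value <= threshold (low_is_risky)
    or when value >= threshold (otherwise).
    """
    n_pos = sum(impaired)
    n_neg = len(impaired) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both impaired and unimpaired sites")

    best_t, best_j = float("nan"), -1.0
    # Every observed value is a candidate cut-off.
    for t in sorted(set(values)):
        if low_is_risky:
            predicted = [v <= t for v in values]
        else:
            predicted = [v >= t for v in values]
        tp = sum(p and y for p, y in zip(predicted, impaired))
        fp = sum(p and not y for p, y in zip(predicted, impaired))
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # strict comparison keeps the first maximum
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data: percent forest cover per subbasin, and whether TN
# exceeded an assumed water quality criterion there.
forest_pct = [12.0, 25.0, 33.0, 38.0, 41.0, 47.0, 55.0, 62.0, 70.0, 81.0]
tn_impaired = [True, True, True, True, True, False, True, False, False, False]

threshold, j_stat = youden_threshold(forest_pct, tn_impaired, low_is_risky=True)
print(f"forest-cover threshold = {threshold:.1f}%, Youden J = {j_stat:.2f}")
```

For a stressor where higher values indicate risk, such as percent human-disturbed land or population, the same function would be called with `low_is_risky=False`.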
About the journal:
Environmental Earth Sciences is an international multidisciplinary journal concerned with all aspects of interaction between humans, natural resources, ecosystems, special climates or unique geographic zones, and the earth:
Water and soil contamination caused by waste management and disposal practices
Environmental problems associated with transportation by land, air, or water
Geological processes that may impact biosystems or humans
Man-made or naturally occurring geological or hydrological hazards
Environmental problems associated with the recovery of materials from the earth
Environmental problems caused by extraction of minerals, coal, and ores, as well as oil and gas, water and alternative energy sources
Environmental impacts of exploration and recultivation
Environmental impacts of hazardous materials
Management of environmental data and information in data banks and information systems
Dissemination of knowledge on techniques, methods, approaches and experiences to improve and remediate the environment
In pursuit of these topics, the geoscientific disciplines are invited to contribute their knowledge and experience. Major disciplines include: hydrogeology, hydrochemistry, geochemistry, geophysics, engineering geology, remediation science, natural resources management, environmental climatology and biota, environmental geography, soil science and geomicrobiology.