Aayush Pandit, Sarah Hogan, David T. Mahoney, William I. Ford, James F. Fox, Christopher Wellen, Admin Husic
Establishing performance criteria for evaluating watershed-scale sediment and nutrient models at fine temporal scales
Water Research, published 2025-01-18. DOI: https://doi.org/10.1016/j.watres.2025.123156
Abstract
Watershed water quality models are mathematical tools used to simulate processes related to water, sediment, and nutrients. These models provide a framework that can be used to inform decision-making and the allocation of resources for watershed management. Therefore, it is critical to answer the question “when is a model good enough?” Established performance evaluation criteria, or thresholds for what is considered a ‘good’ model, provide common benchmarks against which model performance can be compared. Since the publication of prior meta-analyses on this topic, developments in the last decade necessitate further investigation, such as the advancement in high performance computing, the proliferation of aquatic sensors, and the development of machine learning algorithms. We surveyed the literature for quantitative model performance measures, including the Nash-Sutcliffe efficiency (NSE), with a particular focus on process-based models operating at fine temporal scales as their performance evaluation criteria are presently underdeveloped. The synthesis dataset was used to assess the influence of temporal resolution (sub-daily, daily, and monthly), calibration duration (< 3 years, 3 to 8 years, and > 8 years), and constituent target units (concentration, load, and yield) on model performance. The synthesis dataset includes 229 model applications, from which we use bootstrapping and personal modeling experience to establish sub-daily and daily performance evaluation criteria for flow, sediment, total nutrient, and dissolved nutrient models. For daily model evaluation, the NSE for sediment, total nutrient, and dissolved nutrient models should exceed 0.45, 0.30, and 0.35, respectively, for ‘satisfactory’ performance. Model performance generally improved when transitioning from short (< 3 years) to medium (3 to 8 years) calibration durations, but no additional gain was observed with longer (> 8 years) calibration. 
Performance was not significantly influenced by the selection of concentration (e.g., mg/L) or load (e.g., kg/s) as the target units for sediment or total nutrient models, but it was for dissolved nutrient models. We recommend the use of concentration rather than load as a water quality modeling target, as load may be biased by strong flow model performance whereas concentration provides a flow-independent measure of performance. Although the performance criteria developed herein are based on process-based models, they may be useful in assessing machine learning model performance, and we demonstrate one such assessment on a recent deep learning model of daily nitrate prediction across the United States. The guidance presented here is intended to be used alongside, rather than to replace, the experience and modeling judgement of engineers and scientists who work to maintain our collective water resources.
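As an illustration of how the daily criteria above might be applied, the following minimal Python sketch computes the Nash-Sutcliffe efficiency in its standard form, NSE = 1 - sum((O_i - S_i)^2) / sum((O_i - mean(O))^2), where O_i are observations and S_i are simulated values, and compares it with the 'satisfactory' thresholds quoted in the abstract. The function names, dictionary keys, and example numbers are hypothetical and are not taken from the paper. Reading "should exceed" as a strict inequality is also an assumption.

```python
from typing import Sequence

# Daily 'satisfactory' NSE thresholds as quoted in the abstract.
# The dictionary keys are illustrative labels, not the paper's terminology.
DAILY_SATISFACTORY_NSE = {
    "sediment": 0.45,
    "total_nutrient": 0.30,
    "dissolved_nutrient": 0.35,
}


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return the standard NSE: 1 - SSE / (sum of squares about the observed mean)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    if ss_obs == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - sse / ss_obs


def is_satisfactory_daily(nse: float, constituent: str) -> bool:
    """Check an NSE value against the daily threshold (assumed strict '>')."""
    return nse > DAILY_SATISFACTORY_NSE[constituent]


# Hypothetical daily series, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 5.0]
nse = nash_sutcliffe_efficiency(observed, simulated)
print(round(nse, 3), is_satisfactory_daily(nse, "sediment"))
```

A simulation that always returns the observed mean scores NSE = 0 under this definition, so any positive threshold requires the model to beat that baseline. Consistent with the abstract's recommendation, the input series would be concentrations rather than loads where possible.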
About the journal:
Water Research, along with its open access companion journal Water Research X, serves as a platform for publishing original research papers covering various aspects of the science and technology related to the anthropogenic water cycle, water quality, and its management worldwide. The audience targeted by the journal comprises biologists, chemical engineers, chemists, civil engineers, environmental engineers, limnologists, and microbiologists. The scope of the journal includes:
• Treatment processes for water and wastewaters (municipal, agricultural, industrial, and on-site treatment), including resource recovery and residuals management;
• Urban hydrology including sewer systems, stormwater management, and green infrastructure;
• Drinking water treatment and distribution;
• Potable and non-potable water reuse;
• Sanitation, public health, and risk assessment;
• Anaerobic digestion, solid and hazardous waste management, including source characterization and the effects and control of leachates and gaseous emissions;
• Contaminants (chemical, microbial, anthropogenic particles such as nanoparticles or microplastics) and related water quality sensing, monitoring, fate, and assessment;
• Anthropogenic impacts on inland, tidal, coastal and urban waters, focusing on surface and ground waters, and point and non-point sources of pollution;
• Environmental restoration, linked to surface water, groundwater and groundwater remediation;
• Analysis of the interfaces between sediments and water, and between water and atmosphere, focusing specifically on anthropogenic impacts;
• Mathematical modelling, systems analysis, machine learning, and beneficial use of big data related to the anthropogenic water cycle;
• Socio-economic, policy, and regulations studies.