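Estimating missing daily streamflow data in a tropical basin with pronounced seasonal variability: A comparative case study from the Guayas River Basin, Ecuador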

JCR quartile: Q2 (Environmental Science)
Daniela Stay-Arevalo, Mijail Arias-Hidalgo, Boris Apolo-Masache, Luis Dominguez-Granda, Gonzalo Villa-Cox
Environmental Challenges, Volume 20, Article 101262. Published 2025-08-08.
DOI: 10.1016/j.envc.2025.101262
https://www.sciencedirect.com/science/article/pii/S2667010025001817
Citations: 0

Abstract

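Streamflow data are central to many environmental assessment and management frameworks. Data gaps can markedly reduce the precision and reliability of these assessments and practices, especially in developing countries. This study applies a predictive framework based on Seasonal Autoregressive Integrated Moving Average (SARIMA), k-Nearest Neighbors (kNN) and Random Forest (RF) models to fill missing values in a daily streamflow dataset from 22 hydrological stations in the Guayas River Basin (GRB), Ecuador. Predictive performance was assessed by comparing actual observed data with out-of-sample model estimates, using performance metrics such as Bias, Normalized Root Mean Square Error (NRMSE), Normalized Mean Absolute Error (NMAE) and the Nash–Sutcliffe model Efficiency coefficient (NSE). The kNN and RF models outperformed the SARIMA model, with NSE values ranging from 0.715 to 0.983 when estimating randomly allocated contiguous gaps. Different gap lengths were also tested: the RF model achieved more than 70% similarity for gaps of up to 15 days. For longer gaps, the estimates fail to reproduce the natural peak-flow dynamics of the original streamflow time series, exhibit step-like patterns with lower goodness-of-fit metrics, and generally underestimate observed values. This study points to room for improvement in the data-mining stages that precede modelling, to support a proper characterization of streamflow in data-scarce regions.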
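As a hedged illustration of the evaluation protocol the abstract describes (mask a randomly placed contiguous gap, estimate it out of sample, then score the estimates against the withheld observations), the following minimal sketch uses only the Python standard library. It rests on several assumptions not stated in the abstract: Bias is taken as the mean error, NRMSE and NMAE are normalized by the mean observed flow, and gaps are filled by a plain kNN regression on a single hypothetical donor station's flow for the same day. The names `knn_fill` and `random_contiguous_gap` and the synthetic series are illustrative only; they are not the authors' code, predictors or configuration, and SARIMA and RF are not reproduced here.

```python
import math
import random


def bias(obs, sim):
    """Mean error (sim - obs). Assumed definition; the study's formula is not given."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def nrmse(obs, sim):
    """RMSE normalized by the mean observed flow (assumed normalization)."""
    n = len(obs)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    return rmse / (sum(obs) / n)


def nmae(obs, sim):
    """MAE normalized by the mean observed flow (assumed normalization)."""
    n = len(obs)
    mae = sum(abs(s - o) for o, s in zip(obs, sim)) / n
    return mae / (sum(obs) / n)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def knn_fill(target, donor, k=5):
    """Hypothetical gap filler: fill None values in `target` by kNN regression.

    For each missing day, pick the k observed days whose donor-station flow is
    closest to the donor flow on the missing day, and average their target flows.
    """
    train = [(d, q) for d, q in zip(donor, target) if q is not None]
    if not train:
        raise ValueError("target has no observed values to learn from")
    filled = list(target)
    for i, q in enumerate(target):
        if q is None:
            nearest = sorted(train, key=lambda p: abs(p[0] - donor[i]))[:k]
            filled[i] = sum(t for _, t in nearest) / len(nearest)
    return filled


def random_contiguous_gap(n, length, rng):
    """Return the indices of one randomly placed contiguous gap of `length` days."""
    start = rng.randrange(0, n - length + 1)
    return range(start, start + length)


if __name__ == "__main__":
    rng = random.Random(42)
    days = 3 * 365
    # Synthetic, purely illustrative flows with an annual cycle (not study data).
    target = [50 + 40 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 3)
              for d in range(days)]
    donor = [0.8 * q + rng.gauss(0, 2) for q in target]

    for gap_len in (5, 15, 30):
        gap = random_contiguous_gap(days, gap_len, rng)
        # Withhold the gap, fill it, then score only the withheld days.
        masked = [None if i in gap else q for i, q in enumerate(target)]
        filled = knn_fill(masked, donor, k=5)
        obs = [target[i] for i in gap]
        est = [filled[i] for i in gap]
        print(f"gap={gap_len:2d} d  bias={bias(obs, est):+.2f}  "
              f"NRMSE={nrmse(obs, est):.3f}  NMAE={nmae(obs, est):.3f}  "
              f"NSE={nse(obs, est):.3f}")
```

Running the script prints the four metrics for 5-, 15- and 30-day gaps on the synthetic series; those numbers say nothing about the Guayas data. Scoring only the masked days keeps the comparison out of sample, which mirrors the observed-versus-estimated contrast described in the abstract.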
Source journal: Environmental Challenges (Environmental Science: Environmental Engineering)
CiteScore: 8.00
Self-citation rate: 0.00%
Articles published: 249
Review time: 8 weeks