River total dissolved gas prediction using a hybrid greedy-stepwise feature selection and bidirectional long short-term memory model

IF 5.8 · CAS Zone 2 (Environmental Science and Ecology) · JCR Q1 (Ecology)
Khabat Khosravi, Salim Heddam, Changhyun Jun, Sayed M. Bateni, Dongkyun Kim, Essam Heggy
Ecological Informatics, Volume 90, Article 103191. Published 2025-05-08. DOI: 10.1016/j.ecoinf.2025.103191
https://www.sciencedirect.com/science/article/pii/S1574954125002006

Abstract

The supersaturation of total dissolved gas (TDG) in rivers is a critical indicator of water quality downstream of high dams. This study models TDG levels at two monitoring stations in the Columbia and Snake River Basins (USA), where high TDG concentrations were recorded. Hourly data on water temperature, barometric pressure, dam spill, sensor depth, and discharge serve as input variables for deep-learning models. Several models are developed and tested, including long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), and an alternating model tree (AMT) hybridized with iterative absolute error regression (IAER) and iterative classifier optimizer (ICO) algorithms. A greedy stepwise feature selection technique (GSFST) is employed to identify the optimal model inputs. Each model is trained and evaluated at one station and validated at the second station to assess transferability and generalization capability. Model performance is compared using multiple quantitative and qualitative metrics, including the Nash–Sutcliffe efficiency and the uncertainty coefficient. In addition, Friedman and Wilcoxon signed-rank tests confirmed statistically significant differences between models. Dam spill emerged as the most influential predictor of TDG levels at both sites. The optimal input combination selected by GSFST included dam spill, water temperature, barometric pressure, and sensor depth. Among all models, GSFST-BiLSTM achieved the highest predictive accuracy, with Nash–Sutcliffe values of 0.95 (testing) and 0.90 (validation) and uncertainty coefficients of 5.2% and 7.0%, respectively. These findings demonstrate that GSFST-BiLSTM provides a robust and transferable framework for TDG prediction, with the potential for broader application pending further validation.
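The abstract does not spell out how the greedy stepwise selection or the Nash–Sutcliffe score is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: a forward greedy search that adds one input at a time and keeps the one that most improves held-out Nash–Sutcliffe efficiency. An ordinary least-squares model stands in for the paper's networks, the data are synthetic, and the names `nse`, `fit_predict`, `greedy_forward_selection`, and `min_gain` are hypothetical, not taken from the study.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1.0 is a perfect fit, 0.0 matches the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread


def fit_predict(X_train, y_train, X_test):
    """Stand-in model (assumption): least squares with an intercept, not a BiLSTM."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def greedy_forward_selection(X_train, y_train, X_test, y_test, names, min_gain=1e-3):
    """Add features one at a time; stop when the best candidate gains less than min_gain."""
    selected, best_score = [], -np.inf
    remaining = list(range(X_train.shape[1]))
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            pred = fit_predict(X_train[:, cols], y_train, X_test[:, cols])
            scores[j] = nse(y_test, pred)
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return [names[j] for j in selected], best_score


# Synthetic illustration only: the target depends strongly on the first column
# and weakly on the second; the other columns are pure noise.
names = ["dam_spill", "water_temperature", "barometric_pressure",
         "sensor_depth", "discharge"]
rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(names)))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=400)

# First half for fitting, second half held out for scoring.
selected, score = greedy_forward_selection(X[:200], y[:200], X[200:], y[200:], names)
print(selected, round(score, 3))
```

In the study's setting, the held-out block would correspond to the testing or second-station data, and the stand-in regression would be replaced by the model under evaluation. The stopping threshold `min_gain` is an illustrative choice.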
Source journal: Ecological Informatics (Environmental Science, Ecology)
CiteScore: 8.30 · Self-citation rate: 11.80% · Articles published: 346 · Review time: 46 days
About the journal: Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science, and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness, and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change. The journal is interdisciplinary, at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and modelling and prediction of ecological data.