River total dissolved gas prediction using a hybrid greedy-stepwise feature selection and bidirectional long short-term memory model
Khabat Khosravi, Salim Heddam, Changhyun Jun, Sayed M. Bateni, Dongkyun Kim, Essam Heggy
Ecological Informatics, vol. 90, Article 103191. Published 2025-05-08. DOI: 10.1016/j.ecoinf.2025.103191
https://www.sciencedirect.com/science/article/pii/S1574954125002006
Journal impact factor: 5.8 (JCR Q1, Ecology)
Citations: 0
Abstract
The supersaturation of total dissolved gas (TDG) in rivers serves as a critical indicator of water quality downstream of high dams. This study models TDG levels at two monitoring stations in the Columbia and Snake River Basins (USA), where high TDG concentrations were recorded. Hourly data on water temperature, barometric pressure, dam spill, sensor depth, and discharge serve as input variables for deep-learning models. Several models are developed and tested, including long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), and an alternating model tree (AMT) hybridized with iterative absolute error regression (IAER) and iterative classifier optimizer (ICO) algorithms. A greedy stepwise feature selection technique (GSFST) is employed to identify the optimal model inputs. Each model is trained and evaluated at one station and validated at the second station to assess transferability and generalization capability. Model performance is compared using multiple quantitative and qualitative metrics, including the Nash–Sutcliffe efficiency (NSE) and the uncertainty coefficient. Additionally, Friedman and Wilcoxon signed-rank tests confirm statistically significant differences between models. Dam spill emerged as the most influential predictor of TDG levels at both sites. The GSFST selected the optimal input combination, including dam spill, water temperature, barometric pressure, and sensor depth. Among all models, GSFST-BiLSTM achieved the highest predictive accuracy, with NSE values of 0.95 (testing) and 0.90 (validation) and uncertainty coefficients of 5.2% and 7.0%, respectively. These findings demonstrate that GSFST-BiLSTM provides a robust and transferable framework for TDG prediction, with the potential for broader application pending further validation.
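The abstract names two building blocks that are easy to make concrete: the Nash–Sutcliffe efficiency used to score models, and greedy stepwise selection of inputs. The sketch below is a generic illustration, not the authors' implementation (the abstract does not say which software or stopping rule they used). The function names `nash_sutcliffe` and `greedy_forward_select`, the `min_gain` threshold, and the `toy_score` function with its made-up feature names and gains are all assumptions introduced here. In practice, the scoring callback would train a model on the candidate inputs and return its validation NSE.

```python
from typing import Callable, List, Sequence, Tuple


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means no better than predicting the observed mean.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual / variance


def greedy_forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 1e-6,  # assumed stopping threshold, not from the paper
) -> Tuple[List[str], float]:
    """Add one feature at a time, always taking the one that raises the score
    most, and stop when the best available addition gains less than min_gain."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best < min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical stand-in for "train a model on these inputs and return its NSE".
# The gains are invented purely to exercise the selection loop.
TOY_GAINS = {"a": 5, "b": 3, "c": 0, "d": 1}


def toy_score(features: List[str]) -> float:
    return float(sum(TOY_GAINS[f] for f in features))


if __name__ == "__main__":
    print(nash_sutcliffe([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
    print(greedy_forward_select(["a", "b", "c", "d"], toy_score))
```

With the toy scorer, feature `c` contributes nothing and is left out, which mirrors how a stepwise search can drop a candidate input that does not improve the fit.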
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.