Modeling river water dissolved organic matter using ensemble computing and genetic programming techniques
Mohammad Zounemat-Kermani, Soudabeh Golestani Kermani, Marzieh Fadaee, Ammar Aldallal, Ozgur Kisi, Abdollah Ramezani-Charmahineh
Ecohydrology & Hydrobiology, Vol. 25, No. 2, pp. 292-302 (published 2025-04-01); journal article
DOI: 10.1016/j.ecohyd.2024.04.003
https://www.sciencedirect.com/science/article/pii/S1642359324000430
JCR Q2 (Ecology); impact factor 2.7
Citations: 0
Abstract
Dissolved organic matter (DOM) plays diverse roles in aquatic ecosystems and is a key component of the global carbon budget. Accurate simulation of DOM concentrations in rivers and streams is therefore critical in hydro-environmental projects. Among the available modeling strategies, data-driven machine learning (ML) approaches, and particularly ensemble machine learning (EML) models, have shown reasonable capability for simulating environmental processes in aquatic systems from a limited number of physicochemical and biological water parameters. In this study, regular ML models (Support Vector Regression, SVR, and Extreme Learning Machine, ELM), two EML models (Random Forests, RF, and Boosted Trees, BT), and Gene Expression Programming (GEP) are evaluated for predicting fluorescent dissolved organic matter (fDOM) in the Caloosahatchee River, Florida. The regular and ensemble ML models were built on seven quantitative and qualitative independent variables (flow rate, temperature, specific conductance, dissolved oxygen, pH, turbidity, and nitrate, recorded from 2017 to 2019). All seven had first been identified by best subset regression as influential on the target variable (fDOM). Under k-fold cross-validation (k = 4), the regular ML models (SVR, ELM, and GEP) outperformed the traditional multiple linear regression (MLR) model, improving RMSE by 6.8% on average. The EML models (RF and BT), in turn, outperformed the regular ML models, improving RMSE by 7.2% on average relative to them.
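The evaluation protocol summarized in the abstract (k-fold cross-validation with k = 4, with models compared by their percentage RMSE improvement) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code, and several choices are assumptions. Sample i is held out in fold i mod k, without shuffling. The per-fold RMSE values are averaged. "Improvement in RMSE" is read as the relative RMSE reduction of one model against another. The functions `cross_val_rmse` and `rmse_improvement` are hypothetical. The two models, `fit_mean` (a naive mean predictor) and `fit_linear` (single-predictor least squares), are placeholders standing in for the MLR, SVR, ELM, GEP, RF, and BT models, and the data are synthetic.

```python
import math
from typing import Callable, List, Sequence

# A "fitter" takes training inputs/targets and returns a prediction function.
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def fit_mean(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Hypothetical naive baseline: always predicts the training mean of y."""
    mean_y = sum(y) / len(y)
    return lambda _x: mean_y


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Hypothetical stand-in model: least squares y = a + b*x (x must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda xi: a + b * xi


def cross_val_rmse(fit: Fitter, x: Sequence[float], y: Sequence[float], k: int = 4) -> float:
    """Mean test-fold RMSE over k folds.

    Assumption: sample i is held out in fold i % k (no shuffling), and the
    per-fold RMSE values are averaged; the paper's exact scheme may differ.
    """
    if len(x) < k:
        raise ValueError("need at least k samples")
    scores: List[float] = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        test = [i for i in range(len(x)) if i % k == fold]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(rmse([y[i] for i in test], [predict(x[i]) for i in test]))
    return sum(scores) / k


def rmse_improvement(rmse_reference: float, rmse_candidate: float) -> float:
    """Assumption: 'improvement in RMSE' = relative RMSE reduction, in percent."""
    return 100.0 * (rmse_reference - rmse_candidate) / rmse_reference


if __name__ == "__main__":
    # Synthetic, noise-free data (not the Caloosahatchee River records).
    x = [float(i) for i in range(12)]
    y = [2.0 * xi + 1.0 for xi in x]
    base = cross_val_rmse(fit_mean, x, y, k=4)
    lin = cross_val_rmse(fit_linear, x, y, k=4)
    print(f"mean baseline RMSE: {base:.3f}, linear RMSE: {lin:.3g}")
    print(f"RMSE improvement: {rmse_improvement(base, lin):.1f}%")
```

On the noise-free synthetic line, the linear model's cross-validated RMSE is essentially zero, so the printed improvement should round to 100.0%. Under the same reading, the abstract's 7.2% figure would mean that the ensemble models' cross-validated RMSE was, on average, 7.2% lower than that of the regular ML models.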
About the journal:
Ecohydrology & Hydrobiology is an international journal that aims to advance ecohydrology as the study of the interplay between ecological and hydrological processes from molecular to river basin scales, and to promote its implementation as an integrative management tool to harmonize societal needs with biosphere potential.