Mohamed F. Mahmoud, Mazdak Arabi, Shrideep Pallickara
Harnessing ensemble Machine learning models for improved salinity prediction in large river basin scales
Journal of Hydrology (JCR Q1, Engineering, Civil; impact factor 5.9), published 2025-01-13
DOI: 10.1016/j.jhydrol.2025.132691 (https://doi.org/10.1016/j.jhydrol.2025.132691)
Abstract
This study develops a robust ensemble machine learning methodology for predicting average annual salinity by combining multiple machine learning algorithms. Salt concentration is a crucial water quality indicator, and salinity issues cost $300 million annually in the U.S. Irrigated agricultural lands in the Upper Colorado River Basin contribute disproportionately to dissolved solid loads despite covering less than 2% of the basin area. The economic impact and the complex relationship among irrigation practices, groundwater dynamics, and salinity levels call for improved predictive capabilities at the river basin scale. Using twenty years of data from 150 watersheds, eleven machine learning algorithms were evaluated with both random and spatial cross-validation; Extreme Gradient Boosting, Gradient Boosting, and Random Forest emerged as the top performers. Bayesian Model Averaging (BMA) and stacked generalization were then employed to create ensemble models, which demonstrated improved performance. The BMA ensemble generalized better spatially than the individual models while requiring significantly fewer computational resources than stacking. Model uncertainty analysis revealed that BMA provided the most stable predictions among all approaches. Soil electrical conductivity and calcium carbonate content emerged as the most important predictors, followed by river flow. The resulting spatially distributed predictions revealed distinct patterns in sulfate loads and concentrations across sub-basins, providing insights for targeted salinity management. This study demonstrates the effectiveness of ensemble machine learning approaches for robust salinity prediction while highlighting the importance of comprehensive uncertainty assessment and spatial validation in environmental modeling applications.
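The abstract does not say how the BMA weights or the spatial folds were computed, so the following is only a minimal sketch under stated assumptions. It approximates each model's posterior weight from the Gaussian likelihood of its held-out residuals with equal prior probability per model, which is one common simplification of BMA and not necessarily the authors' procedure. It also shows one simple way to build spatial folds, by keeping every record from a watershed in the same fold. The function names, watershed IDs, and all numeric values are hypothetical toy inputs, not data or results from the paper.

```python
import math


def gaussian_log_likelihood(y_true, y_pred):
    """Log-likelihood of held-out residuals under a Gaussian with MLE variance."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    var = max(sse / n, 1e-12)  # guard against a perfect fit
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def bma_weights(y_true, preds_by_model):
    """Approximate posterior model weights, assuming equal model priors."""
    log_liks = {
        name: gaussian_log_likelihood(y_true, preds)
        for name, preds in preds_by_model.items()
    }
    top = max(log_liks.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(ll - top) for name, ll in log_liks.items()}
    total = sum(unnorm.values())
    return {name: u / total for name, u in unnorm.items()}


def bma_predict(weights, preds_by_model):
    """Weighted average of the member predictions, one value per sample."""
    n = len(next(iter(preds_by_model.values())))
    return [
        sum(weights[m] * preds_by_model[m][i] for m in weights)
        for i in range(n)
    ]


def spatial_folds(watershed_ids, k):
    """Assign whole watersheds to folds so none spans both train and test."""
    unique = sorted(set(watershed_ids))
    fold_of = {w: i % k for i, w in enumerate(unique)}
    return [fold_of[w] for w in watershed_ids]


# Toy validation targets and member predictions (illustrative values only).
y_val = [410.0, 520.0, 300.0, 615.0, 480.0]
val_preds = {
    "xgboost": [400.0, 530.0, 310.0, 600.0, 470.0],
    "gradient_boosting": [430.0, 500.0, 280.0, 640.0, 500.0],
    "random_forest": [450.0, 480.0, 340.0, 570.0, 520.0],
}

weights = bma_weights(y_val, val_preds)
ensemble = bma_predict(weights, val_preds)

# Hypothetical watershed labels, one per sample; repeated IDs share a fold.
watershed_ids = ["ws01", "ws01", "ws02", "ws03", "ws03"]
folds = spatial_folds(watershed_ids, k=2)

print(weights)
print(ensemble)
print(folds)
```

In this sketch the weights are computed once from held-out predictions, so the ensemble needs no second-stage learner. That is consistent with the abstract's remark that BMA was cheaper than stacking, where a meta-model is trained on the members' out-of-fold predictions. Grouping by watershed is what separates spatial from random cross-validation: a random split can place records from the same watershed in both training and test sets, which tends to overstate how well a model transfers to unseen locations.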
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that affect economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.