Victor Oliveira Santos, Paulo Alexandre Costa Rocha, Jesse Van Griensven Thé, Bahram Gharabaghi
Evaluation of machine learning methods for forecasting turbidity in river networks using Sentinel-2 remote sensing data
Ecological Informatics, volume 90, Article 103313. Published 2025-07-03. DOI: 10.1016/j.ecoinf.2025.103313
https://www.sciencedirect.com/science/article/pii/S157495412500322X
Citation count: 0
Abstract
Turbidity is an important indicator of river water quality and is of great interest for improving the data acquisition methods and efficiency of decision support systems for sustainable ecosystem management. However, river water quality monitoring stations are expensive to operate and maintain, and they lack spatial coverage. This study therefore takes advantage of the broad spatial coverage of satellite remote sensing datasets to provide a more efficient hybrid system that captures both spatial and temporal changes in water quality across a large river network. Spectral bands from Sentinel-2 were analyzed using six machine learning (ML) algorithms, namely XGBoost, Random Forests, the Group Method of Data Handling (GMDH), Support Vector Regression, k-Nearest Neighbors, and the Least Absolute Shrinkage and Selection Operator (LASSO), to model turbidity using data from twelve monitoring stations along the Mississippi River, USA. For the individual monitoring stations, the ML models performed satisfactorily at locations with a larger range and standard deviation of turbidity values, achieving a mean R² of 59.5%. Tree-based models were the best overall approach, often ranking as the best or second-best performing model. When all samples from the monitoring stations were used together, XGBoost gave the best turbidity model, reaching an R² of 75.7%. This is an improvement of over 16 percentage points compared with the mean R² for the individual stations. A comprehensive comparison with the literature found that models built with this study's methodology can provide competitive results, making it a viable alternative for turbidity modeling from remote sensing data.
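The workflow the abstract describes (regress turbidity on spectral band values, then score the model with R²) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library, a hand-written k-Nearest Neighbors regressor as a stand-in for the six algorithms named above, and purely synthetic samples. The band names, value ranges, and the linear band-to-turbidity relationship in `make_synthetic_samples` are hypothetical assumptions made for illustration, not data or findings from the study.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k training samples closest to `query`."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def make_synthetic_samples(n, seed=0):
    """Return n ((red, green, nir), turbidity) pairs.

    ASSUMPTION: the reflectance ranges and the linear relationship below
    are invented for this sketch; they are not taken from the paper.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        red = rng.uniform(0.02, 0.20)
        green = rng.uniform(0.02, 0.20)
        nir = rng.uniform(0.01, 0.15)
        turbidity = 300.0 * red + 100.0 * nir + rng.gauss(0.0, 2.0)
        samples.append(((red, green, nir), turbidity))
    return samples


# Hold out the last 60 samples for evaluation.
samples = make_synthetic_samples(300)
train, holdout = samples[:240], samples[240:]
train_x = [x for x, _ in train]
train_y = [y for _, y in train]
holdout_y = [y for _, y in holdout]

predictions = [knn_predict(train_x, train_y, x) for x, _ in holdout]
score = r2_score(holdout_y, predictions)
print(f"Hold-out R^2 on synthetic data: {score:.3f}")

# The R^2 values reported in the abstract, compared in percentage points.
gap = 75.7 - 59.5
print(f"All-station vs. mean per-station R^2 gap: {gap:.1f} percentage points")
```

The last two lines reproduce the comparison stated in the abstract: 75.7% minus 59.5% is a difference of 16.2 percentage points. Swapping the k-Nearest Neighbors stand-in for a tree-based model would be closer to the study's best performers, but that requires a third-party library and is left out of this sketch.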
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.