Comparative assessment of machine learning algorithms for retrieving colored dissolved organic matter (CDOM) from Sentinel-2/MSI images in the coastal waters of the Persian Gulf
Bonyad Ahmadi, Mehdi Gholamalifard, Seyed Mahmoud Ghasempouri, Tiit Kutser
Ecological Informatics, volume 89, Article 103171. Published 2025-04-28. DOI: 10.1016/j.ecoinf.2025.103171. URL: https://www.sciencedirect.com/science/article/pii/S1574954125001803
Citation count: 0
Abstract
Colored dissolved organic matter (CDOM), a pivotal component of aquatic biogeochemical cycles, plays a critical role in regulating water quality and ecosystem functionality. This study provides the first comprehensive assessment of CDOM dynamics in the Persian Gulf's industrialized coastal waters, focusing on the Pars Special Economic Energy Zone (PSEEZ), a global energy epicenter and the world's largest natural gas reserve. Seasonal field campaigns conducted in 2023 acquired 199 in situ samples stratified across four seasons (Spring: n = 62, Summer: n = 18, Fall: n = 55, Winter: n = 64) using a CTD-integrated Cyclops-7 fluorometer. Sampling was synchronized with satellite overpasses (±3 h) to minimize temporal discrepancies between ground-truth measurements and remotely sensed data, ensuring the spatiotemporal coherence essential for robust algorithm calibration and validation. Contrary to expectations, CDOM concentrations in petrochemical-influenced areas (e.g., stations P7: 0.29 ppb, P13: 0.35 ppb) were markedly lower than in natural mangrove ecosystems (stations N13: 19.61 ppb, NA2: 12.91 ppb), underscoring the antagonistic effects of industrial pollutants on organic matter stability. Initial CDOM retrieval algorithms yielded suboptimal accuracy (MAE = 1.16, RMSLE = 1.2). A regionally tuned band ratio algorithm improved performance by 27% (MAE = 0.85) and 22% (RMSLE = 0.94). Machine learning models further enhanced retrievals, with the Mixture Density Network (MDN) emerging as the superior framework. The MDN achieved an RMSLE of 0.47 (a 17.5% improvement over MLP and 14.5% over SVM) and reduced systematic bias (SSPB) by 26.12 units compared to Bayesian Ridge Regression (BRR), outperforming conventional models like SVM (MAE = 0.61, RMSLE = 0.55). While the MDN exhibited marginally higher absolute error (MAE = 0.53) than deterministic models, its probabilistic architecture uniquely addressed the Persian Gulf's optical complexity, characterized by overlapping signals from SGD-driven organics, hydrocarbon plumes, and sediment resuspension. This study establishes MDN as a transformative tool for CDOM retrieval in optically heterogeneous, anthropogenically stressed waters, while advocating for regionally adaptive frameworks to advance precision water quality monitoring in critical marine ecosystems.
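The percentage improvements quoted in the abstract can be checked from the reported error values. The sketch below is illustrative only: the abstract does not define its metrics, so the `mae` and `rmsle` functions assume common forms (arithmetic mean absolute error, and root mean squared error of base-10 logarithms), and all function names are hypothetical rather than taken from the paper. Only the numbers passed to `relative_improvement` come from the abstract.

```python
import math


def mae(measured, estimated):
    """Mean absolute error (assumed arithmetic form; the paper may use a log-based variant)."""
    return sum(abs(e - m) for m, e in zip(measured, estimated)) / len(measured)


def rmsle(measured, estimated):
    """Root mean squared log error (assumed base-10 logs; inputs must be positive)."""
    squared = [(math.log10(e) - math.log10(m)) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline, improved):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Error values as reported in the abstract.
    print(f"Band ratio vs. initial, MAE:   {relative_improvement(1.16, 0.85):.1f}%")
    print(f"Band ratio vs. initial, RMSLE: {relative_improvement(1.2, 0.94):.1f}%")
    print(f"MDN vs. SVM, RMSLE:            {relative_improvement(0.55, 0.47):.1f}%")
```

Under these assumptions the three ratios come out at about 26.7%, 21.7%, and 14.5%, which round to the 27%, 22%, and 14.5% stated in the abstract.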
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.