A machine learning algorithm to retrieve the red peak of phytoplankton absorption spectra from ocean-colour remote sensing
Authors: Mohammad Ashphaq, Shovonlal Roy
Journal: Remote Sensing Applications: Society and Environment, vol. 39, Article 101702
Publication date: 2025-08-01
DOI: 10.1016/j.rsase.2025.101702
URL: https://www.sciencedirect.com/science/article/pii/S2352938525002551
Impact factor: 4.5; JCR: Q2 (Environmental Sciences)
Abstract
Light absorption by microscopic phytoplankton in marine ecosystems is a crucial process underpinning biological production and global biogeochemical cycles. Accurate estimation of phytoplankton absorption coefficients, an inherent optical property of ocean water, can improve remote sensing applications including spectral photosynthesis models and assessments of ocean health, biodiversity, and climate change impacts. However, considerable uncertainty exists in current satellite retrievals of phytoplankton absorption coefficients, particularly for ɑph(676) - the phytoplankton absorption peak at red wavelengths near 676 nm - which is an input to several novel and advanced satellite algorithms. This uncertainty hinders operational use of algorithms for assessing phytoplankton physiology, size structure and oceanic carbon pools from space. We aimed to improve satellite-based estimation of ɑph(676) using advanced machine learning (ML) techniques. We compiled a comprehensive in situ dataset (n = 1576) of ɑph(676) from published databases and matched it with remote-sensing reflectance (Rrs) at six wavelengths (412, 443, 490, 510, 560, and 665 nm) from the Ocean Colour Climate Change Initiative. We extensively evaluated multiple base ML algorithms (Random Forest (RF), Gradient Boosting Machines, and Linear Regression) and implemented ensemble ML models (RF with Grid Search Cross-Validation, eXtreme Gradient Boosting Ensembled Model, Ensemble Forecast, Stacked Voting, Optimised Ensemble and Meta Stacking) that integrate the base models through cross-validated hyperparameter tuning. Meta Stacking outperformed the individual ML models in predictive accuracy across temporal resolutions, with the best results obtained for daily composites. Our study addresses key limitations of previous models, including small training datasets, inconsistent performance, and a lack of ensemble comparisons. We present a robust, extensively trained and validated ensemble ML model that significantly improves ɑph(676) estimation and opens the possibility of using it routinely in ocean colour remote sensing.
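To make the stacking idea concrete, the following is a minimal sketch in Python with scikit-learn, not the authors' implementation. The three base learners are the ones named in the abstract; everything else is an assumption made for illustration: the `RidgeCV` meta-learner, the log10 transform, the hyperparameters, and above all the data, which are synthetic stand-ins generated by the hypothetical helper `make_synthetic_matchups` rather than real in situ or satellite matchups.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# The six Rrs bands (nm) listed in the abstract.
BANDS_NM = (412, 443, 490, 510, 560, 665)


def make_synthetic_matchups(n=400, seed=0):
    """Return fake (Rrs, a_ph(676)) pairs. NOT real data, illustration only."""
    rng = np.random.default_rng(seed)
    rrs = rng.uniform(0.001, 0.02, size=(n, len(BANDS_NM)))
    # Hypothetical relationship: absorption rises as the 443/560 ratio falls.
    ratio = rrs[:, 1] / rrs[:, 4]
    aph676 = 0.02 * ratio ** -0.8 * np.exp(rng.normal(0.0, 0.1, size=n))
    return rrs, aph676


def build_stacked_model(seed=0):
    """Stack the three base learners; the meta-learner choice is an assumption."""
    base_learners = [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
        ("gbm", GradientBoostingRegressor(n_estimators=50, random_state=seed)),
        ("lr", LinearRegression()),
    ]
    # cv=5: the meta-learner is fitted on out-of-fold base predictions.
    return StackingRegressor(
        estimators=base_learners, final_estimator=RidgeCV(), cv=5
    )


rrs, aph676 = make_synthetic_matchups()
X = np.log10(rrs)      # assumed preprocessing, not stated in the abstract
y = np.log10(aph676)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = build_stacked_model().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"held-out R^2 (log10 space, synthetic data): {r2:.2f}")
```

Because the meta-learner is fitted on out-of-fold predictions, it does not see base-model outputs for samples those base models were trained on, which is the usual motivation for stacking over a simple average. The score printed here reflects only the synthetic relationship and says nothing about the accuracy reported in the paper.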
About the journal:
The journal "Remote Sensing Applications: Society and Environment" (RSASE) focuses on remote sensing studies that address specific topics with an emphasis on environmental and societal issues, in particular regional and local studies with global significance. Submissions are encouraged to take an interdisciplinary approach. Subjects include, but are not limited to:
- Global and climate change studies addressing the impact of increasing greenhouse gas concentrations, CO2 emissions, carbon balance and carbon mitigation, and energy systems on social and environmental systems
- Ecological and environmental issues including biodiversity, ecosystem dynamics, land degradation, atmospheric and water pollution, urban footprint, ecosystem management and natural hazards (e.g. earthquakes, typhoons, floods, landslides)
- Natural resource studies including land use in general, biomass estimation, forests, agricultural land, plantations, soils, coral reefs, wetlands and water resources
- Agriculture, food production systems and food security outcomes
- Socio-economic issues including urban systems, urban growth, public health, epidemics, land-use transition and land-use conflicts
- Oceanography and coastal zone studies, including sea level rise projections, coastline changes and the ocean-land interface
- Regional challenges for remote sensing application techniques, monitoring and analysis, such as cloud screening and atmospheric correction for tropical regions
- Interdisciplinary studies combining remote sensing, household survey data, field measurements and models to address environmental, societal and sustainability issues
- Quantitative and qualitative analysis that documents the impact of using remote sensing studies in social, political, environmental or economic systems