Ting Qin, Tianlong Liang, Donglin Fan, Hongchang He, Guiwen Lan, Bolin Fu
Environmental Research, volume 279, Article 121864. Published 2025-05-19. DOI: 10.1016/j.envres.2025.121864. https://www.sciencedirect.com/science/article/pii/S0013935125011156
A novel hybrid machine learning approach for accurate retrieval of ocean surface chlorophyll-a across oligotrophic to eutrophic waters
Accurate assessment of the distribution and variation of chlorophyll a (Chla) concentration is important for environmental monitoring and ecological research. However, Chla retrieval across optically different water types has so far required a separate algorithm for each optical type, and a unified machine learning framework has been lacking. This study therefore addresses two aspects, input features and data samples, and designs a composite machine learning framework called the Synth Ridge Framework (SRF). The framework has two main components: feature expansion and model construction. We used the band ratio method for feature expansion and BorutaShap for feature selection. By integrating three gradient boosting decision tree (GBDT) models (XGBoost, LightBoost, and CatBoost) with the MDN ensemble strategy, we constructed a model named SynthRidge, aiming to improve overall performance. SynthRidge was trained and validated on the Rrs-in situ Chla dataset from the Terra-MODIS sensor, with Chla values ranging from 0 to 50 mg/m³ in both datasets. On the validation dataset, SynthRidge achieved an R² of 0.930, a slope of 0.928, an RMSE of 4.672 mg/m³, an RMLSE of 0.039, a bias of 1.023, and an MAE of 1.389. Compared with the best-performing baseline, the GBDT ensemble, SynthRidge was more accurate and robust: it improved R² by 0.006, increased the slope by 0.020, reduced the RMSE by 0.890 mg/m³, and decreased the RMLSE by 0.003. The Chla density distribution predicted by SynthRidge was also more consistent with the measured values. These findings suggest that SRF can compensate for the limitations of input features, reduce the negative impact of data distribution, and mitigate the limitations of complex fusion algorithms. Furthermore, the performance of SRF on the SeaWiFS dataset indicates that the framework transfers across sensors.
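The feature expansion step named in the abstract, the band ratio method, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `expand_band_ratios`, the band labels, and the choice to form one ratio per unordered band pair are assumptions made for illustration, and the abstract does not say which ratios SRF actually uses.

```python
from itertools import combinations

import numpy as np


def expand_band_ratios(rrs, band_names, eps=1e-12):
    """Append one ratio Rrs(a)/Rrs(b) per unordered band pair (hypothetical helper).

    rrs        : array-like of shape (n_samples, n_bands), remote sensing reflectance
    band_names : list of n_bands labels (illustrative names, not from the paper)
    eps        : denominators with magnitude below eps yield NaN instead of inf
    """
    rrs = np.asarray(rrs, dtype=float)
    if rrs.ndim != 2 or rrs.shape[1] != len(band_names):
        raise ValueError("rrs must have shape (n_samples, len(band_names))")

    columns = [rrs]                      # keep the original bands as features
    names = list(band_names)
    for i, j in combinations(range(len(band_names)), 2):
        denom = rrs[:, j]
        # Mask near-zero denominators so the ratio becomes NaN rather than inf.
        safe_denom = np.where(np.abs(denom) < eps, np.nan, denom)
        columns.append((rrs[:, i] / safe_denom)[:, None])
        names.append(f"{band_names[i]}/{band_names[j]}")
    return np.hstack(columns), names


# Illustrative values only; these are not data from the study.
bands = ["Rrs_443", "Rrs_488", "Rrs_547"]
samples = np.array([[0.004, 0.005, 0.002],
                    [0.002, 0.003, 0.004]])
features, names = expand_band_ratios(samples, bands)
print(names)
print(features.shape)
```

With n input bands this produces n(n-1)/2 additional ratio features. In the workflow the abstract describes, such an expanded feature set would then be pruned by a selection step (the abstract names BorutaShap), which this sketch does not implement.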
About the journal:
The Environmental Research journal presents a broad range of interdisciplinary research, focused on addressing worldwide environmental concerns and featuring innovative findings. Our publication strives to explore relevant anthropogenic issues across various environmental sectors, showcasing practical applications in real-life settings.