A novel hybrid machine learning approach for accurate retrieval of ocean surface chlorophyll-a across oligotrophic to eutrophic waters

IF 7.7 | Zone 2, Environmental Science and Ecology | JCR Q1, ENVIRONMENTAL SCIENCES
Ting Qin, Tianlong Liang, Donglin Fan, Hongchang He, Guiwen Lan, Bolin Fu
Environmental Research, Volume 279, Article 121864. Published 2025-05-19. Journal Article.
DOI: 10.1016/j.envres.2025.121864
https://www.sciencedirect.com/science/article/pii/S0013935125011156
Citations: 0

A novel hybrid machine learning approach for accurate retrieval of ocean surface chlorophyll-a across oligotrophic to eutrophic waters
Accurate assessment of the distribution and variation of chlorophyll a (Chla) concentration is important for environmental monitoring and ecological research. However, retrieving Chla across optically different water types currently requires a separate algorithm for each type, and a unified machine learning framework is lacking. This study therefore addresses two aspects, input features and data samples, and designs a composite machine learning framework called the Synth Ridge Framework (SRF). The framework has two main components: feature expansion and model construction. We used the band ratio method and BorutaShap for feature expansion and selection. By integrating three gradient boosting decision tree models (XGBoost, LightBoost, and CatBoost) with the MDN ensemble strategy, we constructed a model named SynthRidge to improve overall performance. SynthRidge was trained and validated on the Rrs-in situ Chla dataset from the Terra-MODIS sensor, with Chla values ranging from 0 to 50 mg/m³ in both datasets. On the validation dataset, SynthRidge achieved an R² of 0.930, a slope of 0.928, an RMSE of 4.672 mg/m³, an RMLSE of 0.039, a bias of 1.023, and an MAE of 1.389. Compared with the best-performing baseline model, the GBDT ensemble, SynthRidge was more accurate and robust: it improved R² by 0.006, increased the slope by 0.020, reduced the RMSE by 0.890 mg/m³, and decreased the RMLSE by 0.003. The Chla density distribution predicted by SynthRidge was also more consistent with the measured values. These findings suggest that SRF can compensate for the limitations of input features, reduce the negative impact of data distribution, and mitigate the limitations of complex fusion algorithms. Furthermore, the performance of SRF on the SeaWiFS dataset demonstrates its versatility across different sensors.
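The abstract names band-ratio feature expansion and several accuracy metrics without defining them. The following is a minimal sketch of what those two steps might look like, not the authors' implementation. The band labels and reflectance values are illustrative only, all function names are hypothetical, and the metric definitions are assumptions: R² is taken as the coefficient of determination, and bias and MAE are taken as the multiplicative, log10-based forms common in ocean colour validation, which would be consistent with the reported values near 1.

```python
import math
from itertools import permutations


def band_ratio_features(rrs):
    """Return the original Rrs bands plus every ordered pairwise ratio.

    `rrs` maps a band label to its remote sensing reflectance (Rrs).
    Ratios with a zero denominator are skipped.
    """
    features = dict(rrs)
    for num, den in permutations(rrs, 2):
        if rrs[den] != 0:
            features[f"{num}/{den}"] = rrs[num] / rrs[den]
    return features


def rmse(obs, pred):
    """Root mean squared error in the units of the inputs (e.g. mg/m^3)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def log_bias(obs, pred):
    """Multiplicative bias (assumed definition); 1.0 means no systematic bias.

    Requires strictly positive observed and predicted values.
    """
    diffs = [math.log10(p) - math.log10(o) for o, p in zip(obs, pred)]
    return 10 ** (sum(diffs) / len(diffs))


def log_mae(obs, pred):
    """Multiplicative mean absolute error (assumed definition); always >= 1.0."""
    diffs = [abs(math.log10(p) - math.log10(o)) for o, p in zip(obs, pred)]
    return 10 ** (sum(diffs) / len(diffs))
```

With three input bands, `band_ratio_features` yields the three original values plus six ratios. In a pipeline like the one described, the expanded set would then go through a feature selection step before model training; that step is omitted here.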
Source journal
Environmental Research (Environmental Sciences; Public, Environmental and Occupational Health)
CiteScore: 12.60
Self-citation rate: 8.40%
Articles published: 2480
Review time: 4.7 months
Journal description: The Environmental Research journal presents a broad range of interdisciplinary research, focused on addressing worldwide environmental concerns and featuring innovative findings. Our publication strives to explore relevant anthropogenic issues across various environmental sectors, showcasing practical applications in real-life settings.