Data fusion-based improvements in empirical regression and machine learning for global daily ∼ 8 km resolution sea surface nitrate estimation and interpretation

IF 8.6 Q1 REMOTE SENSING
Aifen Zhong, Difeng Wang, Fang Gong, Jingjing Huang, Zhuoqi Zheng, Xianqiang He, Qing Zhang, Qiankun Zhu
Journal: International journal of applied earth observation and geoinformation : ITC journal, volume 143, Article 104800
Published: 2025-08-19
DOI: 10.1016/j.jag.2025.104800
URL: https://www.sciencedirect.com/science/article/pii/S1569843225004479
Citations: 0

Abstract

Assessing sea surface nitrate (SSN) concentrations and dynamics is crucial for understanding marine ecosystem health, yet optical remote sensing of SSN remains challenging because of the lack of distinct spectral features. While various global-scale SSN regression and machine learning algorithms based on SSN-environment variable relationships have been developed, the prediction accuracy and spatiotemporal resolution of their applications continue to face limitations. Additionally, previous studies have reported relatively little on the interannual variability of global SSN. Here we aim to enhance the accuracy and spatial resolution of SSN retrievals by developing improved regression and machine learning models, enabling the generation of global daily ∼ 8 km SSN products from satellite and model data. To construct the empirical regression models, the global ocean was divided into five regions based on the relationship between sea surface temperature (SST) and SSN: 80° S to 40° N, the North Pacific, the North Atlantic, the Arabian Sea, and the eastern equatorial Pacific. After adding SSN-related physical variables, high-accuracy regional empirical models were developed, with root mean square deviations (RMSDs) of 1.641, 2.701, 1.221, 1.298, and 2.379 μmol/kg for the five regions. For the machine learning models, seven algorithms were tested: extremely randomized trees (ET), multilayer perceptron (MLP), stacking random forest (SRF), Gaussian process regression (GPR), support vector machine (SVM), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). After modeling, validation, and extensive tests using an independent cruise dataset, the XGBoost model outperformed the others (RMSD = 1.189 μmol/kg) and bypassed the need for regional segmentation. Mechanistic analysis revealed the driving variables influencing SSN in both the regional empirical and XGBoost models, improving interpretability. Comparative validation confirmed that our models surpass traditional approaches in accuracy and applicability, demonstrating their potential to advance global SSN monitoring. Using XGBoost-derived products, we find a slight decreasing trend in SSN over 23 years. The proposed robust and explainable SSN retrieval models have the potential to assist in ocean environmental management.
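The two quantities the abstract reports, the RMSD used to rank models and the multi-year SSN trend, can be illustrated with a minimal sketch. The RMSD below is the standard definition. The trend is computed as an ordinary least-squares slope, which is an assumption for illustration only, since the abstract does not state how the trend was estimated. The function names `rmsd` and `linear_trend` and all numbers are hypothetical and are not data from the study.

```python
import math


def rmsd(predicted, observed):
    """Root mean square deviation between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year).

    Assumption: a simple OLS slope stands in for whatever trend estimator
    the paper actually used.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical SSN values in μmol/kg, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmsd(predicted, observed))  # 1.0

# Hypothetical annual means; a negative slope indicates a decreasing trend.
print(linear_trend([2000, 2001, 2002], [5.0, 4.9, 4.8]))
```

A lower RMSD on the same independent validation set is what "outperformed the others" means in the abstract, and the sign of the slope is what distinguishes a decreasing trend from an increasing one.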
Source journal
International journal of applied earth observation and geoinformation : ITC journal
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences
CiteScore: 12.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 77 days
Journal description: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.