CatBoost-based prediction of suspended sediment concentration in the Pearl River estuary: Driving mechanisms unraveled via SHAP analysis

IF 2.9 · Zone 4 (Earth Sciences) · Q2 MARINE & FRESHWATER BIOLOGY
Journal of Sea Research Pub Date : 2026-03-01 Epub Date: 2026-01-07 DOI:10.1016/j.seares.2026.102668
Xiaofei Cheng, Yunzhi Chen, Yang Zhang, Wei Xie, Liang Du, Dan Wu, Rui Xiao, Guoxuan Ji
Volume 210, Article 102668. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S138511012600002X
Citations: 0

Abstract

This study focuses on predicting suspended sediment concentration (SSC) and analyzing its influencing factors in the Pearl River Estuary (Zhuhai, China) using in-situ hydrological data collected from October 2022 to May 2023. Nine mainstream machine learning models were compared, with the Categorical Boosting (CatBoost) model identified as the best performer for SSC prediction. CatBoost achieved high accuracy, with a Pearson Correlation Coefficient (R) of 0.76, Root Mean Squared Error (RMSE) of 3.76 mg/L, Mean Absolute Error (MAE) of 2.47 mg/L, Median Absolute Error (MedAE) of 2.04 mg/L, and Mean Squared Logarithmic Error (MSLE) of 0.198 mg/L, outperforming models such as Light Gradient Boosting Machine (LGBM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Stratified analysis showed that CatBoost performed well for low-to-medium SSC (≤30 mg/L) but had limited accuracy for high SSC (>30 mg/L). SHapley Additive exPlanations (SHAP) analysis revealed that significant wave height (Hs) and surface current speed (SCS) were the dominant drivers, with Hs exerting the most substantial influence. Both factors had a pronounced positive effect on SSC. Further tests on variable combinations indicated that the simplified input set (Hs + SCS) alone was sufficient to achieve accurate SSC predictions, with no significant improvement from adding more variables. This study demonstrates the effectiveness of CatBoost in SSC prediction and highlights key influencing factors via SHAP, providing a robust framework for precise SSC forecasting in estuarine environments.
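The five error metrics named in the abstract, and the split at 30 mg/L used in the stratified analysis, can be illustrated with a minimal Python sketch. It uses only the standard library. The function names `ssc_metrics` and `stratified_mae` are hypothetical, the sample values are made up for illustration, and the MSLE formula (mean squared difference of `log(1 + value)`) is an assumed convention; the abstract does not state which definition the authors used.

```python
import math
from statistics import mean, median


def ssc_metrics(observed, predicted):
    """Return R, RMSE, MAE, MedAE and MSLE for paired SSC values (mg/L).

    Hypothetical helper; MSLE uses log(1 + x), an assumed convention.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [p - o for o, p in zip(observed, predicted)]
    abs_errors = [abs(e) for e in errors]

    # Pearson correlation coefficient from centered sums.
    # Raises ZeroDivisionError if either series is constant.
    mean_o, mean_p = mean(observed), mean(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    # Squared differences of log(1 + value); requires values > -1.
    log_sq = [
        (math.log1p(p) - math.log1p(o)) ** 2
        for o, p in zip(observed, predicted)
    ]

    return {
        "R": r,
        "RMSE": math.sqrt(mean([e * e for e in errors])),
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "MSLE": mean(log_sq),
    }


def stratified_mae(observed, predicted, threshold=30.0):
    """MAE for observations <= threshold and > threshold (None if a stratum is empty)."""
    low = [abs(p - o) for o, p in zip(observed, predicted) if o <= threshold]
    high = [abs(p - o) for o, p in zip(observed, predicted) if o > threshold]
    return {
        "low_to_medium": mean(low) if low else None,
        "high": mean(high) if high else None,
    }


# Made-up sample values, not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
metrics = ssc_metrics(observed, predicted)
strata = stratified_mae(observed, predicted)
```

Reporting MedAE alongside MAE and RMSE is useful here because a few poorly predicted high-SSC samples inflate RMSE far more than they move the median, which is consistent with the weaker performance the abstract reports above 30 mg/L.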
Source journal
Journal of Sea Research
Category: Geosciences, Oceanography
CiteScore: 3.20
Self-citation rate: 5.00%
Articles published: 86
Review time: 6-12 weeks
About the journal: The Journal of Sea Research is an international and multidisciplinary periodical on marine research, with an emphasis on the functioning of marine ecosystems in coastal and shelf seas, including intertidal, estuarine and brackish environments. As several subdisciplines add to this aim, manuscripts are welcome from the fields of marine biology, marine chemistry, marine sedimentology and physical oceanography, provided they add to the understanding of ecosystem processes.