CatBoost-based prediction of suspended sediment concentration in the Pearl River estuary: Driving mechanisms unraveled via SHAP analysis
Xiaofei Cheng, Yunzhi Chen, Yang Zhang, Wei Xie, Liang Du, Dan Wu, Rui Xiao, Guoxuan Ji
Journal of Sea Research, volume 210, Article 102668. Published 2026-03-01 (Epub 2026-01-07). DOI: 10.1016/j.seares.2026.102668
https://www.sciencedirect.com/science/article/pii/S138511012600002X
Impact factor: 2.9. JCR: Q2 (Marine & Freshwater Biology). Open access: no.
Citation count: 0
Abstract
This study focuses on predicting suspended sediment concentration (SSC) and analyzing its influencing factors in the Pearl River Estuary (Zhuhai, China) using in-situ hydrological data collected from October 2022 to May 2023. Nine mainstream machine learning models were compared, and the Categorical Boosting (CatBoost) model was identified as the best performer for SSC prediction. CatBoost achieved high accuracy, with a Pearson Correlation Coefficient (R) of 0.76, Root Mean Squared Error (RMSE) of 3.76 mg/L, Mean Absolute Error (MAE) of 2.47 mg/L, Median Absolute Error (MedAE) of 2.04 mg/L, and Mean Squared Logarithmic Error (MSLE) of 0.198 mg/L, outperforming models such as Light Gradient Boosting Machine (LGBM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Stratified analysis showed that it performed well for low-to-medium SSC (≤30 mg/L) but had limited accuracy for high SSC (>30 mg/L). SHapley Additive exPlanations (SHAP) analysis revealed that significant wave height (Hs) and surface current speed (SCS) were the dominant drivers, with Hs exerting the strongest influence. Both factors exhibited a pronounced positive effect on SSC. Further tests on variable combinations indicated that the simplified input set (Hs + SCS) alone was sufficient to achieve accurate SSC predictions, with no significant improvement from adding more variables. This study demonstrates the effectiveness of CatBoost in SSC prediction and highlights key influencing factors via SHAP, providing a robust framework for precise SSC forecasting in estuarine environments.
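The five error metrics named in the abstract, and the 30 mg/L split used in the stratified analysis, can be sketched in plain Python. This is a minimal illustration, not the authors' code. The function names, the five sample values, and the use of log(1 + x) for MSLE (the common convention) are all assumptions made for this example.

```python
import math
from statistics import mean, median


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient; NaN if either series is constant."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    denom = math.sqrt(var_t * var_p)
    return cov / denom if denom > 0 else float("nan")


def regression_metrics(y_true, y_pred):
    """Return R, RMSE, MAE, MedAE and MSLE for paired observations."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    # MSLE compares log(1 + value), so inputs must be non-negative.
    log_errors = [math.log1p(p) - math.log1p(t) for t, p in zip(y_true, y_pred)]
    return {
        "R": pearson_r(y_true, y_pred),
        "RMSE": math.sqrt(mean([e * e for e in errors])),
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "MSLE": mean([e * e for e in log_errors]),
    }


def stratified_metrics(y_true, y_pred, threshold=30.0):
    """Score low-to-medium (<= threshold) and high (> threshold) SSC separately."""
    groups = {"low_to_medium": [], "high": []}
    for t, p in zip(y_true, y_pred):
        groups["low_to_medium" if t <= threshold else "high"].append((t, p))
    result = {}
    for name, pairs in groups.items():
        if len(pairs) >= 2:  # correlation needs at least two points
            ts, ps = zip(*pairs)
            result[name] = regression_metrics(list(ts), list(ps))
    return result


# Hypothetical SSC values in mg/L, invented purely for illustration.
observed = [5.0, 10.0, 20.0, 35.0, 40.0]
predicted = [6.0, 9.0, 22.0, 30.0, 33.0]

overall = regression_metrics(observed, predicted)
by_range = stratified_metrics(observed, predicted)
print(overall)
print(by_range)
```

Scoring the two ranges separately shows how an aggregate RMSE can hide larger errors at high concentrations. That is the pattern the abstract reports for SSC above 30 mg/L.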
About the journal:
The Journal of Sea Research is an international and multidisciplinary periodical on marine research, with an emphasis on the functioning of marine ecosystems in coastal and shelf seas, including intertidal, estuarine and brackish environments. As several subdisciplines add to this aim, manuscripts are welcome from the fields of marine biology, marine chemistry, marine sedimentology and physical oceanography, provided they add to the understanding of ecosystem processes.