Yundong Wu, Bo Xian, Xiaowei Xiang, Fang Fang, Fuhao Chu, Xingkang Deng, Qing Hu, Xiuqiong Sun, Wei Tang, Shaopan Bao, Genbao Li, Tao Fang
Title: Identification of key feature variables and prediction of harmful algal blooms in a water diversion lake based on interpretable machine learning
Journal: Environmental Research, volume 276, Article 121491 (impact factor 7.7; JCR Q1, Environmental Sciences)
Published: 2025-03-28
DOI: 10.1016/j.envres.2025.121491
URL: https://www.sciencedirect.com/science/article/pii/S001393512500742X
Citations: 0
Abstract
Harmful algal blooms (HABs) are an increasing environmental problem in lakes, and water diversion has become a common and effective strategy for mitigating them. Early and accurate identification of HAB occurrence in lakes is essential for the scientific guidance of water diversion. Furthermore, the inevitable changes in hydrodynamics and the water environment of the receiving area during water diversion make it more challenging to identify the important environmental features of HABs. Therefore, we constructed a machine learning modelling framework suitable for predicting HABs with favorable performance in both non-water diversion and water diversion states. In this study, we collected data from three monitoring sites for 2008–2020 (non-water diversion period from 2008 to 2013 and water diversion period from 2014 to 2020) as external validation and from six sampling sites for 2021–2022 (non-water diversion period in 2021 and water diversion period in 2022) as internal validation. Among 10 machine learning models compared for comprehensive HAB prediction in the external cohorts of Yilong Lake, the CatBoost model performed best (AUC = 0.948), and the 24 features were reduced to the 8 most important environmental features (including TP, TN, and CODCr). In addition, the SHapley Additive exPlanations (SHAP) method was used to interpret this CatBoost model through a global interpretation, which describes the model's features as a whole, and a local interpretation, which details how the forecast of HABs is made for a single sample from its individual input data. The interpretable CatBoost model also performed well in internal validation and has been converted into a convenient application for use by Bureau of Yilong Lake Administration personnel and researchers. Finally, the results of partial least squares path modeling (PLS-PM) indicate that water diversion indirectly mitigates HABs mainly by diluting nutrient concentrations.
Overall, the final model performs well and offers practical benefits in predicting HABs during both the non-water diversion and water diversion periods of Yilong Lake, providing a guideline for water diversion. This study also provides a reference for water diversion strategies in other similar eutrophic lakes.
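The two quantities the abstract relies on, the AUC used to compare models and the per-sample SHAP attribution used for local interpretation, can be illustrated with a minimal sketch. This is not the authors' code or their CatBoost model: the function `toy_risk`, its coefficients, and all data values below are hypothetical, chosen only so the arithmetic is easy to follow. The Shapley computation is the exact enumeration over feature coalitions, which is only practical for a handful of features; libraries such as `shap` use faster tree-specific algorithms instead.

```python
from itertools import combinations
from math import factorial


def auc(labels, scores):
    """Probability that a random positive sample outscores a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(predict, sample, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(sample)
    n = len(names)

    def value(coalition):
        x = {k: (sample[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for k in names:
        others = [m for m in names if m != k]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {k}) - value(members))
        phi[k] = total
    return phi


# Hypothetical toy risk score (an assumption for illustration, not the paper's model).
def toy_risk(x):
    return 0.5 * x["TP"] + 0.3 * x["TN"] + 0.2 * x["TP"] * x["COD_Cr"]


# Hypothetical labels (1 = bloom) and model scores for four samples.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Local interpretation of one hypothetical sample against an all-zero baseline.
baseline = {"TP": 0.0, "TN": 0.0, "COD_Cr": 0.0}
sample = {"TP": 1.0, "TN": 2.0, "COD_Cr": 3.0}
phi = shapley_values(toy_risk, sample, baseline)
```

In this sketch the attributions in `phi` add up to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets a local SHAP explanation account for a single forecast feature by feature. The interaction term between TP and COD_Cr is split equally between those two features.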
About the journal:
The Environmental Research journal presents a broad range of interdisciplinary research, focused on addressing worldwide environmental concerns and featuring innovative findings. Our publication strives to explore relevant anthropogenic issues across various environmental sectors, showcasing practical applications in real-life settings.