Identification of key feature variables and prediction of harmful algal blooms in a water diversion lake based on interpretable machine learning

IF 7.7 · Zone 2, Environmental Science and Ecology · JCR Q1 ENVIRONMENTAL SCIENCES
Yundong Wu, Bo Xian, Xiaowei Xiang, Fang Fang, Fuhao Chu, Xingkang Deng, Qing Hu, Xiuqiong Sun, Wei Tang, Shaopan Bao, Genbao Li, Tao Fang
Environmental Research, Volume 276, Article 121491. Published 2025-03-28.
DOI: 10.1016/j.envres.2025.121491
https://www.sciencedirect.com/science/article/pii/S001393512500742X
Citations: 0

Abstract

Harmful algal blooms (HABs) are an increasing environmental problem in lakes, and water diversion has become a common and effective strategy for mitigating them. Early and accurate identification of HAB occurrence in lakes is therefore essential for guiding water diversion scientifically. Moreover, water diversion inevitably changes the hydrodynamics and water environment of the receiving area, which makes it more difficult to identify the environmental features that drive HABs. We therefore constructed a machine learning modelling framework for predicting HABs that performs well in both the non-water diversion and water diversion states. We collected data from three monitoring sites for 2008–2020 (non-water diversion period 2008–2013; water diversion period 2014–2020) as the external validation cohorts, and from six sampling sites for 2021–2022 (non-water diversion period 2021; water diversion period 2022) as the internal validation cohort. Among 10 machine learning models compared for comprehensive HAB prediction in the external cohorts of Yilong Lake, CatBoost performed best (AUC = 0.948), and the 24 candidate features were reduced to the 8 most important environmental features, including total phosphorus (TP), total nitrogen (TN), and dichromate chemical oxygen demand (COD_Cr). The SHapley Additive exPlanations (SHAP) method was then used to interpret the CatBoost model at two levels: a global interpretation describing how all features contribute to the model, and a local interpretation showing how the model arrives at the HAB prediction for a single sample from that sample's input data. The interpretable CatBoost model also performed well in internal validation, and it has been converted into a convenient application for staff of the Bureau of Yilong Lake Administration and for researchers. Finally, partial least squares path modeling (PLS-PM) indicates that water diversion mitigates HABs indirectly, mainly by diluting nutrient concentrations.
Overall, the final model performs well and offers practical benefits for predicting HABs in both the non-water diversion and water diversion periods of Yilong Lake, providing guidance for water diversion operations. This study also provides a reference for water diversion strategies in other similar eutrophic lakes.
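The abstract describes SHAP's global and local interpretations only in words. The sketch below, which is not the authors' code, illustrates the idea behind both by computing exact Shapley values through brute-force enumeration of feature coalitions in plain Python. Assumptions: `toy_bloom_score` is a hypothetical linear-plus-interaction scoring function standing in for the trained CatBoost model, the three standardized samples are invented, and "absent" features are replaced by a single baseline (the feature means). The `shap` library's tree explainer, typically used for gradient-boosted trees such as CatBoost, relies on the TreeSHAP algorithm and handles absent features differently, so its values would not match this sketch exactly.

```python
from itertools import combinations
from math import factorial


def toy_bloom_score(x):
    """Hypothetical stand-in for a trained classifier's bloom-risk score.
    NOT the paper's CatBoost model; the coefficients are invented."""
    tp, tn, cod = x["TP"], x["TN"], x["COD_Cr"]
    # The TP*TN interaction term makes the attributions non-trivial.
    return 0.5 * tp + 0.2 * tn + 0.1 * cod + 0.3 * tp * tn


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x). Features outside a coalition are set
    to their baseline value. Enumerates all coalitions, so it is only
    practical for a handful of features."""
    features = list(x)
    n = len(features)

    def value(coalition):
        # Coalition members take the sample's value; the rest take the baseline.
        z = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(z)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def global_importance(model, samples, baseline):
    """Mean absolute Shapley value per feature across samples, sorted from
    most to least important (the usual SHAP-style global summary)."""
    totals = {f: 0.0 for f in baseline}
    for x in samples:
        for f, v in shapley_values(model, x, baseline).items():
            totals[f] += abs(v)
    means = {f: t / len(samples) for f, t in totals.items()}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical standardized water-quality samples (invented for illustration).
samples = [
    {"TP": 2.0, "TN": 1.0, "COD_Cr": 1.0},
    {"TP": -1.0, "TN": 0.5, "COD_Cr": -0.5},
    {"TP": -1.0, "TN": -1.5, "COD_Cr": -0.5},
]
# Single reference point: the per-feature mean of the samples.
baseline = {f: sum(s[f] for s in samples) / len(samples) for f in samples[0]}

# Local interpretation of one sample.
local = shapley_values(toy_bloom_score, samples[0], baseline)
print("local:", {f: round(v, 3) for f, v in local.items()})
print("base + sum(phi):", round(toy_bloom_score(baseline) + sum(local.values()), 3))
print("model(x):", round(toy_bloom_score(samples[0]), 3))

# Global interpretation across all samples.
ranking = global_importance(toy_bloom_score, samples, baseline)
print("global:", {f: round(v, 3) for f, v in ranking.items()})
```

The local explanation is additive: the baseline output plus the per-feature attributions equals the score for that sample, which is the property that lets SHAP show how a single prediction is built up. Ranking features by mean absolute attribution is a common way to summarize global importance. Exact enumeration needs 2^n model evaluations per feature, which grows quickly with the number of features, so practical tools rely on TreeSHAP or sampling approximations instead.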

Source journal: Environmental Research (Environmental Science: Public, Environmental and Occupational Health)
CiteScore: 12.60
Self-citation rate: 8.40%
Articles published: 2480
Review time: 4.7 months
Journal description: The Environmental Research journal presents a broad range of interdisciplinary research, focused on addressing worldwide environmental concerns and featuring innovative findings. Our publication strives to explore relevant anthropogenic issues across various environmental sectors, showcasing practical applications in real-life settings.