Explainable artificial intelligence for the interpretation of ensemble learning performance in algal bloom estimation.

IF 2.5 | Zone 4, Environmental Science and Ecology | JCR Q3, ENGINEERING, ENVIRONMENTAL
Jungsu Park, Byeongchan Seong, Yeonjeong Park, Woo Hyoung Lee, Tae-Young Heo
Water Environment Research, vol. 96, no. 10, e11140. Journal Article. Published 2024-10-01. DOI: 10.1002/wer.11140 (https://doi.org/10.1002/wer.11140)
Citations: 0

Abstract

Chlorophyll-a (Chl-a) concentration, a key indicator of algal blooms, was estimated using the XGBoost machine learning model with 23 input variables, including water quality and meteorological factors. Model performance was evaluated using three indices: root mean square error (RMSE), RMSE-observation standard deviation ratio (RSR), and Nash-Sutcliffe efficiency. Nine datasets were created by averaging 1-hour data to cover time intervals ranging from 1 hour to 1 month. Datasets with relatively high observation frequencies (1-24 h) gave stable performance, with RSR ranging between 0.61 and 0.65. However, model performance declined significantly for datasets with weekly and monthly intervals. Shapley value (SHAP) analysis, an explainable artificial intelligence (XAI) method, was then applied to provide a quantitative understanding of how environmental factors in the watershed affect model performance and to enhance the practical applicability of the model in the field. The number of input variables used for model construction was increased sequentially from 1 to 23, adding variables in order from the highest SHAP value to the lowest. Model performance plateaued once five or more variables were included, demonstrating that stable performance could be achieved using only a small number of variables, including relatively easily measured data collected by real-time sensors, such as pH, dissolved oxygen, and turbidity. This result highlights the practicality of employing machine learning models and real-time sensor-based measurements for effective on-site water quality management.

PRACTITIONER POINTS: XAI quantifies the effects of environmental factors on algal bloom prediction models. The effects of input variable frequency and seasonality were analyzed using XAI. XAI analysis on key variables ensures cost-effective model development.
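The three evaluation indices and the time-averaging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions based on the commonly used forms: RSR is taken as RMSE divided by the standard deviation of the observations, and Nash-Sutcliffe efficiency (NSE) as one minus the ratio of squared error to observed variance. The function names and the `block_mean` helper are hypothetical and are not taken from the paper.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    n = len(obs)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations.

    Assumed definition: both terms use the same 1/n factor, so it cancels.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sso = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(sse) / math.sqrt(sso)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sso = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sso


def block_mean(values, block):
    """Average consecutive, non-overlapping blocks of `block` samples.

    For example, block=24 turns hourly values into daily means.
    A trailing incomplete block is dropped.
    """
    return [
        sum(values[i:i + block]) / block
        for i in range(0, len(values) - block + 1, block)
    ]


if __name__ == "__main__":
    # Made-up numbers for illustration only, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    simulated = [1.5, 2.0, 2.5, 4.0, 5.5]
    print(rmse(observed, simulated), rsr(observed, simulated), nse(observed, simulated))
    print(block_mean([1, 2, 3, 4, 5, 6], 3))
```

Under these assumed definitions NSE equals 1 - RSR², so an RSR between 0.61 and 0.65 would correspond to an NSE of roughly 0.58 to 0.63.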

Source journal: Water Environment Research (Environmental Sciences; Engineering, Environmental)
CiteScore: 6.30
Self-citation rate: 0.00%
Articles published: 138
Review time: 11 months
About the journal: Published since 1928, Water Environment Research (WER) is an international multidisciplinary water resource management journal for the dissemination of fundamental and applied research in all scientific and technical areas related to water quality and resource recovery. WER's goal is to foster communication and interdisciplinary research between water sciences and related fields such as environmental toxicology, agriculture, public and occupational health, microbiology, and ecology. In addition to original research articles, short communications, case studies, reviews, and perspectives are encouraged.