Machine Learning Approach towards the Early Warning of Cyanobacterial Blooms in Drinking Water Reservoirs

Claudia Fournier, A. Quesada, A. Justel, A. Monteoliva, Jordi Cirera, C. Sola, A. Munné, Juan C. García, José Javier Rodríguez, S. Cirés
Published in: The 7th Iberian Congress on Cyanotoxins/3rd Iberoamerican Congress on Cyanotoxins
Publication date: 2022-07-26
DOI: 10.3390/blsf2022014038 (https://doi.org/10.3390/blsf2022014038)

Abstract

Cyanobacterial harmful algal blooms (CyanoHABs) are expanding globally and pose a major risk to lakes and reservoirs because of their toxicity and economic impact. Anticipating their occurrence and understanding the main factors related to CyanoHABs are therefore critical to improving decision-making and water resource management. In this context, we present two modelling options for the analysis and prediction of CyanoHABs in two drinking water reservoirs in Spain. This case represents a unique opportunity to combine efforts from different academic disciplines (i.e., aquatic ecology and data science), environmental companies, and public water managers to address this increasingly severe issue. Susqueda (Ter basin, Catalonia) is a large, deep, eutrophic reservoir (Zmax = 110 m) where monitoring in recent years has focused on monthly measurements of more than 30 physico-chemical, hydrological, meteorological and biological parameters, some of which require expert intervention and costly effort and so could not be measured at a higher temporal frequency. Cuerda del Pozo (Duero basin, Castilla y León) is a deep reservoir (Zmax = 30 m) where monitoring has focused on daily data collection through probes mounted on automatic profilers. This strategy allowed a higher monitoring frequency, but for fewer parameters and over a narrower time span. In both cases, the parameters chosen as proxies of cyanobacterial proliferation (the model outputs) are fluorometric measurements of chlorophyll-a and phycocyanin. The results of our machine-learning-based analyses suggest that the choice of modelling path depends mainly on two aspects: (1) the time span over which data are collected, and (2) the frequency and type of data measured (i.e., one discrete measurement at the surface vs. many measurements along the water column). Thus, the analysis of the Susqueda dataset led to more interpretable results, allowing a better understanding of the system and of the main factors related to CyanoHABs, albeit with limited predictive capacity. Meanwhile, the Cuerda del Pozo dataset is treated as a time series to which autoregressive forecasting techniques, combined with information from exogenous parameters, are applied to foresee cyanobacterial blooms before they occur, at the cost of some interpretability. The results of this work are expected to provide an effective tool to boost smart and goal-orientated sampling planning, while improving the data-driven decision-making processes essential for the water management of cyanobacterial blooms.
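
The abstract does not specify which autoregressive model was used for Cuerda del Pozo. As a rough illustration of the general idea only, the sketch below fits an autoregressive model with one exogenous input (often called ARX) by ordinary least squares. Everything in it is an assumption made for illustration: the function names, the choice of a single exogenous variable (for instance, water temperature), the lag order `p`, and the synthetic numbers, which are not measurements from either reservoir.

```python
from typing import List, Sequence, Tuple


def make_lagged_rows(
    y: Sequence[float], x: Sequence[float], p: int
) -> Tuple[List[List[float]], List[float]]:
    """Build design rows [1, y[t-1], ..., y[t-p], x[t-1]] with target y[t]."""
    rows, targets = [], []
    for t in range(p, len(y)):
        lags = [y[t - k] for k in range(1, p + 1)]
        rows.append([1.0] + lags + [x[t - 1]])
        targets.append(y[t])
    return rows, targets


def solve_least_squares(rows: List[List[float]], targets: List[float]) -> List[float]:
    """Solve the normal equations (A^T A) w = A^T b by Gaussian elimination."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, targets)) for i in range(n)]
    m = [ata[i] + [atb[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - rest) / m[i][i]
    return w


def forecast_next(
    y: Sequence[float], x: Sequence[float], w: Sequence[float], p: int
) -> float:
    """One-step-ahead forecast from the last p values of y and the last x."""
    features = [1.0] + [y[-k] for k in range(1, p + 1)] + [x[-1]]
    return sum(wi * fi for wi, fi in zip(w, features))


# Synthetic demo: y stands in for a daily pigment proxy, x for one exogenous
# driver. The generating coefficients (0.5, 0.6, 0.3) are invented.
x_demo = [float((t * 7) % 5) for t in range(40)]
y_demo = [1.0]
for t in range(1, 40):
    y_demo.append(0.5 + 0.6 * y_demo[t - 1] + 0.3 * x_demo[t - 1])

rows_demo, targets_demo = make_lagged_rows(y_demo, x_demo, p=1)
w_demo = solve_least_squares(rows_demo, targets_demo)
next_value = forecast_next(y_demo, x_demo, w_demo, p=1)
```

On this noise-free synthetic series the fitted weights recover the generating coefficients. Real probe data would be noisy and would need held-out evaluation that respects time order, plus a deliberate choice of lag order and exogenous inputs.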