Spatiotemporal dynamics of cyanobacterial blooms: Integrating machine learning and feature selection techniques with uncrewed aircraft systems and autonomous surface vessel data
Mohammed Shakiul Islam, Padmanava Dash, John P. Liles, Hafez Ahmad, Abduselam M. Nur, Rajendra M. Panda, Jessica S. Wolfe, Gray Turnage, Lee Hathcock, Gary D. Chesser, Robert J. Moorhead
Journal of Environmental Management, Volume 381, Article 124878. Published 2025-04-09. DOI: 10.1016/j.jenvman.2025.124878. https://www.sciencedirect.com/science/article/pii/S0301479725008540
Abstract
Cyanobacterial blooms pose significant threats to aquatic ecosystems and public health due to their ability to release harmful toxins, degrade water quality, disrupt aquatic habitats, and endanger human and animal health through contact or consumption of contaminated water. Monitoring phycocyanin (PC), a pigment unique to cyanobacteria, offers a reliable method for detecting and quantifying these blooms, enabling timely interventions to mitigate their impacts. This study aimed to evaluate ten machine learning algorithms (MLAs) for assessing the spatiotemporal variations of cyanobacterial concentrations over an oyster reef in the Western Mississippi Sound (WMS) using remotely sensed imagery from uncrewed aircraft systems (UAS) and in-situ PC concentrations measured by an autonomous surface vessel (ASV). The study further investigated the influence of river discharge and climatic variables on cyanobacterial concentrations using a time series of cyanobacteria maps. To derive the most accurate PC retrieval model, a comprehensive set of 85 features was initially generated, including individual spectral bands, band ratios, multiple vegetation indices, and three-band indices. Feature selection was performed using a two-step approach that combined Sequential Backward Floating Selection (SBFS) and Exhaustive Feature Selection (EFS). SBFS was first used to iteratively remove features and optimize model performance, while EFS evaluated all possible combinations of the features identified by SBFS to select the best subset. Among the ten MLAs tested, Extreme Gradient Boosting emerged as the top-performing model, achieving an R² of 0.835, a root mean square deviation of 0.419 μg/l, an unbiased mean absolute relative difference of 0.176 μg/l, and an average percentage difference of 18.072 % in retrieving PC concentration.
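The four accuracy metrics named above can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions based on common usage in ocean-colour validation: R² as the coefficient of determination, uMARD as the absolute difference divided by the mean of each observed/predicted pair, and APD as the absolute difference divided by the observed value, in percent. The sample values are hypothetical and are not data from the study.

```python
import math

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmsd(obs, pred):
    """Root mean square deviation, in the units of the inputs."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def umard(obs, pred):
    """Unbiased mean absolute relative difference (assumed definition):
    each absolute difference is divided by the mean of the pair."""
    return sum(abs(p - o) / (0.5 * (p + o)) for o, p in zip(obs, pred)) / len(obs)

def apd(obs, pred):
    """Average percentage difference (assumed definition), relative to observations."""
    return 100.0 * sum(abs(p - o) / o for o, p in zip(obs, pred)) / len(obs)

# Hypothetical in-situ and retrieved PC concentrations (not from the paper).
observed = [1.0, 2.0, 4.0]
retrieved = [1.0, 2.5, 3.5]

for name, fn in [("R2", r_squared), ("RMSD", rmsd), ("uMARD", umard), ("APD", apd)]:
    print(name, round(fn(observed, retrieved), 4))
```

Under these assumed definitions, RMSD carries the units of the concentration, while uMARD and APD are relative measures.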
The novelty of this study lies in its data-driven approach to identifying the most suitable machine learning algorithm and feature subsets for PC retrieval, thereby enhancing the accuracy and robustness of the developed algorithm. The time-series analysis revealed substantial variations in cyanobacterial concentration in the WMS from 2018 to 2022. The highest average concentration occurred in 2019, coinciding with the introduction of diverted Mississippi River water through the Bonnet Carré Spillway, which triggered an unprecedented cyanobacterial bloom. Furthermore, the average PC concentration was consistently higher during the summer months, likely due to elevated air temperatures and increased sunlight promoting cyanobacterial growth. The methodology developed in this study improves the quantitative monitoring of cyanobacterial blooms using UAS imagery and provides valuable insights for future water quality monitoring initiatives in other regions.
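The two-step feature selection described in the abstract (a backward pass to shrink the feature pool, then an exhaustive search over what remains) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the first step is plain greedy backward elimination without the floating (re-inclusion) step of SBFS, and the scoring function is a hypothetical toy stand-in for the cross-validated model skill a real study would use. The feature names and gain values are invented for the example.

```python
from itertools import combinations

def backward_elimination(features, score, keep):
    """Greedy backward pass: repeatedly drop the single feature whose
    removal gives the highest score, until `keep` features remain.
    (Simplified: no floating re-inclusion step as in SBFS.)"""
    current = tuple(features)
    while len(current) > keep:
        candidates = [tuple(f for f in current if f != drop) for drop in current]
        current = max(candidates, key=score)
    return current

def exhaustive_search(features, score):
    """Score every non-empty subset of `features` and return the best one."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical per-feature gains; a real score would come from model validation.
GAIN = {"blue": 0.05, "green": 0.30, "red": 0.06,
        "red_edge": 0.35, "nir": 0.02, "green_over_red": 0.20}

def toy_score(subset):
    """Toy objective: summed gain minus a fixed penalty per feature."""
    return sum(GAIN[f] for f in subset) - 0.08 * len(subset)

reduced = backward_elimination(list(GAIN), toy_score, keep=4)
best_subset, best_score = exhaustive_search(reduced, toy_score)
print("after backward pass:", reduced)
print("best subset:", best_subset, "score:", round(best_score, 3))
```

The ordering of the two steps reflects a cost trade-off: an exhaustive search over n features evaluates 2^n - 1 subsets, which is infeasible for 85 features but cheap once the backward pass has reduced the pool to a handful.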
About the journal:
The Journal of Environmental Management publishes peer-reviewed, original research on all aspects of the management and managed use of the environment, both natural and man-made. Critical review articles are also welcome; submission of these is strongly encouraged.