Multivariate functional data analysis and machine learning methods for anomaly detection in water quality sensor data
Xurxo Rigueira, David Olivieri, Maria Araujo, Angeles Saavedra, Maria Pazo
Environmental Modelling & Software, volume 190, Article 106443. Published 2025-04-10.
DOI: 10.1016/j.envsoft.2025.106443
URL: https://www.sciencedirect.com/science/article/pii/S1364815225001276
Impact factor: 4.8; JCR: Q1 (Computer Science, Interdisciplinary Applications)
Abstract
Reliable anomaly detection is crucial for water resources management, but the complexity of environmental sensor data presents challenges, especially given the scarcity of labeled data in water quality analysis. Functional data analysis has seen growing use in anomaly detection, but most applications focus on unlabeled datasets. This study assesses the performance of multivariate functional data analysis and compares it with current machine learning models for detecting water quality anomalies, using 18 years of expert-annotated data from four monitoring stations along Spain’s Ebro River. We propose and validate a multivariate functional model incorporating a new amplitude metric and a nonparametric outlier detector (Multivariate Magnitude, Shape, and Amplitude, MMSA). Additionally, we developed a Random Forest-based machine learning architecture for the same purpose, employing sliding windows and data balancing techniques. The Random Forest model achieved the highest performance, with an average F1 score of 93%, while MMSA proved robust in scenarios with limited anomalous data or labels.
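The abstract names sliding windows, data balancing, and the F1 score but does not give their details. As a rough illustration only, the sketch below shows one way these three pieces could fit together on a toy labeled series. The window width, the rule that a window is anomalous if any point in it is flagged, the use of random oversampling as the balancing step, and all function names are assumptions made for this example, not the authors' implementation. The classifier itself is left out; a Random Forest or any other model would be trained on the balanced windows.

```python
import random


def sliding_windows(series, labels, width):
    """Cut a labeled series into overlapping windows.

    Assumption (not from the paper): a window is labeled anomalous (1)
    if at least one point inside it is labeled anomalous.
    """
    X, y = [], []
    for start in range(len(series) - width + 1):
        end = start + width
        X.append(series[start:end])
        y.append(int(any(labels[start:end])))
    return X, y


def oversample_minority(X, y, seed=0):
    """Balance the two classes by resampling the minority class with replacement.

    Random oversampling is a stand-in; the paper's balancing technique
    is not specified in the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(X), list(y)  # only one class present, nothing to balance
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for the anomalous class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy series (made-up values) with a single flagged point at index 4.
series = [7.1, 7.0, 7.2, 7.1, 9.8, 7.0, 7.3, 7.2, 7.1, 7.0]
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

X, y = sliding_windows(series, labels, width=3)
X_bal, y_bal = oversample_minority(X, y)
```

With these toy inputs, three of the eight windows contain the flagged point, so oversampling adds two resampled anomalous windows to even out the classes before any model is trained. In practice the balancing step would be applied to the training split only, so that the F1 score is computed on data with its original class proportions.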
About the journal:
Environmental Modelling & Software publishes contributions, in the form of research articles, reviews and short communications, on recent advances in environmental modelling and/or software. The aim is to improve our capacity to represent, understand, predict or manage the behaviour of environmental systems at all practical scales, and to communicate those improvements to a wide scientific and professional audience.