Chenyu Song, Jingyuan Cui, Yafei Cui, Sheng Zhang, Chang Wu, Xiaoyan Qin, Qiaofeng Wu, Shanqing Chi, Mingqing Yang, Jia Liu, Ruihong Chen, Haiping Zhang
Environmental Modelling & Software, Volume 183, Article 106262. Published 2024-11-10. DOI: 10.1016/j.envsoft.2024.106262. URL: https://www.sciencedirect.com/science/article/pii/S1364815224003232
Integrated STL-DBSCAN algorithm for online hydrological and water quality monitoring data cleaning
Online hydrological and water quality monitoring data have become increasingly crucial for water environment management tasks such as assessment and modeling. However, online monitoring data often contain erroneous or incomplete records, which limits their operational use. In this study, we developed an automated data cleaning algorithm based on Seasonal-Trend decomposition using Loess (STL) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). STL identifies and corrects the more obvious anomalies in the time series; DBSCAN then refines the result, with the reverse nearest neighbor method employed to enhance clustering accuracy. To improve anomaly detection, a two-level residual judgment threshold was applied. The algorithm was applied to three reservoirs in Shanghai, China, achieving a precision of 0.91 and a recall of 0.81 for dissolved oxygen and pH. The proposed algorithm could potentially be applied to cleaning environmental monitoring data with high accuracy and stability.
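The two-stage idea in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: a per-phase median stands in for the STL decomposition, the reverse nearest neighbor refinement is omitted, and the "two-level" threshold is interpreted as an upper level (`k_high`) that triggers direct correction plus a lower level (`k_low`) that only marks suspects for DBSCAN to confirm. All function names, parameter values, and the synthetic series are hypothetical. In practice one might use a real STL implementation (for example in statsmodels) and a library DBSCAN (for example in scikit-learn) instead.

```python
from statistics import median


def seasonal_fit(values, period):
    """Expected value per time step: the median of each seasonal phase.
    Assumption: a crude stand-in for an STL trend + seasonal fit."""
    phase_median = [median(values[p::period]) for p in range(period)]
    return [phase_median[i % period] for i in range(len(values))]


def dbscan_1d(points, eps, min_samples):
    """Minimal DBSCAN on scalars; returns cluster labels, -1 meaning noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if abs(points[i] - points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


def clean_series(values, period, k_low=3.0, k_high=6.0, min_samples=4):
    """Return (cleaned values, sorted indices flagged as anomalies)."""
    fitted = seasonal_fit(values, period)
    resid = [v - f for v, f in zip(values, fitted)]
    # Robust residual scale from the median absolute residual.
    scale = 1.4826 * median(abs(r) for r in resid) or 1e-9
    cleaned = list(values)
    # Stage 1: residuals above the upper threshold are corrected directly.
    obvious = [i for i, r in enumerate(resid) if abs(r) > k_high * scale]
    for i in obvious:
        cleaned[i] = fitted[i]
        resid[i] = 0.0
    # Stage 2: suspects above the lower threshold are confirmed only if
    # DBSCAN on the residuals labels them as noise.
    labels = dbscan_1d(resid, eps=scale, min_samples=min_samples)
    subtle = [i for i, r in enumerate(resid)
              if abs(r) > k_low * scale and labels[i] == -1]
    for i in subtle:
        cleaned[i] = fitted[i]
    return cleaned, sorted(obvious + subtle)


def precision_recall(flagged, truth):
    """Precision = TP / flagged, recall = TP / true anomalies."""
    tp = len(set(flagged) & set(truth))
    return tp / len(flagged), tp / len(truth)


# Synthetic example: a hypothetical 6-step dissolved-oxygen cycle (mg/L)
# with small deterministic noise, one large spike, and one subtle spike.
season = [7.0, 7.4, 7.9, 8.3, 7.8, 7.2]
values = [season[i % 6] + 0.02 * ((i * 7) % 5 - 2) for i in range(60)]
values[20] += 5.0   # obvious anomaly
values[41] += 0.15  # subtle anomaly
cleaned, flagged = clean_series(values, period=6)
print(flagged, precision_recall(flagged, [20, 41]))
```

In this sketch the large spike is caught by the upper threshold alone, while the subtle one exceeds only the lower threshold and is confirmed because it is isolated in residual space. Splitting the decision this way keeps the density-based step from having to cope with extreme values, which is one plausible reading of why the abstract applies STL before DBSCAN.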
About the journal:
Environmental Modelling & Software publishes contributions, in the form of research articles, reviews and short communications, on recent advances in environmental modelling and/or software. The aim is to improve our capacity to represent, understand, predict or manage the behaviour of environmental systems at all practical scales, and to communicate those improvements to a wide scientific and professional audience.