Using Machine Learning for Dependable Outlier Detection in Environmental Monitoring Systems

Impact factor (IF): 2.9 · JCR quartile: Q3 · Category: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

ACM Transactions on Cyber-Physical Systems · Pub Date: 2021-07-01 · DOI: 10.1145/3445812

Gonçalo Jesus, A. Casimiro, Anabela Oliveira

Journal volume: 5 1 · Pages: 1-30 · Publication type: Journal Article

Citations: 4

Abstract

Sensor platforms used in environmental monitoring applications are often subject to harsh environmental conditions while monitoring complex phenomena. Designing dependable monitoring systems is therefore challenging, given the external disturbances that affect sensor measurements. Even the apparently simple task of outlier detection in sensor data becomes a hard problem, made harder by the difficulty of distinguishing true data errors caused by sensor faults from deviations caused by natural phenomena that merely look like data errors. Existing solutions for runtime outlier detection typically assume that the physical processes can be accurately modeled, or that outliers consist of large deviations that are easily detected and filtered by appropriate thresholds. Other solutions assume that it is possible to deploy multiple sensors providing redundant data to support voting-based techniques. In this article, we propose a new methodology for dependable runtime detection of outliers in environmental monitoring systems, aiming to increase data quality by treating these outliers. We propose the use of machine learning techniques to model each sensor's behavior, exploiting correlated data provided by other related sensors. Using these models, along with knowledge of processed past measurements, it is possible to obtain accurate estimations of the observed environment parameters and to build failure detectors that use these estimations. When a failure is detected, these estimations also allow one to correct the erroneous measurements and hence improve the overall data quality. Our methodology not only allows one to distinguish truly abnormal measurements from deviations due to complex natural phenomena, but also allows the quality of each measurement to be quantified, which is relevant from a dependability perspective.
We apply the methodology to real datasets from a complex aquatic monitoring system that measures temperature and salinity, and use them to illustrate the process of building the machine learning prediction models with a technique based on Artificial Neural Networks, denoted ANNODE (ANN Outlier Detection). From this application, we also observe the effectiveness of our ANNODE approach for accurate outlier detection in harsh environments. We then validate these positive results by comparing ANNODE with state-of-the-art solutions for outlier detection. The results show that ANNODE improves on existing solutions in terms of outlier detection accuracy.
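
The detect-and-correct loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' ANNODE implementation: an ordinary least-squares line stands in for the neural network, a single correlated neighbor sensor is assumed, and the names `fit_linear`, `build_detector`, the tolerance factor `k`, and the linear quality score are all hypothetical choices made only for illustration.

```python
from statistics import mean, pstdev


def fit_linear(xs, ys):
    """Least-squares fit of y ~ a*x + b (a stand-in for the paper's ANN model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def build_detector(neighbor_train, target_train, k=3.0):
    """Build a checker for one target sensor from a correlated neighbor sensor.

    Assumption: a measurement is an outlier when it deviates from the model's
    estimate by more than k standard deviations of the training residuals.
    """
    a, b = fit_linear(neighbor_train, target_train)
    residuals = [y - (a * x + b) for x, y in zip(neighbor_train, target_train)]
    tol = k * pstdev(residuals)
    if tol == 0:
        raise ValueError("training residuals have zero spread; tolerance undefined")

    def check(neighbor_value, measured):
        estimate = a * neighbor_value + b        # model-based estimate
        error = abs(measured - estimate)
        is_outlier = error > tol                 # failure detector
        quality = max(0.0, 1.0 - error / tol)    # hypothetical quality score in [0, 1]
        corrected = estimate if is_outlier else measured
        return is_outlier, quality, corrected

    return check


# Usage with made-up readings: the target sensor roughly tracks 2x the neighbor.
neighbor = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
target = [20.1, 21.9, 24.1, 25.9, 28.1, 29.9]
check = build_detector(neighbor, target)
print(check(12.5, 25.1))   # a reading close to the estimate
print(check(12.5, 31.0))   # a reading far from the estimate, replaced by it
```

The sketch keeps the three roles named in the abstract separate: an estimate from correlated data, a detector that compares the measurement against that estimate, and a correction that substitutes the estimate when the detector fires. A real deployment would also feed past processed measurements into the model, which this simplified version omits.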

Source journal

ACM Transactions on Cyber-Physical Systems (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)

CiteScore

5.70

Self-citation rate

4.30%

Articles published