Unsupervised Anomaly Detection for Multivariate Incomplete Data using GAN-based Data Imputation: A Comparative Study

Kisan Sarda, A. Yerudkar, C. D. Vecchio
DOI: 10.1109/MED59994.2023.10185791
Published in: 2023 31st Mediterranean Conference on Control and Automation (MED)
Publication date: 2023-06-26
Citations: 0

Abstract

As cyber-physical systems (CPSs) in fields such as manufacturing plants, power plants, and smart networked systems become increasingly interconnected, sensors, actuators, and other sources such as measurements and images generate large amounts of multivariate data. This paper focuses on the anomaly detection (AD) problem, also known as fault detection or outlier detection depending on the type of dataset, which involves identifying anomalous values in a dataset using analytical methods. However, datasets often contain missing values, which can lead to incorrect outcomes and reduce the availability of anomalous samples, which are already few in number. Therefore, a generalized AD method is proposed for incomplete datasets. It involves two steps: data imputation (DI) using a generative adversarial network (GAN) to obtain complete datasets, followed by AD on the completed datasets. While statistical imputation methods are commonly used, they do not consider the data distribution of datasets that contain anomalous samples. The capabilities of GAN-based DI are tested under different hyperparameter settings and percentages of missing values. The AD problem is then addressed using seven unsupervised anomaly detection methods on six different datasets, including a real dataset from a steel manufacturing plant in Italy. Each dataset is analyzed to determine which combination of DI and AD methods performs best. The results show that GAN-imputed data provides the best DI performance, while the reweighted minimum covariance determinant (RMCD) method, combined with GAN, offers the best overall AD results.
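
The two-step structure described in the abstract (impute first, then detect anomalies on the completed data) can be illustrated with a minimal sketch. This is not the authors' implementation: column-mean imputation stands in for the GAN-based imputer, a plain Mahalanobis distance stands in for RMCD, and the data are synthetic. All function names (`mask_mcar`, `impute_column_mean`, `imputation_rmse`, `mahalanobis_sq`), the 10% missing rate, and the 98% score quantile are assumptions chosen only for illustration.

```python
import numpy as np


def mask_mcar(X, missing_rate, rng):
    """Return a copy of X with entries set to NaN completely at random."""
    X_missing = X.copy()
    mask = rng.random(X.shape) < missing_rate
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_column_mean(X_missing):
    """Baseline statistical imputation (stand-in for a GAN imputer)."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)


def imputation_rmse(X_true, X_imputed, mask):
    """RMSE computed only on the entries that were masked out."""
    diff = (X_true - X_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row to the sample mean.

    Uses the classical (non-robust) mean and covariance; RMCD would
    replace these with robust, reweighted estimates.
    """
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


# Synthetic data: 500 normal samples plus 10 shifted (anomalous) samples.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 3))
anomalies = rng.normal(8.0, 1.0, size=(10, 3))
X = np.vstack([normal, anomalies])

# Step 1: data imputation (DI) on an artificially incomplete copy.
X_missing, mask = mask_mcar(X, missing_rate=0.1, rng=rng)
X_imputed = impute_column_mean(X_missing)
rmse = imputation_rmse(X, X_imputed, mask)

# Step 2: anomaly detection (AD) on the completed data.
scores = mahalanobis_sq(X_imputed)
threshold = np.quantile(scores, 0.98)  # assumed cut-off, not from the paper
flagged = np.flatnonzero(scores > threshold)

print(f"imputation RMSE on masked entries: {rmse:.3f}")
print(f"rows flagged as anomalous: {flagged.size}")
```

Repeating step 1 over several values of `missing_rate` gives the kind of sweep over missing-value percentages that the abstract mentions, and swapping either stand-in for another method gives one DI and AD combination to compare.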