Shuyu Li, Wen Chen, Kaiyan Xing, Hongchao Wang, Yilin Zhang, Ming Kang
MGAN-LD: A sparse label propagation-based anomaly detection approach using multi-generative adversarial networks
Knowledge-Based Systems, vol. 312, Article 113124. Published 2025-02-08. Impact factor: 7.2. JCR: Q1 (Computer Science, Artificial Intelligence).
DOI: 10.1016/j.knosys.2025.113124
URL: https://www.sciencedirect.com/science/article/pii/S0950705125001716
Citations: 0
Abstract
Learning with synthetic data for anomaly detection has attracted considerable attention. Recent work has attempted to use generative adversarial networks (GANs) to generate pseudo-labeled synthetic samples for the model’s learning process. However, in real applications, the sparsity of the originally labeled training samples leads to a model collapse problem: most of the pseudo-labeled samples synthesized by GANs are crowded into a small area, which makes it difficult for the GANs to learn the spatial distribution of the samples. In this paper, we propose MGAN-LD, a sparse label propagation-based anomaly detection approach built on a multi-generator, dual-discriminator framework. First, DBSCAN clustering assigns samples to different clusters. Then, to expand the labeled training set, label propagation is carried out within each cluster to generate highly credible pseudo-labels for unlabeled samples. Next, a novel GAN with multiple generators is trained on the expanded training set to learn simultaneously the local data distributions of different areas of the feature space, which avoids model collapse. Finally, the training set is further augmented with synthetic samples from the multiple generators of MGAN-LD, and the augmented set is used to train an overall discriminator. Thanks to this data augmentation, MGAN-LD can build reliable classification boundaries between normal and abnormal samples. MGAN-LD is evaluated against nine classical anomaly detection methods on 11 public datasets. The results show that, compared with these classical methods, MGAN-LD improves AUC by an average of 10%, AP by an average of 17%, and F1 by an average of 15%.
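The first two stages described in the abstract (DBSCAN clustering, then label propagation inside each cluster) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the propagation rule, so the neighbor-vote rule below is a simplified stand-in, and the names `eps`, `min_pts`, `min_agreement`, and the toy data are assumptions chosen for illustration. The multi-generator GAN and discriminator stages are omitted.

```python
from collections import Counter, deque
from math import dist

NOISE = -1  # cluster id used for points that DBSCAN leaves unclustered


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: return one cluster id per point (NOISE for outliers)."""
    n = len(points)
    # Each point's eps-neighborhood (includes the point itself).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    cluster_of = [None] * n
    cluster_id = 0
    for i in range(n):
        if cluster_of[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            cluster_of[i] = NOISE
            continue
        cluster_of[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if cluster_of[j] == NOISE:
                cluster_of[j] = cluster_id  # border point joins the cluster
            if cluster_of[j] is not None:
                continue
            cluster_of[j] = cluster_id
            if len(neighbors[j]) >= min_pts:  # only core points expand
                queue.extend(neighbors[j])
        cluster_id += 1
    return cluster_of


def propagate_labels(points, labels, cluster_of, eps, min_agreement=0.9):
    """Spread sparse labels within each cluster (hypothetical voting rule).

    An unlabeled point receives a pseudo-label only when at least
    `min_agreement` of its already-labeled eps-neighbors in the same
    cluster agree. Rounds repeat until no new pseudo-label is assigned.
    """
    pseudo = dict(labels)
    while True:
        updates = {}
        for i, c in enumerate(cluster_of):
            if i in pseudo or c == NOISE:
                continue
            votes = Counter(
                pseudo[j] for j in pseudo
                if cluster_of[j] == c and dist(points[i], points[j]) <= eps
            )
            if not votes:
                continue
            label, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= min_agreement:
                updates[i] = label
        if not updates:
            return pseudo
        pseudo.update(updates)


# Toy data (assumed): two dense groups plus one isolated point,
# with a single labeled sample in each group.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
          (5.0, 5.0), (5.1, 5.0), (5.2, 5.0), (5.3, 5.0),
          (10.0, 0.0)]
labels = {0: "normal", 4: "anomaly"}

cluster_of = dbscan(points, eps=0.15, min_pts=2)
pseudo = propagate_labels(points, labels, cluster_of, eps=0.15)
print(cluster_of)
print(pseudo)
```

In this sketch the isolated point is marked as noise and never receives a pseudo-label, while each labeled seed spreads its label along its own cluster. The expanded label set would then serve as the training set for the per-region generators that the abstract describes.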
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.