Van Quan Nguyen, V. H. Nguyen, Nhien-An Le-Khac, V. Cao
Title: Automatically Estimate Clusters in Autoencoder-based Clustering Model for Anomaly Detection
Venue: 2021 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 1-6
Published: 2021-08-19
DOI: 10.1109/RIVF51545.2021.9642120 (https://doi.org/10.1109/RIVF51545.2021.9642120)
Citation count: 1
Abstract
In a previous work, a clustering-based method was combined with the latent feature space of an autoencoder to discover sub-classes of normal data for anomaly detection. However, that work is limited by the need to set the number of clusters in the normal training data manually. Finding a proper number of clusters is often ambiguous and depends heavily on the characteristics of the dataset. This paper proposes a novel data-driven empirical approach for automatically identifying the number of normal sub-classes (clusters) without human intervention. The clustering-based method is then co-trained with an autoencoder to automatically discover the appropriate number of clusters of normal training data in the middle hidden layer of the autoencoder. The resulting cluster centers are then used to identify anomalies in query data. Our approach is tested on four scenarios from the CTU13 datasets, and the experimental results show that the proposed model often performs better than the model from the previous work in almost all scenarios.
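
The final step described in the abstract, using cluster centers in the latent space to flag anomalies, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring rule, so the nearest-center Euclidean distance, the quantile-based threshold, the function names, and the toy 2-D latent vectors below are all assumptions made for illustration. The centers are taken as given here; how their number is estimated is the paper's contribution and is not reproduced.

```python
import math

def nearest_center_distance(z, centers):
    """Euclidean distance from latent vector z to its closest cluster center."""
    return min(math.dist(z, c) for c in centers)

def fit_threshold(normal_latents, centers, quantile=0.95):
    """Assumed rule: threshold = empirical quantile of scores on normal training data."""
    scores = sorted(nearest_center_distance(z, centers) for z in normal_latents)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]

def is_anomaly(z, centers, threshold):
    """Flag a query point whose distance to every center exceeds the threshold."""
    return nearest_center_distance(z, centers) > threshold

# Hypothetical latent-space data: two clusters of normal points.
centers = [(0.0, 0.0), (5.0, 5.0)]
normal_latents = [(0.1, -0.2), (0.3, 0.1), (4.8, 5.1), (5.2, 4.9), (-0.2, 0.2)]

threshold = fit_threshold(normal_latents, centers)
far_point_flagged = is_anomaly((2.5, 2.5), centers, threshold)   # far from both centers
near_point_flagged = is_anomaly((0.1, 0.1), centers, threshold)  # close to the first center
```

In a real pipeline the latent vectors would come from the encoder half of the trained autoencoder, and the threshold would be tuned on held-out normal data rather than fixed at an arbitrary quantile.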