Network Anomaly Detection Using a Commute Distance Based Approach
N. Khoa, T. Babaie, S. Chawla, Z. Zaidi
2010 IEEE International Conference on Data Mining Workshops, 2010-12-13
DOI: 10.1109/ICDMW.2010.90 (https://doi.org/10.1109/ICDMW.2010.90)
Citations: 9
Abstract
We propose the use of commute distance, a random walk metric, to discover anomalies in network traffic data. The commute-distance-based anomaly detection approach has several advantages over Principal Component Analysis (PCA), which is the method of choice for this task: (i) it generalizes both distance-based and density-based anomaly detection techniques, while PCA is primarily distance-based; (ii) it is agnostic about the underlying data distribution, while PCA assumes that the data follow a Gaussian distribution; and (iii) it is more robust than PCA, i.e., a perturbation of the underlying data or a change in the parameters used has a less significant effect on its output than on that of PCA. Experiments and analysis on simulated and real datasets are used to validate our claims.
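
The abstract does not spell out how commute distance is computed or how it is turned into an anomaly score, so the following is only a minimal sketch under stated assumptions. It uses the standard identity c_ij = vol(G) (l+_ii + l+_jj - 2 l+_ij), where l+ is the pseudoinverse of the graph Laplacian and vol(G) is the sum of all node degrees. The fully connected Gaussian similarity graph, the median-distance bandwidth, the k-nearest-neighbour scoring rule, and the names `commute_distances`, `gaussian_similarity`, and `anomaly_scores` are illustrative choices, not taken from the paper.

```python
import numpy as np


def commute_distances(W):
    """Pairwise commute distances for a connected, undirected weighted graph.

    W is a symmetric, non-negative similarity (adjacency) matrix.
    """
    n = W.shape[0]
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W
    # Laplacian pseudoinverse via L+ = (L + 11^T/n)^-1 - 11^T/n,
    # an identity that holds only when the graph is connected.
    l_plus = np.linalg.inv(laplacian + 1.0 / n) - 1.0 / n
    volume = degrees.sum()
    diag = np.diag(l_plus)
    # c_ij = vol(G) * (l+_ii + l+_jj - 2 * l+_ij)
    C = volume * (diag[:, None] + diag[None, :] - 2.0 * l_plus)
    np.fill_diagonal(C, 0.0)
    return np.maximum(C, 0.0)  # clip tiny negative round-off


def gaussian_similarity(X):
    """Fully connected Gaussian similarity graph (an assumption of this sketch)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Bandwidth heuristic: median of the distinct pairwise distances.
    sigma = np.median(dists[np.triu_indices(len(X), k=1)])
    W = np.exp(-dists ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def anomaly_scores(X, k=3):
    """Hypothetical score: mean commute distance to the k closest points."""
    C = commute_distances(gaussian_similarity(X))
    # After sorting each row, column 0 is the point itself (distance 0).
    return np.sort(C, axis=1)[:, 1:k + 1].mean(axis=1)


# Toy data: a 5x5 grid of "normal" points plus one distant point (index 25).
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [10.0, 10.0]])
scores = anomaly_scores(X)
print("highest-scoring index:", int(np.argmax(scores)))
```

On this toy input the isolated point should receive the highest score: its weak connections to the rest of the graph make every random walk to it and back long. A sparse k-nearest-neighbour graph could be substituted for the dense one, provided the graph stays connected, since commute distance is undefined across disconnected components.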