Ke Wu, Kun Zhang, Wei Fan, Andrea Edwards, Philip S Yu
{"title":"RS-Forest:用于流异常检测的快速密度估计器。","authors":"Ke Wu, Kun Zhang, Wei Fan, Andrea Edwards, Philip S Yu","doi":"10.1109/ICDM.2014.45","DOIUrl":null,"url":null,"abstract":"<p><p>Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.</p>","PeriodicalId":74565,"journal":{"name":"Proceedings. IEEE International Conference on Data Mining","volume":"2014 ","pages":"600-609"},"PeriodicalIF":0.0000,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/ICDM.2014.45","citationCount":"89","resultStr":"{\"title\":\"RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection.\",\"authors\":\"Ke Wu, Kun Zhang, Wei Fan, Andrea Edwards, Philip S Yu\",\"doi\":\"10.1109/ICDM.2014.45\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.</p>\",\"PeriodicalId\":74565,\"journal\":{\"name\":\"Proceedings. IEEE International Conference on Data Mining\",\"volume\":\"2014 \",\"pages\":\"600-609\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/ICDM.2014.45\",\"citationCount\":\"89\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. IEEE International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2014.45\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2014.45","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection.
Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.