Omar Alghushairy, Raed Alsini, Xiaogang Ma, T. Soule
Title: A Genetic-Based Incremental Local Outlier Factor Algorithm for Efficient Data Stream Processing
Venue: Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis
Published: 2020-03-09
DOI: 10.1145/3388142.3388160 (https://doi.org/10.1145/3388142.3388160)
Citations: 7
Abstract
Interest in outlier detection is growing because it underpins many applications, such as detecting fraudulent credit card transactions, network intrusion detection, and data analysis in various domains. In the big data era, data streams are an important type of big data. As the need to analyze high-velocity data streams grows, older outlier detection methods become difficult to apply efficiently. Local Outlier Factor (LOF) is a well-known outlier detection algorithm. A major challenge of LOF is that it requires the entire dataset and the distance values to be stored in memory. Another issue is that LOF must be recalculated from the beginning whenever the dataset changes. This paper proposes a novel local outlier detection algorithm for data streams, called Genetic-based Incremental Local Outlier Factor (GILOF). The algorithm works without any prior knowledge of the data distribution and executes in limited memory. Our experiments with various real-world datasets demonstrate that GILOF performs better in execution time and accuracy than other state-of-the-art LOF algorithms.
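
To make the memory and recomputation costs mentioned above concrete, the following is a minimal sketch of the classic batch LOF computation, not of GILOF itself, whose details the abstract does not give. The function name `lof_scores` and the parameter `k` are illustrative choices, and the sketch assumes no duplicate points and ignores ties at the k-distance.

```python
import math


def lof_scores(points, k):
    """Batch LOF sketch: returns one outlier score per point (illustrative only)."""
    n = len(points)
    # Full pairwise distance matrix: O(n^2) memory, which is the storage
    # cost that makes batch LOF hard to apply to unbounded streams.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_dist.append(dist[i][knn[-1]])

    # Local reachability density: inverse of the mean reachability distance.
    # Assumes no duplicate points, so the sum is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for point, score in zip(data, lof_scores(data, k=2)):
        print(point, round(score, 2))
```

Scores near 1 indicate a point about as dense as its neighbours, while clearly larger scores indicate a local outlier. Inserting or removing a single point can change neighbour sets and densities elsewhere, so this batch form has to be rerun over all stored data, which is the limitation an incremental, memory-bounded variant aims to avoid.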