A review on outlier detection techniques on data stream by using different approaches of K-Means algorithm

2015 International Conference on Advances in Computer Engineering and Applications. Published: 2015-03-19. DOI: 10.1109/ICACEA.2015.7164758

Prashant Chauhan, M. Shukla


Citations: 31

Abstract




Data stream mining has attracted many researchers because large datasets need to be mined, which poses various challenges. Stream data differs from ordinary, static data: it is produced continuously by different applications, and its massive volume, potentially infinite length, and concept drift make it challenging to process. An object that does not follow the behavior of normal data objects is called an outlier. Outlier detection is used in applications such as fraud detection, intrusion detection, tracking environmental changes, and medical diagnosis, so there is a need to detect outliers in data streams. Various approaches are used for outlier detection. Some of them use the K-Means algorithm, which groups similar data points into clusters. Data stream clustering techniques are highly useful for clustering similar data items in a stream and for detecting the outliers among them; such approaches are therefore called cluster-based outlier detection. K-Means is a partition-based algorithm that divides a dataset into a given number of clusters. It is one of the most common and popular clustering algorithms because of its simplicity and efficiency. The purpose of this paper is to review different outlier detection approaches that use the K-Means algorithm to cluster the dataset, together with some other methods. Different application areas of outlier detection are also discussed.
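To make the idea of cluster-based outlier detection on a stream concrete, here is a minimal sketch under stated assumptions; it is not the specific method of any paper surveyed here. It assumes a chunk-based scheme: the stream is cut into fixed-size chunks, each chunk is clustered with plain (Lloyd's) K-Means, and a point is flagged when its distance to the nearest centroid exceeds mean + t·std of the distances in that chunk. The function names `kmeans`, `chunk_outliers`, and `stream_outliers`, the parameters `chunk_size`, `k`, and `t`, and the first-k-points initialisation are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, max_iter: int = 50) -> List[Point]:
    """Plain Lloyd's K-Means; returns k centroids.

    Assumption: centroids are initialised with the first k points, a
    simplification chosen for determinism (k-means++ is more robust).
    """
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid's cluster.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids: List[Point] = []
        for c, members in zip(centroids, clusters):
            if members:
                n = len(members)
                new_centroids.append(tuple(sum(col) / n for col in zip(*members)))
            else:
                new_centroids.append(c)
        if new_centroids == centroids:  # centroids unchanged: converged
            break
        centroids = new_centroids
    return centroids


def chunk_outliers(chunk: List[Point], k: int, t: float) -> List[Point]:
    """Flag points whose nearest-centroid distance exceeds mean + t * std."""
    centroids = kmeans(chunk, k)
    d = [min(dist(p, c) for c in centroids) for p in chunk]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return [p for p, x in zip(chunk, d) if x > mean + t * std]


def stream_outliers(stream: Iterable[Point], chunk_size: int, k: int,
                    t: float = 2.0) -> Iterator[Point]:
    """Consume the stream chunk by chunk, so memory stays bounded."""
    chunk: List[Point] = []
    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            yield from chunk_outliers(chunk, k, t)
            chunk = []
    if len(chunk) >= k:  # process a final partial chunk if it is large enough
        yield from chunk_outliers(chunk, k, t)
```

Processing one chunk at a time keeps memory bounded, which is one way to cope with the potentially infinite length of a stream; re-clustering every chunk also lets the centroids follow gradual concept drift. The trade-off is a known weakness of K-Means: an outlier still pulls its cluster's centroid toward itself, and with a large k an isolated point may receive its own cluster and escape detection.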
