Unsupervised outlier detection in streaming data using weighted clustering

2012 12th International Conference on Intelligent Systems Design and Applications (ISDA). Pub Date: 2012-11-01. DOI: 10.1109/ISDA.2012.6416666

Yogita Thakran, Durga Toshniwal

Citations: 46

Abstract

Outlier detection is an important task in many fields, such as network intrusion detection, credit card fraud detection, stock market analysis, and the detection of outlying cases in medical data. Outlier detection in streaming data is challenging because the data cannot be scanned multiple times and new concepts may keep evolving in the incoming data over time. Irrelevant attributes, which can be termed noisy attributes, further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, since clustering is an unsupervised data mining task and does not require labeled data. The proposed scheme combines density-based and partitioning clustering methods to take advantage of both density-based and distance-based outlier detection. It also assigns weights to attributes according to their relevance to the mining task, and these weights are adaptive. Weighted attributes help to reduce or remove the effect of noisy attributes. In view of the challenges of streaming data, the proposed scheme is incremental and adapts to concept evolution. Experimental results on synthetic and real-world data sets show that the proposed approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, including as the percentage of outliers increases.
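
The abstract gives no algorithmic detail, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It processes one chunk of a stream, flags a point as an outlier when it is both far from its nearest centroid (a distance-based test) and has few neighbours in the chunk (a density-based test), then updates the centroids incrementally and re-derives the attribute weights. The function names, the `radius` and `min_neighbors` parameters, the inverse-dispersion weighting rule, and the sample data are all assumptions made for this example.

```python
import math


def weighted_distance(x, y, weights):
    """Euclidean distance with one non-negative weight per attribute."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def nearest(point, centroids, weights):
    """Index of the centroid closest to `point` under the weighted distance."""
    return min(range(len(centroids)),
               key=lambda c: weighted_distance(point, centroids[c], weights))


def process_chunk(chunk, centroids, counts, weights, radius, min_neighbors):
    """Flag outliers in one chunk, then update centroids and weights.

    `centroids` and `counts` are modified in place; each initial centroid
    is treated as if it summarised `counts[c]` earlier points.
    Returns (indices of flagged points, new attribute weights).
    """
    labels = [nearest(p, centroids, weights) for p in chunk]

    outliers = []
    for i, p in enumerate(chunk):
        # Distance-based test: far from the assigned centroid.
        far = weighted_distance(p, centroids[labels[i]], weights) > radius
        # Density-based test: few other chunk points within `radius`.
        neighbors = sum(1 for j, q in enumerate(chunk)
                        if j != i and weighted_distance(p, q, weights) <= radius)
        if far and neighbors < min_neighbors:
            outliers.append(i)

    flagged = set(outliers)
    dims = len(weights)

    # Incremental (running-mean) centroid update using inliers only.
    for i, p in enumerate(chunk):
        if i in flagged:
            continue
        c = labels[i]
        counts[c] += 1
        for j in range(dims):
            centroids[c][j] += (p[j] - centroids[c][j]) / counts[c]

    # Assumed weighting rule: attributes with small within-cluster
    # dispersion are treated as more relevant and get a larger weight.
    spread = [1e-9] * dims
    for i, p in enumerate(chunk):
        if i not in flagged:
            for j in range(dims):
                spread[j] += (p[j] - centroids[labels[i]][j]) ** 2
    inverse = [1.0 / s for s in spread]
    total = sum(inverse)
    return outliers, [v / total for v in inverse]


# Hypothetical data: attribute 0 separates two clusters, attribute 1 is noisy.
centroids = [[0.0, 0.0], [10.0, 0.0]]
counts = [1, 1]
chunk = [(0.1, 2.0), (-0.2, -1.5), (0.0, 1.0),
         (10.1, -2.0), (9.9, 1.5), (10.2, 0.5),
         (5.0, 0.0)]
outliers, weights = process_chunk(chunk, centroids, counts, [0.5, 0.5],
                                  radius=2.0, min_neighbors=2)
print(outliers, weights)
```

In this toy run the last point is flagged, and the noisy second attribute receives a much smaller weight than the first, which is the effect the abstract attributes to weighted attributes. A real streaming setting would also need a rule for creating and retiring clusters as concepts evolve, which this sketch omits.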
