Locality Sensitive Outlier Detection: A ranking driven approach

2011 IEEE 27th International Conference on Data Engineering. Pub Date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767852

Ye Wang, S. Parthasarathy, S. Tatikonda

Citations: 65

Abstract

Outlier detection is fundamental to a variety of database and analytic tasks. Recently, distance-based outlier detection has emerged as a viable and scalable alternative to traditional statistical and geometric approaches. In this article we explore the role of ranking for the efficient discovery of distance-based outliers from large, high-dimensional data sets. Specifically, we develop a lightweight ranking scheme, powered by locality-sensitive hashing, that reorders the database points according to their likelihood of being an outlier. We provide theoretical arguments to justify the rationale for the approach and subsequently conduct an extensive empirical study highlighting the effectiveness of our approach over extant solutions. We show that our ranking scheme improves the efficiency of the distance-based outlier discovery process by up to 5-fold. Furthermore, we find that with our approach the top outliers can often be isolated very quickly, typically by scanning less than 3% of the data set.
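The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the ranking function, so this sketch assumes that a point's total bucket occupancy across several p-stable (random projection) LSH tables is a usable proxy for how unlikely it is to be an outlier. It also assumes a standard nested-loop detector that scores each point by the distance to its k-th nearest neighbor and prunes against a running cutoff. The names `lsh_rank` and `top_outliers` and all parameter defaults are hypothetical.

```python
import math
import random
from collections import defaultdict


def lsh_rank(points, num_tables=4, num_projections=3, bucket_width=1.0, seed=0):
    """Return point indices ordered so that points in sparse LSH buckets come first.

    Assumption of this sketch: low bucket occupancy suggests a likely outlier.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    occupancy = [0] * len(points)
    for _ in range(num_tables):
        # Each table combines several Gaussian projections, quantized into
        # cells of width bucket_width with a random offset.
        projections = [
            ([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, bucket_width))
            for _ in range(num_projections)
        ]
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            key = tuple(
                math.floor((sum(a * x for a, x in zip(vec, p)) + b) / bucket_width)
                for vec, b in projections
            )
            buckets[key].append(i)
        for members in buckets.values():
            for i in members:
                occupancy[i] += len(members)
    return sorted(range(len(points)), key=lambda i: occupancy[i])


def top_outliers(points, order, k=2, n=1):
    """Find the top-n outliers by k-th nearest-neighbor distance.

    Candidates are examined in the given order. A candidate is dropped as soon
    as its k-th nearest distance so far falls below the current cutoff.
    Requires len(points) > k. Returns (results, distance_computations), where
    results is a list of (score, index) sorted by descending score.
    """
    top, cutoff, dist_count = [], 0.0, 0
    for i in order:
        knn = []  # the k smallest distances seen so far, ascending
        pruned = False
        for j in range(len(points)):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            dist_count += 1
            if len(knn) < k or d < knn[-1]:
                knn.append(d)
                knn.sort()
                del knn[k:]
            if len(knn) == k and knn[-1] < cutoff:
                pruned = True  # cannot beat the current n-th best score
                break
        if not pruned:
            top.append((knn[-1], i))
            top.sort(reverse=True)
            del top[n:]
            if len(top) == n:
                cutoff = top[-1][0]
    return top, dist_count


# Usage: compare the work done with and without the ranking step.
data_rng = random.Random(1)
data = [(data_rng.random(), data_rng.random()) for _ in range(200)]
data.append((25.0, 25.0))  # one planted outlier
ranked_result, ranked_cost = top_outliers(data, lsh_rank(data))
plain_result, plain_cost = top_outliers(data, list(range(len(data))))
print("ranked order:", ranked_result, "distance computations:", ranked_cost)
print("plain order: ", plain_result, "distance computations:", plain_cost)
```

Both orders return the same outliers because the pruning is exact; only the amount of work differs. Examining likely outliers first raises the cutoff early, so later candidates can be discarded after fewer distance computations, which is the kind of effect the abstract attributes to ranking.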

