On the internal evaluation of unsupervised outlier detection
Henrique O. Marques, R. Campello, A. Zimek, J. Sander
Proceedings of the 27th International Conference on Scientific and Statistical Database Management
Published: 2015-06-29
DOI: 10.1145/2791347.2791352
Citations: 37
Abstract
Although there is a large and growing literature that tackles the unsupervised outlier detection problem, the unsupervised evaluation of outlier detection results is still virtually untouched in the literature. The so-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or just compare (in relative terms) the solutions provided by different algorithms or by different parameterizations of a given algorithm in the absence of labeled data. However, in contrast to unsupervised cluster analysis, where indexes for internal evaluation and validation of clustering solutions have been conceived and shown to be very useful, in the outlier detection domain this problem has been notably overlooked. Here we discuss this problem and provide a solution for the internal evaluation of top-n (binary) outlier detection results. Specifically, we propose an index called IREOS (Internal, Relative Evaluation of Outlier Solutions) that can evaluate and compare different candidate labelings of a collection of multivariate observations in terms of outliers and inliers. We also statistically adjust IREOS for chance and extensively evaluate it in several experiments involving different collections of synthetic and real data sets.
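The abstract does not spell out the mechanics of IREOS, but the idea it describes — scoring a candidate top-n labeling by how well the flagged points separate from the rest, then adjusting the score for chance — can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names (`separability`, `ireos_like`), the use of a one-vs-rest logistic-regression classifier as the separability measure, and the chance-adjustment formula are simplified stand-ins, not the authors' exact algorithm.

```python
# Illustrative sketch of an IREOS-style internal index (NOT the authors'
# exact method): the separability of each candidate outlier x_j is
# approximated by the confidence of a small weighted logistic-regression
# classifier separating x_j from all remaining points; the index averages
# this over the candidates and is adjusted for chance by subtracting the
# expected separability under a random choice of candidates.
import numpy as np

def _sigmoid(z):
    # Clip to avoid overflow in exp for strongly separable points.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))

def separability(X, j, lr=0.2, steps=500):
    """Confidence with which point j can be separated from all others."""
    n, d = X.shape
    y = np.zeros(n)
    y[j] = 1.0                            # one-vs-rest labels
    sw = np.ones(n)
    sw[j] = n - 1.0                       # reweight the single positive class
    Xb = np.hstack([X, np.ones((n, 1))])  # add bias column
    w = np.zeros(d + 1)
    for _ in range(steps):                # plain gradient descent on the
        p = _sigmoid(Xb @ w)              # weighted logistic loss
        grad = Xb.T @ (sw * (p - y)) / sw.sum()
        w -= lr * grad
    return _sigmoid(Xb[j] @ w)

def ireos_like(X, candidates):
    """Average separability of the candidates, adjusted for chance."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize for stable GD
    n = X.shape[0]
    index = np.mean([separability(X, j) for j in candidates])
    # Chance level: expected separability of a uniformly random candidate.
    expected = np.mean([separability(X, j) for j in range(n)])
    return (index - expected) / (1.0 - expected)
```

On a toy data set with a tight cluster plus two far-away points, a labeling that flags the two distant points scores higher under this index than one that flags two cluster members, which is the behavior an internal evaluation index is meant to capture. The real IREOS additionally handles nonlinear separability and a rigorous statistical adjustment for chance, which this sketch omits.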