Ordinal isolation: An efficient and effective intelligent outlier detection algorithm

2011 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems. Pub Date: 2011-03-20. DOI: 10.1109/CYBER.2011.6011757

Gang Chen, Yuan-li Cai, Juan Shi


Citations: 7

Abstract

Outlier detection plays an important role in intelligent cyber systems, especially fault-tolerant and adaptive ones. Traditional algorithms must evaluate distances or densities, which is very time-consuming. In response to the increasingly urgent demand for real-time detection, various novel algorithms have been proposed in recent years. They are much faster, but less stable and less accurate. To address these problems, we propose the ordinal isolation algorithm, which builds on the core idea of ordinal optimization and the ‘few and different’ characteristics of outliers, and introduces the concept of outlier probability. The algorithm extracts outliers according to the order in which they are isolated during a recursive uniform partition of the data space. It requires no distance or density evaluation, and its complexity is reduced to O(n). Experiments show that the CPU time of ordinal isolation grows linearly with the size of the data set. Furthermore, compared with the recent iForest algorithm, ordinal isolation is about 30 times faster, improves accuracy by 20% to 30%, and is much more stable. Ordinal isolation also scales well, so it works well on high-dimensional data sets with a huge number of instances and irrelevant attributes.
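The abstract gives only a high-level description of the method, so the following is an illustrative sketch rather than the authors' algorithm. It assumes that "recursive uniform data space partition" can be approximated by halving every axis of the bounding box at each level, and that a point counts as isolated at the first level where it is alone in its cell. The names `isolation_depths`, `rank_outliers`, and `max_depth` are hypothetical, and the paper's outlier probability is not modeled here.

```python
from collections import Counter


def isolation_depths(points, max_depth=16):
    """Return, per point, the first grid level at which it is alone in its cell.

    Assumption: level k splits each axis of the bounding box into 2**k equal
    parts, which mimics recursively halving every cell. Points never isolated
    within max_depth levels get max_depth + 1.
    """
    if not points:
        return []
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Guard against zero-width axes to avoid division by zero.
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    depths = [max_depth + 1] * len(points)
    active = list(range(len(points)))  # indices of points not yet isolated
    for depth in range(1, max_depth + 1):
        if not active:
            break
        cells = 2 ** depth
        keys = {}
        for i in active:
            # Integer cell coordinates; clamp the upper boundary into the last cell.
            keys[i] = tuple(
                min(int((points[i][d] - lows[d]) / spans[d] * cells), cells - 1)
                for d in range(dims)
            )
        counts = Counter(keys.values())
        remaining = []
        for i in active:
            if counts[keys[i]] == 1:
                depths[i] = depth  # alone in its cell: isolated at this level
            else:
                remaining.append(i)
        active = remaining
    return depths


def rank_outliers(points, max_depth=16):
    """Order point indices from earliest isolated (most outlying) to latest."""
    depths = isolation_depths(points, max_depth)
    return sorted(range(len(points)), key=lambda i: depths[i])


# Four tightly clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.1), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]
depths = isolation_depths(points)
ranking = rank_outliers(points)
```

No distance or density is computed in this sketch: each level hashes every still-active point into a cell once, so with a fixed `max_depth` the work is linear in the number of points. Whether this matches the complexity analysis in the paper cannot be confirmed from the abstract alone.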

