Hypersphere for Branching Node for the Family of Isolation Forest Algorithms

Jayanta Choudhury, Piseth Ky, Yi-Ming Ren, Chenhua Shi
2021 IEEE International Conference on Smart Computing (SMARTCOMP), published 2021-08-01
DOI: 10.1109/SMARTCOMP52413.2021.00090 (https://doi.org/10.1109/SMARTCOMP52413.2021.00090)
Citations: 1

Abstract

We propose Finite Boundary (FB) versions of the Isolation Forest (IF), Split Selection Criterion iForest (SciForest), and Extended Isolation Forest (EIF) algorithms that use a hypersphere as the branching boundary to improve the consistency of anomaly scores. EIF replaces the axis-parallel hyperplanes of IF with slanted hyperplanes, as SciForest does, to remedy the problem of inconsistent anomaly scores. EIF is also faster than SciForest because it removes the search for the optimal branching hyperplane. We identify inconsistent anomaly scores from EIF on a synthetic 2-D spiral dataset and from SciForest on a single blob of synthetic 2-D Gaussian data, which shows empirically that slanted hyperplanes alone are insufficient. First, we explain that the anomaly score of anomalous data points decreases abnormally because the infinite extent of the hyperplanes unexpectedly increases the number of branchings for those points. Second, we propose the hypersphere as a suitable generalized branching decision boundary. Third, by comparing anomaly scores obtained with finite-boundary hyperspheres against those obtained with slanted hyperplanes, we show empirically that the inconsistency does not stem from the axis parallelism of IF's hyperplanes, and we highlight the redundant extension of infinite hyperplanes as its dominant cause. Finally, we apply the FB versions of IF (FBIF), EIF (FBEIF), and SciForest (FBSciForest) to several standard 2-D synthetic datasets to assess robustness and computation speed in comparison with EIF, SciForest, and IF.
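The idea of branching on a finite boundary can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the hypersphere is chosen, so the rules used here (center drawn uniformly from the bounding box of the node's points, radius drawn uniformly between the nearest and farthest point) are assumptions, as are the function names `build_tree`, `path_length`, and `anomaly_scores`. The depth limit and the score normalization follow the convention of the original Isolation Forest.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points
    (the normalization used by the original Isolation Forest)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points into 'inside' and 'outside' a random hypersphere."""
    n = len(points)
    if n <= 1 or depth >= max_depth:
        return {"size": n}
    dim = len(points[0])
    # ASSUMPTION: center sampled uniformly from the bounding box of this node's points.
    center = [
        rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
        for d in range(dim)
    ]
    dists = [math.dist(p, center) for p in points]
    lo, hi = min(dists), max(dists)
    if lo == hi:  # all points equidistant from the center: no split possible
        return {"size": n}
    # ASSUMPTION: radius sampled uniformly between the nearest and farthest point.
    radius = rng.uniform(lo, hi)
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return {
        "center": center,
        "radius": radius,
        "inside": build_tree(inside, depth + 1, max_depth, rng),
        "outside": build_tree(outside, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Number of branchings needed to reach a leaf, plus the usual leaf-size correction."""
    if "center" not in node:
        return depth + c_factor(node["size"])
    is_inside = math.dist(x, node["center"]) < node["radius"]
    child = node["inside"] if is_inside else node["outside"]
    return path_length(x, child, depth + 1)


def anomaly_scores(data, queries, n_trees=100, sample_size=256, seed=0):
    """Score each query with s = 2 ** (-mean_path_length / c(sample_size))."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two training points")
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [
        2.0 ** (-sum(path_length(q, t) for t in trees) / (n_trees * norm))
        for q in queries
    ]
```

Under these assumptions every boundary stays within the extent of the data, so a query far from a single Gaussian blob should almost always fall outside each sphere, follow a short path, and receive a higher score than a query at the blob's center. Here `data` is a list of equal-length coordinate tuples and `queries` is a list of points to score.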