Sub-Grid Partitioning Algorithm for Distributed Outlier Detection on Big Data

Mohamed Sakr, Walid Atwa, A. Keshk
2018 13th International Conference on Computer Engineering and Systems (ICCES), 2018-12-01
DOI: 10.1109/ICCES.2018.8639409
Citations: 4

Abstract

Anomaly detection, or outlier detection, has become a major research problem in the era of big data. It is used in many applications, such as removing noise from signals and detecting credit card fraud. One type of outlier detection is density-based outlier detection, whose distinguishing strength is detecting outliers in regions of differing density. One density-based algorithm is the Local Outlier Factor (LOF), which gives every point a score that quantifies its outlierness relative to other points. In this paper, we propose a new algorithm called the sub-grid partition (SGP) algorithm, which helps calculate the LOF for big data in a distributed environment. SGP splits the tuples into small grids, and each grid is split into sub-grids. Sub-grids on the border are duplicated in every processing node so that the LOF can be calculated for every tuple in these grids. Duplicating sub-grids increases the number of tuples to be processed but, on the other hand, reduces the network overhead required for communication between processing nodes and reduces the idle time that processing nodes spend waiting for requested tuples. Finally, we evaluate the performance of the SGP algorithm through a series of simulation experiments over real data sets.
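
The abstract does not give the details of the partitioning step, so the following Python sketch is only one possible reading of it, not the authors' implementation. It assumes 2-D tuples scaled to the unit square, a square layout of grids with one grid per processing node, and that a border sub-grid is copied only to the adjacent grids it touches (the abstract's wording could also mean a wider broadcast). The names `sub_grid_index`, `neighbour_offsets`, `partition`, `grids_per_axis`, and `subs_per_axis` are hypothetical.

```python
from collections import defaultdict


def sub_grid_index(point, grids_per_axis, subs_per_axis):
    """Map a 2-D point in the unit square to its global sub-grid index."""
    n = grids_per_axis * subs_per_axis  # number of sub-grids along each axis
    # min() keeps a coordinate of exactly 1.0 inside the last sub-grid.
    return tuple(min(int(c * n), n - 1) for c in point)


def neighbour_offsets(local, subs_per_axis):
    """Offsets toward the adjacent grids a sub-grid touches along one axis."""
    offsets = [0]
    if local == 0:  # first sub-grid row/column of its grid
        offsets.append(-1)
    if local == subs_per_axis - 1:  # last sub-grid row/column of its grid
        offsets.append(1)
    return offsets


def partition(points, grids_per_axis=2, subs_per_axis=4):
    """Assign each tuple to its home grid and copy border tuples to neighbours.

    Returns {grid_index: {"own": [...], "replica": [...]}}. Assumption: only
    tuples in border sub-grids are replicated, and only to adjacent grids.
    """
    parts = defaultdict(lambda: {"own": [], "replica": []})
    for p in points:
        sx, sy = sub_grid_index(p, grids_per_axis, subs_per_axis)
        gx, gy = sx // subs_per_axis, sy // subs_per_axis  # home grid
        parts[(gx, gy)]["own"].append(p)

        # Position of the sub-grid inside its home grid.
        lx, ly = sx % subs_per_axis, sy % subs_per_axis
        for dx in neighbour_offsets(lx, subs_per_axis):
            for dy in neighbour_offsets(ly, subs_per_axis):
                if dx == 0 and dy == 0:
                    continue  # the home grid already holds the tuple
                nx, ny = gx + dx, gy + dy
                if 0 <= nx < grids_per_axis and 0 <= ny < grids_per_axis:
                    parts[(nx, ny)]["replica"].append(p)
    return parts
```

With the default 2 x 2 grids and 4 x 4 sub-grids per grid, the tuple (0.45, 0.1) falls in the rightmost sub-grid column of grid (0, 0), so `partition` also places a copy in the `"replica"` list of grid (1, 0). A node holding both lists can then search for the nearest neighbours of its own tuples near the boundary without requesting tuples from another node, at the cost of storing and scanning the extra copies; this is the trade-off the abstract describes.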