Privacy-Preserving Outlier Detection with High Efficiency over Distributed Datasets

Guanghong Lu, Chunhui Duan, Guohao Zhou, Xuan Ding, Yunhao Liu
{"title":"Privacy-Preserving Outlier Detection with High Efficiency over Distributed Datasets","authors":"Guanghong Lu, Chunhui Duan, Guohao Zhou, Xuan Ding, Yunhao Liu","doi":"10.1109/INFOCOM42981.2021.9488710","DOIUrl":null,"url":null,"abstract":"The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.","PeriodicalId":293079,"journal":{"name":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOM42981.2021.9488710","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.
分布式数据集上高效保护隐私的离群点检测
检测异常值的能力在数据挖掘中至关重要,广泛应用于许多领域,包括欺诈检测、恶意行为监控、健康诊断等。随着庞大的数据量变得比以往任何时候都更加分散,对一组分布式数据集的全局异常值检测是特别需要的。在这项工作中,我们提出了PIF (Privacy-preserving Isolation Forest),它可以高效准确地检测多个分布式数据提供者的异常值,同时提供一定的安全性保证。为了实现这一目标,PIF对传统的ifforest算法进行了创新改进,使其能够在分布式环境中使用。通过一系列精心设计的算法,每个参与方协作高效地构建隔离树集合,而不会泄露数据的敏感信息。此外,为了处理涉及不同类型分区数据的复杂现实场景,我们提出了一种综合模式,该模式可用于水平和垂直分区数据模型。我们已经实施了我们的方法,并通过大量的实验对其进行了评估。结果表明,该算法的平均AUC可与现有的ifforest相媲美,并且在不侵犯隐私的情况下保持线性时间复杂度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信