Privacy-Preserving Outlier Detection with High Efficiency over Distributed Datasets

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications. Pub Date: 2021-05-10. DOI: 10.1109/INFOCOM42981.2021.9488710

Guanghong Lu, Chunhui Duan, Guohao Zhou, Xuan Ding, Yunhao Liu


Citations: 3

Abstract

The ability to detect outliers is crucial in data mining and is widely used in fields such as fraud detection, malicious behavior monitoring, and health diagnosis. As data grows in volume and becomes more distributed than ever, global outlier detection across a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which detects outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve this goal, PIF improves on the traditional iForest algorithm so that it can run in distributed environments. Using a series of carefully designed algorithms, the participating parties collaborate to build an ensemble of isolation trees efficiently without disclosing sensitive information about their data. In addition, to handle complicated real-world scenarios that involve differently partitioned data, we propose a comprehensive scheme that works for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. The results show that PIF achieves, on average, an AUC comparable to that of the existing iForest and maintains linear time complexity without violating privacy.
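
The abstract does not spell out PIF's protocol, so the following is only a minimal sketch of the baseline, non-private iForest scoring rule that PIF is described as extending. It assumes the standard formulation, in which a point's anomaly score is 2^(-E[h(x)] / c(psi)), with h(x) the path length in a random isolation tree and psi the subsample size. All function names, parameters, and the toy data are illustrative assumptions; nothing here is distributed or privacy-preserving, and it assumes at least two data points.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) approximated via Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until isolated or too deep."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # simplification: stop instead of retrying another feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node):
    """Edges traversed to reach a leaf, plus a correction for unsplit leaf points."""
    depth = 0
    while "feature" in node:
        go_left = point[node["feature"]] < node["split"]
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + c_factor(node["size"])

def anomaly_scores(data, queries, n_trees=100, sample_size=256, seed=0):
    """Score each query in (0, 1]; values closer to 1 suggest an outlier."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
              for _ in range(n_trees)]
    result = []
    for q in queries:
        mean_depth = sum(path_length(q, tree) for tree in forest) / n_trees
        result.append(2.0 ** (-mean_depth / c_factor(psi)))
    return result

# Toy data (hypothetical): a Gaussian cluster around the origin plus one far point.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(10.0, 10.0)]
scores = anomaly_scores(data, [(10.0, 10.0), (0.0, 0.0)])
```

In this sketch the far point should be isolated after only a few random splits, so its score is expected to be higher than that of a point at the center of the cluster. In the setting the abstract describes, the data would instead be split across parties, either by rows (horizontal partitioning) or by columns (vertical partitioning), and the trees would have to be built without any party revealing its raw values.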

