Privacy-Preserving Statistical Analysis With Low Redundancy Over Task-Relevant Microdata

IF 8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security Pub Date : 2025-04-08 DOI:10.1109/TIFS.2025.3556347

Jingcheng Zhao;Kaiping Xue;Yingjie Xue;Meng Li;Bin Zhu;Shaoxian Yuan

IEEE Transactions on Information Forensics and Security, vol. 20, pp. 4382-4395. https://ieeexplore.ieee.org/document/10955169/

Citations: 0

Abstract

Privacy-preserving statistical analysis enables a data center to analyze datasets from multiple data owners, extracting valuable insights while safeguarding privacy. However, merely observing which microdata are involved in a given analysis task can indirectly lead to privacy breaches. For instance, when the data center observes that a user's microdata are involved in a disease-related task, it may learn something about that user’s disease. Existing schemes prevent such breaches by processing the entire dataset for each analysis task, which incurs significant redundancy overhead because of the large amount of task-irrelevant data processed. In this paper, we propose FDC, which protects privacy while freeing the data center from this redundancy overhead. Specifically, we propose a co-design of local differential privacy (LDP) and multiparty computation, with preprocessing by the data owner. This design enables the data center to process only task-relevant and LDP noise-induced microdata instead of the entire dataset, while maintaining analysis results without accuracy loss. For scenarios where preprocessing by the data owner is infeasible, we present a data center-assisted method that completes preprocessing within the data center. Additionally, we design and optimize a secure shuffle protocol within this method. Finally, we implement and evaluate FDC using the aggregation task as a baseline. With different proportions of task-relevant microdata, experimental results show that the runtime of FDC is $2\sim 11\times$ faster than existing schemes on LAN and $2\sim 22\times$ faster on WAN, and the communication overhead is up to $3\sim 153\times$ lower.
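The core idea of the abstract, processing only task-relevant records plus noise-induced extras while keeping the aggregate exact, can be illustrated with a toy sketch. This is not the FDC protocol: the abstract does not specify the mechanism, so everything below is an assumption made for illustration, including the two-server additive secret sharing, the ring `MODULUS`, the `dummy_prob` parameter, and the function names `share`, `preprocess`, and `aggregate`. The sketch also makes no claim about the LDP guarantee that a given `dummy_prob` would provide.

```python
import random
import secrets

# Hypothetical ring size for additive secret sharing (an assumption).
MODULUS = 2**61 - 1


def share(value, modulus=MODULUS):
    """Split value into two additive shares that sum to value mod modulus."""
    r = secrets.randbelow(modulus)
    return r, (value - r) % modulus


def preprocess(records, dummy_prob, rng):
    """Owner-side preprocessing (toy version).

    records: list of (value, is_relevant) pairs.
    Every task-relevant record is submitted as shares of its value.
    Each task-irrelevant record is submitted with probability dummy_prob
    as shares of 0, so it adds nothing to the sum but hides which
    submitted records are truly relevant.
    """
    submitted = []
    for value, is_relevant in records:
        if is_relevant:
            submitted.append(share(value))
        elif rng.random() < dummy_prob:
            submitted.append(share(0))
    rng.shuffle(submitted)  # remove positional hints before submission
    return submitted


def aggregate(submitted, modulus=MODULUS):
    """Each server sums its own shares; combining the two sums gives the total."""
    sum_server0 = sum(s0 for s0, _ in submitted) % modulus
    sum_server1 = sum(s1 for _, s1 in submitted) % modulus
    return (sum_server0 + sum_server1) % modulus


if __name__ == "__main__":
    data = [(5, True), (7, False), (11, True), (3, False)]
    shares = preprocess(data, dummy_prob=0.5, rng=random.Random(0))
    print(len(shares), aggregate(shares))
```

Because dummy records are shares of zero, the reconstructed sum equals the sum over task-relevant values regardless of how many dummies are mixed in, which mirrors the "without accuracy loss" property claimed in the abstract; the servers handle at most the full dataset and typically far fewer records.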


Source journal

IEEE Transactions on Information Forensics and Security (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 7.40%

Articles published: 234

Review time: 6.5 months

Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.