Privacy-Preserving Statistical Analysis With Low Redundancy Over Task-Relevant Microdata
Authors: Jingcheng Zhao; Kaiping Xue; Yingjie Xue; Meng Li; Bin Zhu; Shaoxian Yuan
Journal: IEEE Transactions on Information Forensics and Security, vol. 20, pp. 4382-4395 (published 2025-04-08)
DOI: 10.1109/TIFS.2025.3556347
URL: https://ieeexplore.ieee.org/document/10955169/
Citations: 0
Abstract
Privacy-preserving statistical analysis enables a data center to analyze datasets from multiple data owners, extracting valuable insights while safeguarding privacy. However, when the data center can observe which microdata are involved in which analysis tasks, that observation alone can indirectly breach privacy. For instance, if the data center observes that a user's microdata are involved in a disease-related task, it may learn about that user's disease. Existing schemes prevent this by processing the entire dataset for every analysis task, which incurs significant redundancy overhead because of the large amount of task-irrelevant data involved in processing. In this paper, we propose FDC, which protects privacy while freeing the data center from this redundancy overhead. Specifically, we propose a co-design of local differential privacy (LDP) and multiparty computation, with preprocessing by the data owner. This design enables the data center to process only task-relevant and LDP noise-induced microdata instead of the entire dataset, with no loss of accuracy in the analysis results. For scenarios where preprocessing by the data owner is infeasible, we present a data center-assisted method that completes preprocessing within the data center. Additionally, we design and optimize a secure shuffle protocol within this method. Finally, we implement and evaluate FDC using the aggregation task as a baseline. With different proportions of task-relevant microdata, experimental results show that FDC runs $2\sim 11\times$ faster than existing schemes on LAN and $2\sim 22\times$ faster on WAN, and that its communication overhead is $3\sim 153\times$ lower.
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.