Multidimensional categorical data collection under shuffled differential privacy
Ning Wang, Jian Zhuang, Zhigang Wang, Zhiqiang Wei, Yu Gu, Peng Tang, Ge Yu
Computers & Security, vol. 151, Article 104301. Published 2025-01-08.
DOI: 10.1016/j.cose.2024.104301
https://www.sciencedirect.com/science/article/pii/S0167404824006072
Impact factor: 4.8; JCR: Q1 (Computer Science, Information Systems)
Citations: 0
Abstract
Estimating frequency distributions in multidimensional categorical data is fundamental for many real-world applications, but such data often contains sensitive personal information, necessitating robust privacy protection. The emerging shuffled differential privacy (SDP) model provides a promising solution, yet existing methods are either limited to single-dimensional data or suffer from poor accuracy in multidimensional scenarios. To address these challenges, this paper introduces the Multiple Hash Mechanism (MHM), which uses a hash-based local perturbation technique for efficient dimensionality reduction, improving result accuracy under the SDP framework. Additionally, we provide a detailed analysis of the shuffling benefits of MHM outputs, showing significant accuracy improvements. For cases requiring personalized privacy levels, we propose the Overlapping Group Mechanism, which further enhances the shuffling benefits and boosts overall accuracy. Experimental results on real-world datasets validate the effectiveness of the proposed methods.
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.