Numerical Data Collection Under Input-Discriminative Local Differential Privacy
Authors: Youwen Zhu; Shibo Dai; Pengfei Zhang; Xiqi Kuang
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 12, pp. 7346-7361
Publication date: 2025-09-17
DOI: 10.1109/TKDE.2025.3610932
URL: https://ieeexplore.ieee.org/document/11168245/
Abstract
Input-discriminative local differential privacy (ID-LDP) assigns different levels of protection to different input values, which improves the utility of the estimated data compared with traditional LDP. However, existing ID-LDP methods are designed for categorical data and cannot be directly applied to numerical data. In this paper, we propose a numerical data collection (NDC) framework under ID-LDP that provides discriminative protection for data with different inputs. The framework uses a piecewise mechanism to divide the numerical data into several segments and includes two perturbation methods for estimating the mean of the numerical data from the values submitted by users. We first present NDC-UE, which encodes the raw data as a binary vector: the bit corresponding to the uploaded value is set to 1, all other bits are set to 0, and each bit is then perturbed with a given probability. We further propose NDC-GRR, an algorithm that perturbs the numerical data with an optimal privacy budget. To reduce the complexity of NDC-GRR, we apply a greedy-algorithm-based spanner, which shortens the computation time and improves accuracy. Theoretical analysis proves that our schemes satisfy the definition of ID-LDP. Experimental results on two real-world datasets and one synthetic dataset show that the proposed schemes achieve a lower mean squared error than the benchmarks.
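The encode-then-perturb pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's NDC-UE or NDC-GRR: it uses the standard symmetric unary encoding and generalized randomized response mechanisms from the LDP literature with a single uniform budget `eps` for every input, whereas the paper assigns input-dependent budgets, optimizes them, and uses a spanner. The domain `[-1, 1]`, the equal-width segmentation, and all function names are assumptions made for illustration.

```python
import math
import random

LO, HI = -1.0, 1.0  # assumed input domain (not specified in the abstract)

def bucketize(x, k):
    """Map x in [LO, HI] to the index of one of k equal-width segments."""
    idx = int((x - LO) / (HI - LO) * k)
    return min(max(idx, 0), k - 1)

def bucket_midpoint(idx, k):
    """Representative value of a segment: its midpoint."""
    width = (HI - LO) / k
    return LO + (idx + 0.5) * width

def ue_probs(eps):
    """Symmetric UE: P(report 1 | bit is 1) = p, P(report 1 | bit is 0) = q."""
    p = math.exp(eps / 2) / (math.exp(eps / 2) + 1)
    return p, 1 - p

def grr_probs(eps, k):
    """GRR over k >= 2 segments: keep the true index w.p. p, any other w.p. q."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    return p, (1 - p) / (k - 1)

def ue_perturb(idx, k, eps, rng):
    """One-hot encode idx, then randomize every bit independently."""
    p, q = ue_probs(eps)
    return [1 if rng.random() < (p if j == idx else q) else 0 for j in range(k)]

def grr_perturb(idx, k, eps, rng):
    """Report the true index w.p. p, otherwise a uniformly chosen other index."""
    p, _ = grr_probs(eps, k)
    if rng.random() < p:
        return idx
    other = rng.randrange(k - 1)
    return other if other < idx else other + 1

def _mean_from_counts(counts, n, p, q, k):
    """Debias per-segment frequencies, then take the midpoint-weighted mean."""
    mean = 0.0
    for j in range(k):
        freq = (counts[j] / n - q) / (p - q)  # unbiased frequency estimate
        mean += freq * bucket_midpoint(j, k)
    return mean

def estimate_mean_ue(reports, k, eps):
    p, q = ue_probs(eps)
    counts = [sum(r[j] for r in reports) for j in range(k)]
    return _mean_from_counts(counts, len(reports), p, q, k)

def estimate_mean_grr(reports, k, eps):
    p, q = grr_probs(eps, k)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return _mean_from_counts(counts, len(reports), p, q, k)
```

Each user would call `bucketize` and then one of the `*_perturb` functions locally; the collector sees only the perturbed reports and calls the matching `estimate_mean_*` function. The estimate targets the mean of the segment midpoints, so it carries a discretization error of at most half a segment width in addition to the perturbation noise.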
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.