PrivKV: Key-Value Data Collection with Local Differential Privacy

Qingqing Ye, Haibo Hu, Xiaofeng Meng, Huadi Zheng
{"title":"PrivKV: Key-Value Data Collection with Local Differential Privacy","authors":"Qingqing Ye, Haibo Hu, Xiaofeng Meng, Huadi Zheng","doi":"10.1109/SP.2019.00018","DOIUrl":null,"url":null,"abstract":"Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same \"perturbation-calibration\" paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.","PeriodicalId":272713,"journal":{"name":"2019 IEEE Symposium on Security and Privacy (SP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"95","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP.2019.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 95

Abstract

Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same "perturbation-calibration" paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.
PrivKV:具有本地差分隐私的键值数据收集
本地差分隐私(LDP)是一种新的、有前途的保护隐私的分布式数据收集技术,每个用户在将其数据发送到不可信的数据收集器之前在本地扰动其数据。LDP的优点是使收集器能够在不访问敏感用户数据(例如位置和应用程序使用情况)的情况下获得准确的统计估计。然而,现有的LDP工作仅限于简单的数据类型,如分类、数值和集值数据。据我们所知,目前还没有针对键值数据的LDP工作,键值数据是一种非常流行的NoSQL数据模型,是集值和数值数据的广义形式。在本文中,我们通过首先设计一个基线方法PrivKV来研究键值数据的频率和均值估计问题,该方法与现有的LDP技术具有相同的“扰动校准”范式。为了解决由于用户的无意识扰动导致的估计精度不高的问题,我们提出了PrivKVM和PrivKVM+两种迭代方案,通过一系列的迭代可以逐步提高估计结果。提出了一种优化策略,通过在不需要用户参与的情况下在收集器端引入虚拟迭代来减少网络延迟和提高估计精度。通过理论分析和广泛的实验结果验证了这些解决方案的正确性和有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信