PrivKV:具有本地差分隐私的键值数据收集

2019 IEEE Symposium on Security and Privacy (SP) Pub Date : 2019-05-19 DOI:10.1109/SP.2019.00018

Qingqing Ye, Haibo Hu, Xiaofeng Meng, Huadi Zheng

{"title":"PrivKV:具有本地差分隐私的键值数据收集","authors":"Qingqing Ye, Haibo Hu, Xiaofeng Meng, Huadi Zheng","doi":"10.1109/SP.2019.00018","DOIUrl":null,"url":null,"abstract":"Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same \"perturbation-calibration\" paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.","PeriodicalId":272713,"journal":{"name":"2019 IEEE Symposium on Security and Privacy (SP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"95","resultStr":"{\"title\":\"PrivKV: Key-Value Data Collection with Local Differential Privacy\",\"authors\":\"Qingqing Ye, Haibo Hu, Xiaofeng Meng, Huadi Zheng\",\"doi\":\"10.1109/SP.2019.00018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same \\\"perturbation-calibration\\\" paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.\",\"PeriodicalId\":272713,\"journal\":{\"name\":\"2019 IEEE Symposium on Security and Privacy (SP)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"95\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Symposium on Security and Privacy (SP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SP.2019.00018\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP.2019.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 95

摘要

本地差分隐私(LDP)是一种新的、有前途的保护隐私的分布式数据收集技术，每个用户在将其数据发送到不可信的数据收集器之前在本地扰动其数据。LDP的优点是使收集器能够在不访问敏感用户数据(例如位置和应用程序使用情况)的情况下获得准确的统计估计。然而，现有的LDP工作仅限于简单的数据类型，如分类、数值和集值数据。据我们所知，目前还没有针对键值数据的LDP工作，键值数据是一种非常流行的NoSQL数据模型，是集值和数值数据的广义形式。在本文中，我们通过首先设计一个基线方法PrivKV来研究键值数据的频率和均值估计问题，该方法与现有的LDP技术具有相同的“扰动校准”范式。为了解决由于用户的无意识扰动导致的估计精度不高的问题，我们提出了PrivKVM和PrivKVM+两种迭代方案，通过一系列的迭代可以逐步提高估计结果。提出了一种优化策略，通过在不需要用户参与的情况下在收集器端引入虚拟迭代来减少网络延迟和提高估计精度。通过理论分析和广泛的实验结果验证了这些解决方案的正确性和有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

PrivKV: Key-Value Data Collection with Local Differential Privacy

Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same "perturbation-calibration" paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE Symposium on Security and Privacy (SP)

自引率

0.00%

发文量