DP-tree: indexing multi-dimensional data under differential privacy (abstract only)

Shangfu Peng, Y. Yang, Zhenjie Zhang, M. Winslett, Yong Yu
Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012-05-20
DOI: https://doi.org/10.1145/2213836.2213972
Citations: 23

Abstract

ε-differential privacy (ε-DP) is a strong and rigorous scheme for protecting individuals' privacy while releasing useful statistical information. The main idea is to inject random noise into the results of statistical queries, such that the existence of any single record has negligible impact on the distributions of query results. The accuracy of such randomized results depends heavily upon the query processing technique, which has been an active research topic in recent years. So far, most existing methods focus on 1-dimensional queries. The only work that handles multi-dimensional query processing under ε-DP is [1], which indexes the sensitive data using variants of the quad-tree and the k-d-tree. As we point out in this paper, these structures are inherently suboptimal for answering queries under ε-DP. Consequently, the solutions in [1] suffer from several serious drawbacks, including limited and unstable query accuracy, as well as bias towards certain types of queries. Motivated by this, we propose the DP-tree, a novel index structure for multi-dimensional query processing under ε-DP that eliminates the problems encountered by the methods in [1]. Further, we show that the effectiveness of the DP-tree can be improved using statistical information about the query workload. Extensive experiments using real and synthetic datasets confirm that the DP-tree achieves significantly higher query accuracy than existing methods. Interestingly, an adaptation of the DP-tree also outperforms previous 1D solutions in their restricted scope, by large margins.
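The noise-injection idea described in the abstract can be illustrated with the standard Laplace mechanism applied to a single multi-dimensional range-count query. This is only a minimal sketch of that general mechanism, not the DP-tree itself, whose construction the abstract does not describe. The function names `laplace_noise` and `noisy_range_count`, the uniform 2D test data, and the choice of ε = 0.5 are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_range_count(points, low, high, epsilon, rng):
    """Count points inside the axis-aligned box [low, high] and add Laplace noise.

    A count query has sensitivity 1 (adding or removing one record changes
    the true count by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(
        all(lo <= x <= hi for x, lo, hi in zip(p, low, high)) for p in points
    )
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 1000 uniform points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]

# Noisy answer to "how many points fall in the lower-left quadrant?"
answer = noisy_range_count(points, (0.0, 0.0), (0.5, 0.5), epsilon=0.5, rng=rng)
print(answer)
```

Answering every query independently like this spends privacy budget on each one, which is why the indexing approaches mentioned in the abstract precompute noisy counts over a spatial decomposition instead.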