Coresets for Differentially Private K-Means Clustering and Applications to Privacy in Mobile Sensor Networks

Dan Feldman, C. Xiang, Ruihao Zhu, D. Rus
2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)
Published: 2017-04-18
DOI: 10.1145/3055031.3055090 (https://doi.org/10.1145/3055031.3055090)
Citations: 41

Abstract

Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g., using their smartphones, we can learn global information such as traffic congestion patterns in the city, location of key community facilities, and locations of gathering places. Can we publish and run queries on mobile sensor network databases without disclosing information about individual nodes?

Differential privacy is a strong notion of privacy which guarantees that very little will be learned about individual records in the database, no matter what the attackers already know or wish to learn. Still, there is no practical system applying differential privacy algorithms for clustering points on real databases. This paper describes the construction of small coresets for computing k-means clustering of a set of points while preserving differential privacy. As a result, we give the first k-means clustering algorithm that is both differentially private and has an approximation error that depends sub-linearly on the data's dimension d. Previous results introduced errors that are exponential in d.

We implemented this algorithm and used it to create differentially private location data from GPS tracks. Specifically, our algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the introduced noise due to privacy. We provide experimental results for the system and algorithms, and compare them to existing techniques. To the best of our knowledge, this is the first practical system that enables differentially private clustering on real data.
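
To make the idea of user-controlled privacy noise concrete, here is a minimal sketch of the standard Laplace mechanism applied to a grid histogram of locations. It is not the coreset construction described in the paper. The function names, the 4 x 4 grid, the unit-square domain, and the value of `epsilon` are all assumptions chosen for illustration, and the sketch ignores floating-point subtleties that matter in a real deployment.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_grid_counts(points, grid, epsilon, rng):
    """Return noisy per-cell counts for points assumed to lie in [0, 1) x [0, 1).

    Adding or removing one point changes exactly one cell count by 1, so the
    L1 sensitivity of the histogram is 1. Adding Laplace noise of scale
    1/epsilon to every cell, including empty ones, is the standard
    epsilon-differentially private release of such a histogram.
    """
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        # Clamp indices so out-of-range inputs fall into a border cell.
        col = max(0, min(int(x * grid), grid - 1))
        row = max(0, min(int(y * grid), grid - 1))
        counts[row][col] += 1
    scale = 1.0 / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]


# Hypothetical data: 1000 synthetic locations, uniform in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
noisy = private_grid_counts(points, grid=4, epsilon=0.5, rng=rng)
```

A smaller `epsilon` gives stronger privacy and noisier counts; a larger one gives the reverse. This is one simple instance of the trade-off between privacy and accuracy that the abstract says the user can control.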