Generating Synthetic Decentralized Social Graphs with Local Differential Privacy

Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Pub Date: 2017-10-30. DOI: 10.1145/3133956.3134086

Zhan Qin, Ting Yu, Y. Yang, Issa M. Khalil, Xiaokui Xiao, K. Ren


Citations: 157

Abstract

A large amount of valuable information resides in decentralized social graphs, where no entity has access to the complete graph structure. Instead, each user locally maintains a limited view of the graph. For example, in a phone network, each user keeps a contact list locally on her phone, and does not have access to other users' contacts. The contact lists of all users form an implicit social graph that could be very useful for studying the interaction patterns among different populations. However, due to privacy concerns, one cannot simply collect the unfettered local views from users and reconstruct a decentralized social network. In this paper, we investigate techniques to ensure local differential privacy of individuals while collecting structural information and generating representative synthetic social graphs. We show that existing local differential privacy and synthetic graph generation techniques are insufficient for preserving important graph properties, due to excessive noise injection, inability to retain important graph structure, or both. Motivated by this, we propose LDPGen, a novel multi-phase technique that incrementally clusters users based on their connections to different partitions of the whole population. Every time a user reports information, LDPGen carefully injects noise to ensure local differential privacy. We derive optimal parameters in this process to cluster structurally similar users together. Once a good clustering of users is obtained, LDPGen adapts existing social graph generation models to construct a synthetic social graph. We conduct comprehensive experiments over four real datasets to evaluate the quality of the obtained synthetic graphs, using a variety of metrics, including (i) important graph structural measures; (ii) quality of community discovery; and (iii) applicability in social recommendation. Our experiments show that the proposed technique produces high-quality synthetic graphs that represent the original decentralized social graphs well, and significantly outperform those from baseline approaches.
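
The per-user reporting step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the noise mechanism, so everything here is an assumption: the sketch uses Laplace noise with scale 1/epsilon, assumes that neighboring inputs differ in a single edge (so one partition count changes by at most 1), and uses the hypothetical names `noisy_degree_vector`, `laplace_noise`, `partition_of`, `k`, and `epsilon`. It is not the authors' implementation.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_degree_vector(neighbors, partition_of, k, epsilon, rng=random):
    """Count a user's neighbors in each of k partitions, then perturb each count.

    Assumption (not taken from the abstract): Laplace noise with scale 1/epsilon,
    which matches an L1 sensitivity of 1 when inputs differ in a single edge.
    """
    counts = [0] * k
    for neighbor in neighbors:
        counts[partition_of[neighbor]] += 1
    scale = 1.0 / epsilon
    return [count + laplace_noise(scale, rng) for count in counts]


if __name__ == "__main__":
    # Hypothetical toy input: three contacts spread over three partitions.
    partition_of = {"alice": 0, "bob": 1, "carol": 1}
    report = noisy_degree_vector(["alice", "bob", "carol"], partition_of, k=3, epsilon=1.0)
    print(report)
```

In this sketch the user sends only the perturbed vector to the collector, never the raw contact list. A smaller `epsilon` yields stronger privacy but noisier counts, which is the trade-off the abstract refers to when it mentions excessive noise injection.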
