A Randomized Response Model for Privacy-Preserving Data Dissemination

Xiaoqian Jiang, Shuang Wang, Zhanglong Ji, L. Ohno-Machado, Li Xiong
2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology. Published 2012-09-27. DOI: 10.1109/HISB.2012.63 (https://doi.org/10.1109/HISB.2012.63)
Citations: 1

Abstract

Public dissemination of medical data encourages meaningful research and quality improvement. However, there is serious concern that improper disclosure may put sensitive personal information at risk. To maintain the research benefits while customizing privacy protection, we propose a novel and practical randomized response model (k-shuffle) and a statistical information recovery procedure. The former mixes the distribution of patient records with samples drawn from k-1 predetermined distributions to ensure differential privacy. The latter allows data receivers to recover statistical properties (e.g., the mean and variance) of sub-populations of interest with accuracy proportional to the size of the sub-population. That is, our algorithm provides stronger privacy protection to smaller groups and offers high data usability to studies targeting larger populations. Most importantly, with the differential privacy guarantee, data receivers cannot reconstruct the record-to-identity mapping for any individual. In summary, our approach offers a scalable privacy-preserving data dissemination mechanism that can be applied in both centralized and distributed fashion, which makes it possible for perturbed data to be outsourced (in the cloud) with mitigated privacy risks. Our experimental results demonstrate the performance of our model in terms of privacy protection, information loss, and classification accuracy on both synthetic and real-world datasets.
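The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of mixing real records with draws from known distributions and then inverting the mixture to recover a mean. It is not the authors' k-shuffle procedure. The names `k_shuffle_release`, `recover_mean`, and `keep_prob`, the uniform choice among the k-1 noise distributions, and the Gaussian noise are all assumptions made for illustration, and the sketch does not calibrate `keep_prob` to any differential privacy budget. If each released value Y is the true record with probability p and a noise draw otherwise, then E[Y] = p*mu + (1-p)*mu_noise, which can be solved for mu.

```python
import random
import statistics


def k_shuffle_release(records, noise_samplers, keep_prob, rng):
    """Hypothetical release step: keep each record with probability
    keep_prob; otherwise replace it with a draw from one of the k-1
    known noise distributions, chosen uniformly at random."""
    released = []
    for x in records:
        if rng.random() < keep_prob:
            released.append(x)
        else:
            sampler = rng.choice(noise_samplers)
            released.append(sampler(rng))
    return released


def recover_mean(released, noise_means, keep_prob):
    """Invert the mixture mean E[Y] = p*mu + (1-p)*mean(noise_means)."""
    noise_mean = statistics.fmean(noise_means)
    observed = statistics.fmean(released)
    return (observed - (1 - keep_prob) * noise_mean) / keep_prob


# Synthetic stand-in for a numeric patient attribute (assumed values).
rng = random.Random(0)
true_records = [rng.gauss(120.0, 10.0) for _ in range(50_000)]

# k - 1 = 3 predetermined noise distributions, known to the receiver.
noise_means = [80.0, 160.0, 200.0]
noise_samplers = [lambda r, m=m: r.gauss(m, 10.0) for m in noise_means]

keep_prob = 0.5
released = k_shuffle_release(true_records, noise_samplers, keep_prob, rng)
naive = statistics.fmean(released)
estimate = recover_mean(released, noise_means, keep_prob)
print(f"naive mean of released data: {naive:.2f}")
print(f"recovered mean estimate:     {estimate:.2f}")
```

In this sketch the standard error of the recovered mean shrinks as the number of records grows, which is consistent with the abstract's remark that larger sub-populations are recovered more accurately while smaller groups stay better hidden.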