Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases

Priyanka Nanayakkara, Johes Bater, Xi He, J. Hullman, Jennie Duggan
{"title":"Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases","authors":"Priyanka Nanayakkara, Johes Bater, Xi He, J. Hullman, Jennie Duggan","doi":"10.2478/popets-2022-0058","DOIUrl":null,"url":null,"abstract":"Abstract Organizations often collect private data and release aggregate statistics for the public’s benefit. If no steps toward preserving privacy are taken, adversaries may use released statistics to deduce unauthorized information about the individuals described in the private dataset. Differentially private algorithms address this challenge by slightly perturbing underlying statistics with noise, thereby mathematically limiting the amount of information that may be deduced from each data release. Properly calibrating these algorithms—and in turn the disclosure risk for people described in the dataset—requires a data curator to choose a value for a privacy budget parameter, ɛ. However, there is little formal guidance for choosing ɛ, a task that requires reasoning about the probabilistic privacy–utility tradeoff. Furthermore, choosing ɛ in the context of statistical inference requires reasoning about accuracy trade-offs in the presence of both measurement error and differential privacy (DP) noise. We present Visualizing Privacy (ViP), an interactive interface that visualizes relationships between ɛ, accuracy, and disclosure risk to support setting and splitting ɛ among queries. As a user adjusts ɛ, ViP dynamically updates visualizations depicting expected accuracy and risk. ViP also has an inference setting, allowing a user to reason about the impact of DP noise on statistical inferences. Finally, we present results of a study where 16 research practitioners with little to no DP background completed a set of tasks related to setting ɛ using both ViP and a control. We find that ViP helps participants more correctly answer questions related to judging the probability of where a DP-noised release is likely to fall and comparing between DP-noised and non-private confidence intervals.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"601 - 618"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/popets-2022-0058","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

Organizations often collect private data and release aggregate statistics for the public’s benefit. If no steps toward preserving privacy are taken, adversaries may use released statistics to deduce unauthorized information about the individuals described in the private dataset. Differentially private algorithms address this challenge by slightly perturbing underlying statistics with noise, thereby mathematically limiting the amount of information that may be deduced from each data release. Properly calibrating these algorithms—and in turn the disclosure risk for people described in the dataset—requires a data curator to choose a value for a privacy budget parameter, ɛ. However, there is little formal guidance for choosing ɛ, a task that requires reasoning about the probabilistic privacy–utility trade-off. Furthermore, choosing ɛ in the context of statistical inference requires reasoning about accuracy trade-offs in the presence of both measurement error and differential privacy (DP) noise. We present Visualizing Privacy (ViP), an interactive interface that visualizes relationships between ɛ, accuracy, and disclosure risk to support setting and splitting ɛ among queries. As a user adjusts ɛ, ViP dynamically updates visualizations depicting expected accuracy and risk. ViP also has an inference setting, allowing a user to reason about the impact of DP noise on statistical inferences. Finally, we present results of a study where 16 research practitioners with little to no DP background completed a set of tasks related to setting ɛ using both ViP and a control. We find that ViP helps participants more correctly answer questions related to judging the probability of where a DP-noised release is likely to fall and comparing DP-noised and non-private confidence intervals.
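To make the abstract's central quantities concrete: below is a minimal sketch, not taken from the paper or the ViP tool, of how ɛ calibrates noise in the standard Laplace mechanism and how a total budget might be split evenly across queries under basic sequential composition. The function names, the counting-query example, and the even split are illustrative assumptions.

```python
import numpy as np

def laplace_release(true_value, sensitivity, epsilon, rng=None):
    """Release true_value perturbed with Laplace noise of scale sensitivity/epsilon.
    A smaller epsilon means a larger noise scale: stronger privacy, lower accuracy."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def noise_halfwidth(sensitivity, epsilon, coverage=0.95):
    """Half-width of the central `coverage` interval of the Laplace noise:
    a release lands within +/- this amount of the true value with probability
    `coverage`, from the Laplace CDF P(|X| <= t) = 1 - exp(-t / scale)."""
    scale = sensitivity / epsilon
    return -scale * np.log(1.0 - coverage)

# Splitting a total budget evenly across k queries (basic sequential
# composition); each query then consumes epsilon_total / k.
epsilon_total, k = 1.0, 4
per_query_eps = epsilon_total / k

true_count = 120  # a counting query has sensitivity 1
print(f"noised release: {laplace_release(true_count, 1.0, per_query_eps):.1f}")
print(f"95% of releases fall within +/- {noise_halfwidth(1.0, per_query_eps):.1f} of the truth")
```

The interval computed by noise_halfwidth is the kind of "expected accuracy" band an interface like ViP visualizes as ɛ changes; the even split is the simplest composition rule, and allocating more of the budget to one query sharpens its release at the expense of the others.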