Defending Against Membership Inference Attacks on Beacon Services

IF 2.8 4区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Privacy and Security Pub Date : 2023-07-19 DOI:https://dl.acm.org/doi/10.1145/3603627

Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin, Yevgeniy Vorobeychik

{"title":"Defending Against Membership Inference Attacks on Beacon Services","authors":"Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin, Yevgeniy Vorobeychik","doi":"https://dl.acm.org/doi/10.1145/3603627","DOIUrl":null,"url":null,"abstract":"<p>Large genomic datasets are created through numerous activities, including recreational genealogical investigations, biomedical research, and clinical care. At the same time, genomic data has become valuable for reuse beyond their initial point of collection, but privacy concerns often hinder access. Beacon services have emerged to broaden accessibility to such data. These services enable users to query for the presence of a particular minor allele in a dataset, and information helps care providers determine if genomic variation is spurious or has some known clinical indication. However, various studies have shown that this process can leak information regarding if individuals are members of the underlying dataset. There are various approaches to mitigate this vulnerability, but they are limited in that they (1) typically rely on heuristics to add noise to the Beacon responses; (2) offer probabilistic privacy guarantees only, neglecting data utility; and (3) assume a batch setting where all queries arrive at once. In this article, we present a novel algorithmic framework to ensure privacy in a Beacon service setting with a minimal number of query response flips. We represent this problem as one of combinatorial optimization in both the batch setting and the online setting (where queries arrive sequentially). We introduce principled algorithms with both privacy and, in some cases, worst-case utility guarantees. Moreover, through extensive experiments, we show that the proposed approaches significantly outperform the state of the art in terms of privacy and utility, using a dataset consisting of 800 individuals and 1.3 million single nucleotide variants.</p>","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":"1 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Privacy and Security","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3603627","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Large genomic datasets are created through numerous activities, including recreational genealogical investigations, biomedical research, and clinical care. At the same time, genomic data has become valuable for reuse beyond their initial point of collection, but privacy concerns often hinder access. Beacon services have emerged to broaden accessibility to such data. These services enable users to query for the presence of a particular minor allele in a dataset, and information helps care providers determine if genomic variation is spurious or has some known clinical indication. However, various studies have shown that this process can leak information regarding if individuals are members of the underlying dataset. There are various approaches to mitigate this vulnerability, but they are limited in that they (1) typically rely on heuristics to add noise to the Beacon responses; (2) offer probabilistic privacy guarantees only, neglecting data utility; and (3) assume a batch setting where all queries arrive at once. In this article, we present a novel algorithmic framework to ensure privacy in a Beacon service setting with a minimal number of query response flips. We represent this problem as one of combinatorial optimization in both the batch setting and the online setting (where queries arrive sequentially). We introduce principled algorithms with both privacy and, in some cases, worst-case utility guarantees. Moreover, through extensive experiments, we show that the proposed approaches significantly outperform the state of the art in terms of privacy and utility, using a dataset consisting of 800 individuals and 1.3 million single nucleotide variants.

查看原文本刊更多论文

防范信标服务的成员推理攻击

大型基因组数据集是通过许多活动创建的，包括娱乐性家谱调查、生物医学研究和临床护理。与此同时，基因组数据在最初的收集点之外的重用也变得很有价值，但隐私问题往往阻碍了访问。信标服务的出现扩大了对这些数据的可访问性。这些服务使用户能够查询数据集中是否存在特定的次要等位基因，信息可以帮助医疗服务提供者确定基因组变异是虚假的还是有一些已知的临床指征。然而，各种研究表明，这个过程可能会泄露有关个人是否是底层数据集成员的信息。有多种方法可以缓解此漏洞，但它们的局限性在于:(1)通常依赖于启发式方法向Beacon响应添加噪声;(2)仅提供概率隐私保障，忽略数据效用;(3)假设所有查询一次到达的批处理设置。在本文中，我们提出了一种新的算法框架，以确保在Beacon服务设置中使用最少数量的查询响应翻转来保护隐私。我们将这个问题表示为批处理设置和在线设置(查询顺序到达)中的组合优化之一。我们引入了具有隐私性的原则算法，在某些情况下，还具有最坏情况效用保证。此外，通过广泛的实验，我们表明，所提出的方法在隐私和实用性方面明显优于最新技术，使用由800个个体和130万个单核苷酸变体组成的数据集。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Privacy and Security Computer Science-General Computer Science

CiteScore

5.20

自引率

0.00%

发文量

期刊介绍： ACM Transactions on Privacy and Security (TOPS) (formerly known as TISSEC) publishes high-quality research results in the fields of information and system security and privacy. Studies addressing all aspects of these fields are welcomed, ranging from technologies, to systems and applications, to the crafting of policies.