Tuning Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding.

Proceedings of the ... Asia-Pacific bioinformatics conference. Pub Date: 2023-04-01

Nour Almadhoun Alserr, Gulce Kale, Onur Mutlu, Oznur Tastan, Erman Ayday

Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10306260/pdf/nihms-1902817.pdf


Abstract

Researchers need a rich trove of genomic datasets to better understand the genetic basis of the human genome and to identify associations between phenotypes and specific parts of DNA. However, sharing genomic datasets that include sensitive genetic or medical information about individuals can lead to serious privacy-related consequences if the data lands in the wrong hands. Restricting access to genomic datasets is one solution, but this greatly reduces their usefulness for research. To allow sharing of genomic datasets while addressing these privacy concerns, several studies propose privacy-preserving mechanisms for data sharing. Differential privacy (DP) is one such mechanism: it provides rigorous mathematical foundations for privacy guarantees when sharing aggregate statistical information about a dataset. Nevertheless, it has been shown that the original privacy guarantees of DP-based solutions degrade when the dataset contains dependent tuples, which is common in genomic datasets because family members participate. In this work, we introduce a new mechanism that mitigates the vulnerability of differentially private query results to inference attacks when the genomic dataset includes dependent tuples. We propose a utility-maximizing and privacy-preserving approach for sharing statistics that selectively hides SNPs of family members as they participate in a genomic dataset. By evaluating our mechanism on a real-world genomic dataset, we empirically demonstrate that it can achieve up to 40% better privacy than state-of-the-art DP-based solutions while near-optimally minimizing utility loss.
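To make the setting concrete, the following is a minimal sketch of a differentially private count query over a single SNP, with an optional set of participants whose value at that SNP is withheld before the statistic is computed. It is not the mechanism proposed in the paper: the abstract does not say how the SNPs to hide are chosen, so the `hidden` set is simply supplied by the caller. The function names, the toy family, the use of the standard Laplace mechanism, and the value of `epsilon` are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def visible_count(genotypes, hidden):
    """Sum minor-allele counts (0, 1 or 2) over participants not in `hidden`."""
    return sum(g for pid, g in genotypes.items() if pid not in hidden)


def noisy_minor_allele_count(genotypes, hidden, epsilon, rng):
    """Release the visible minor-allele count with Laplace noise.

    One participant changes the count by at most 2, so the sensitivity
    is 2 and the noise scale is 2 / epsilon (standard Laplace mechanism).
    """
    sensitivity = 2.0
    return visible_count(genotypes, hidden) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical family: genotypes at one SNP are correlated across relatives.
    family = {"mother": 2, "father": 1, "child": 1, "unrelated": 1}
    # Assumption: the caller has decided to hide the child's value at this SNP.
    print(noisy_minor_allele_count(family, hidden=set(), epsilon=1.0, rng=rng))
    print(noisy_minor_allele_count(family, hidden={"child"}, epsilon=1.0, rng=rng))
```

The sketch only shows where hiding enters the pipeline: a hidden value is removed from the input to the query, so it cannot contribute to what an attacker infers from the noisy answer, at the cost of a less accurate statistic. Which values to hide so that this cost stays small is the question the paper addresses.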
