Privacy-Preserving Fingerprinting Against Collusion and Correlation Threats in Genomic Data

Tianxi Ji, Erman Ayday, Emre Yilmaz, Pan Li
Proceedings on Privacy Enhancing Technologies. Published 2024-07-01. DOI: 10.56553/popets-2024-0098 (https://doi.org/10.56553/popets-2024-0098)

Abstract

Sharing genomic databases is critical to collaborative research in computational biology. A shared database is more informative than specific genome-wide association study (GWAS) statistics, as it enables do-it-yourself calculations. Genomic databases involve intellectual effort from the curator and sensitive information about participants; thus, in the course of data sharing, the curator (database owner) should be able to prevent unauthorized redistribution and protect genomic data privacy. As it becomes increasingly common for a single database to be shared with multiple recipients, the shared genomic database should also be robust against collusion attacks, in which multiple malicious recipients combine their individual copies to forge a pirated one in the hope that none of them can be traced. The strong correlation among genomic entries also makes the shared database vulnerable to attacks that leverage public correlation models. In this paper, we assess the robustness of shared genomic databases under both collusion and correlation threats. To this end, we first develop a novel genomic database fingerprinting scheme, called Gen-Scope. It achieves both copyright protection (by enabling traceability) and privacy preservation (via local differential privacy) for shared genomic databases. To defend against collusion attacks, we augment Gen-Scope with a powerful traitor tracing technique, i.e., Tardos codes. Via experiments using a real-world genomic database, we show that Gen-Scope achieves strong fingerprint robustness, e.g., the fingerprint cannot be compromised even if the attacker changes 45% of the entries in its received fingerprinted copy, and colluders will be detected with high probability. Additionally, Gen-Scope outperforms the considered baseline methods. Under the same privacy and copyright guarantees, the accuracy of the fingerprinted genomic database obtained by Gen-Scope is around 10% higher than that achieved by the baseline, and in terms of preservation of GWAS statistics, the consistency of variant-phenotype associations can be about 20% higher. Notably, we also empirically show that Gen-Scope can identify at least one of the colluders even if malicious recipients collude after independent correlation attacks.
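To make the traitor-tracing idea concrete, the following is a minimal sketch of a generic binary Tardos-style code in Python. It is not Gen-Scope and does not reproduce the paper's embedding of fingerprints into genomic entries; the function names, the cutoff `t = 0.01`, the code length, the majority-vote collusion strategy, and the symmetric accusation score (a common variant of Tardos' original score) are all assumptions chosen for illustration.

```python
import math
import random

def tardos_biases(m, t, rng):
    """Draw m per-position biases p_i in [t, 1-t], density proportional to 1/sqrt(p(1-p))."""
    lo = math.asin(math.sqrt(t))      # sin^2(lo) = t
    hi = math.pi / 2 - lo             # sin^2(hi) = 1 - t
    return [math.sin(rng.uniform(lo, hi)) ** 2 for _ in range(m)]

def tardos_codewords(n_users, biases, rng):
    """Each user gets an independent bit per position, equal to 1 with probability p_i."""
    return [[1 if rng.random() < p else 0 for p in biases] for _ in range(n_users)]

def majority_collusion(copies):
    """Illustrative attack: colluders output the majority bit at every position."""
    c = len(copies)
    return [1 if 2 * sum(bits) > c else 0 for bits in zip(*copies)]

def accusation_score(codeword, pirated, biases):
    """Symmetric score: reward agreement with the pirated bit, penalize disagreement."""
    score = 0.0
    for x, y, p in zip(codeword, pirated, biases):
        g = math.sqrt((1 - p) / p) if x == 1 else -math.sqrt(p / (1 - p))
        score += g if y == 1 else -g
    return score

# Hypothetical setup: 20 recipients, 2000 fingerprint bits, 3 colluders.
rng = random.Random(0)
biases = tardos_biases(2000, 0.01, rng)
codes = tardos_codewords(20, biases, rng)
colluders = [2, 7, 11]
pirated = majority_collusion([codes[j] for j in colluders])
scores = [accusation_score(cw, pirated, biases) for cw in codes]
ranked = sorted(range(len(codes)), key=lambda j: scores[j], reverse=True)
print("highest-scoring recipients:", ranked[:3])
```

In this toy setting an innocent recipient's score has mean zero, while each colluder's score grows linearly with the code length, so the highest-scoring recipients are the natural suspects. A real deployment would accuse only recipients whose score exceeds a threshold derived from the target false-accusation probability.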