A Benchmark of Globally-Optimal Anonymization Methods for Biomedical Data

2014 IEEE 27th International Symposium on Computer-Based Medical Systems. Pub Date: 2014-05-27. DOI: 10.1109/CBMS.2014.85

F. Prasser, F. Kohlmayer, K. Kuhn


Citations: 21

Abstract

Collaboration and data sharing have become core elements of biomedical research. At the same time, there is a growing understanding of privacy threats related to data sharing, especially when sensitive data from distributed sources become available for linkage. Statistical disclosure control comprises well-known data anonymization techniques that protect data by introducing fuzziness. To protect datasets from different types of threats, different privacy criteria are commonly implemented. Data anonymization is an important measure, but it is computationally complex, and it can significantly reduce the expressiveness of data. To attenuate these problems, a number of algorithms have been proposed that aim at increasing data quality or improving efficiency. Previous evaluations of such algorithms lack a systematic approach, as they focus on specific algorithms, specific privacy criteria, and specific runtime environments. Therefore, it is difficult for decision makers to decide which algorithm is best suited to their requirements. As a first step towards a comprehensive and systematic evaluation of anonymization algorithms, we report on our ongoing efforts to provide an open source benchmark. In this contribution, we focus on optimal algorithms utilizing global recoding with full-domain generalization. We present a systematic evaluation of domain-specific algorithms and generic search methods for a broad set of privacy criteria, including k-anonymity, l-diversity, t-closeness and δ-presence, and their use in multiple real-world datasets. Our results show that there is no single solution fitting all needs, and that generic search methods can outperform highly specialized algorithms.
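To make "optimal algorithms utilizing global recoding with full-domain generalization" concrete, the following is a minimal Python sketch of the idea and not the benchmark or any algorithm evaluated in the paper. It assumes two hypothetical quasi-identifiers (age and ZIP code), invented generalization hierarchies, toy records, and a stand-in cost (the sum of generalization levels). It enumerates every combination of levels and returns the cheapest one that yields a k-anonymous table.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records (age, ZIP code); not taken from the paper's datasets.
RECORDS = [
    ("34", "81667"),
    ("36", "81675"),
    ("45", "81925"),
    ("47", "81931"),
]

def generalize_age(value, level):
    """Assumed hierarchy: 0 = exact age, 1 = decade interval, 2 = suppressed."""
    if level == 0:
        return value
    if level == 1:
        low = int(value) // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

def generalize_zip(value, level):
    """Assumed hierarchy: mask the last `level` digits with '*'."""
    keep = len(value) - level
    return value[:keep] + "*" * level

# One (function, number of levels) pair per attribute.
HIERARCHIES = [(generalize_age, 3), (generalize_zip, 6)]

def recode(records, levels):
    """Global recoding: the same level per attribute is applied to every record."""
    return [
        tuple(fn(v, lvl) for v, (fn, _), lvl in zip(row, HIERARCHIES, levels))
        for row in records
    ]

def is_k_anonymous(records, k):
    """Each combination of quasi-identifier values must occur at least k times."""
    return min(Counter(records).values()) >= k

def optimal_generalization(records, k):
    """Exhaustive search over all level combinations.

    Returns the k-anonymous combination with the smallest sum of levels
    (an illustrative cost measure), or None if none exists.
    """
    best = None
    for levels in product(*(range(n) for _, n in HIERARCHIES)):
        if is_k_anonymous(recode(records, levels), k):
            if best is None or sum(levels) < sum(best):
                best = levels
    return best

if __name__ == "__main__":
    levels = optimal_generalization(RECORDS, k=2)
    print(levels)                   # (1, 2)
    print(recode(RECORDS, levels))  # two groups of two identical rows
```

Exhaustive enumeration is only feasible for tiny search spaces like this one. The domain-specific algorithms and generic search methods compared in the paper address the same search problem on real datasets; this sketch does not reproduce any of them.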

