通过最优k-匿名化实现数据隐私

21st International Conference on Data Engineering (ICDE'05) Pub Date : 2005-04-05 DOI:10.1109/ICDE.2005.42

R. Bayardo, R. Agrawal

{"title":"通过最优k-匿名化实现数据隐私","authors":"R. Bayardo, R. Agrawal","doi":"10.1109/ICDE.2005.42","DOIUrl":null,"url":null,"abstract":"Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a non-trivial dataset under a general model of the problem.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1327","resultStr":"{\"title\":\"Data privacy through optimal k-anonymization\",\"authors\":\"R. Bayardo, R. Agrawal\",\"doi\":\"10.1109/ICDE.2005.42\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a non-trivial dataset under a general model of the problem.\",\"PeriodicalId\":297231,\"journal\":{\"name\":\"21st International Conference on Data Engineering (ICDE'05)\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-04-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1327\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"21st International Conference on Data Engineering (ICDE'05)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2005.42\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"21st International Conference on Data Engineering (ICDE'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2005.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1327

摘要

数据去识别化协调了为研究目的而发布数据的需求和对个人隐私的需求。本文提出并评估了一种被称为k-匿名化的强大去识别过程的优化算法。k匿名数据集具有这样的属性，即每个记录与至少k- 1个其他记录无法区分。即使是优化k-匿名的简单限制也是np困难的，这导致了重大的计算挑战。我们提出了一种新的方法来探索可能的匿名化空间，这种方法可以驯服问题的组合，并开发数据管理策略来减少对昂贵操作(如排序)的依赖。通过对真实人口普查数据的实验，我们表明所得到的算法可以在两种具有代表性的成本度量和广泛的k范围下找到最优k-匿名化。我们还表明，在输入数据或输入参数无法在合理时间内找到最优解的情况下，该算法可以产生良好的匿名化。最后，我们使用该算法探讨了不同编码方法和问题变化对匿名化质量和性能的影响。据我们所知，这是在该问题的一般模型下展示非平凡数据集的最佳k-匿名化的第一个结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Data privacy through optimal k-anonymization

Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a non-trivial dataset under a general model of the problem.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

21st International Conference on Data Engineering (ICDE'05)

自引率

0.00%

发文量