Data utility and privacy protection trade-off in k-anonymisation
G. Loukides, J. Shao
International Conference on Pattern Analysis and Intelligent Systems, 2008-03-29
DOI: 10.1145/1379287.1379296 (https://doi.org/10.1145/1379287.1379296)
Citations: 36
Abstract
K-anonymisation is an approach to protecting private information contained within a dataset. A good k-anonymisation algorithm should anonymise a dataset in such a way that the private information it contains is hidden, yet the anonymised data remains useful in its intended applications. However, data utility and privacy protection cannot both be maximised in k-anonymisation. Existing methods derive k-anonymisations by trying to maximise utility while satisfying a required level of protection. In this paper, we propose a method that attempts to optimise the trade-off between utility and protection. We introduce a measure that captures both utility and protection, and an algorithm that exploits this measure using a combination of clustering and partitioning techniques. Our experiments show that the proposed method can produce k-anonymisations with the required trade-off between utility and protection, and that its performance scales to large datasets.
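To make the tension between utility and protection concrete, the following minimal sketch measures the anonymity level of a toy table before and after generalising one attribute. It is not the measure or algorithm proposed in the paper. The records, the choice of `age` and `postcode` as quasi-identifiers, and the helper names `anonymity_level` and `generalise_age` are all assumptions made for illustration.

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values. The table is k-anonymous for every k up to
    this value. (Hypothetical helper, not taken from the paper.)"""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalise_age(record, width):
    """Replace an exact age with a range of the given width.
    Coarser ranges increase protection but reduce utility."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy data, invented for this example.
records = [
    {"age": 23, "postcode": "CF10", "diagnosis": "flu"},
    {"age": 27, "postcode": "CF10", "diagnosis": "asthma"},
    {"age": 31, "postcode": "CF24", "diagnosis": "flu"},
    {"age": 38, "postcode": "CF24", "diagnosis": "diabetes"},
]
qi = ["age", "postcode"]  # assumed quasi-identifiers

raw_k = anonymity_level(records, qi)
generalised = [generalise_age(r, 10) for r in records]
generalised_k = anonymity_level(generalised, qi)

print(raw_k, generalised_k)
```

In this toy table every raw record is unique on its quasi-identifiers, while grouping ages into ten-year ranges places each record in a group of two. The price is that exact ages are no longer available to an analyst, which is the kind of utility loss the abstract refers to.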