{"title":"独立分量分析和方差控制的隐私保护","authors":"Chih-Ming Hsu, Ming-Syan Chen","doi":"10.1145/2063576.2063709","DOIUrl":null,"url":null,"abstract":"The primary objective of privacy preservation is to protect an individual's confidential information in released data sets. In recent years, several simulation-based approaches for privacy preservation have been proposed. The idea is to generate a synthetic data set with the constraint that the probability distribution is as close as possible to that of the original set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). It is shown empirically that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second approach proposed is a hybrid method that combines PRIMP and Cholesky's decomposition technique. It is shown empirically that the hybrid method preserves the covariance matrix of the original data exactly. The method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach is not always better than the PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory, the risk of information leakage under the hybrid approach is much less than that under PRIMP.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"9 1","pages":"925-930"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Privacy preservation by independent component analysis and variance control\",\"authors\":\"Chih-Ming Hsu, Ming-Syan Chen\",\"doi\":\"10.1145/2063576.2063709\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The primary objective of privacy preservation is to protect an individual's confidential information in released data sets. In recent years, several simulation-based approaches for privacy preservation have been proposed. The idea is to generate a synthetic data set with the constraint that the probability distribution is as close as possible to that of the original set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). It is shown empirically that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second approach proposed is a hybrid method that combines PRIMP and Cholesky's decomposition technique. It is shown empirically that the hybrid method preserves the covariance matrix of the original data exactly. The method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach is not always better than the PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory, the risk of information leakage under the hybrid approach is much less than that under PRIMP.\",\"PeriodicalId\":74507,\"journal\":{\"name\":\"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management\",\"volume\":\"9 1\",\"pages\":\"925-930\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2063576.2063709\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2063576.2063709","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
摘要
隐私保护的主要目的是保护公开数据集中的个人机密信息。近年来,人们提出了几种基于仿真的隐私保护方法。其思想是生成一个综合数据集,其约束条件是概率分布尽可能接近原始数据集。本文提出了两种基于仿真的多元数值数据隐私保护框架。第一个框架称为PRIMP (PRivacy preserving by Independent coMPonents),基于独立组件分析(ICA)。经验表明,PRIMP在Spearman的秩相关和Kendall的tau相关方面优于其他基于模拟的方法。第二种方法是结合PRIMP和Cholesky分解技术的混合方法。经验表明,混合方法较好地保留了原始数据的协方差矩阵。该方法还解决了基于cholesky的方法产生好的种子的问题。虽然实证结果表明,混合方法在Spearman的秩相关和Kendall的tau相关方面并不总是优于PRIMP方法,但从理论上讲,混合方法下的信息泄露风险远小于PRIMP方法。
Privacy preservation by independent component analysis and variance control
The primary objective of privacy preservation is to protect an individual's confidential information in released data sets. In recent years, several simulation-based approaches for privacy preservation have been proposed. The idea is to generate a synthetic data set with the constraint that the probability distribution is as close as possible to that of the original set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). It is shown empirically that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second approach proposed is a hybrid method that combines PRIMP and Cholesky's decomposition technique. It is shown empirically that the hybrid method preserves the covariance matrix of the original data exactly. The method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach is not always better than the PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory, the risk of information leakage under the hybrid approach is much less than that under PRIMP.