{"title":"具有局部差分隐私的多维数据发布","authors":"Gaoyuan Liu, Peng Tang, Chengyu Hu, Chongshi Jin, Shanqing Guo","doi":"10.48786/edbt.2023.15","DOIUrl":null,"url":null,"abstract":"This paper studies the publication of multi-dimensional data with local differential privacy (LDP). This problem raises tremendous challenges in terms of both computational efficiency and data utility. The state-of-the-art solution addresses this problem by first constructing a junction tree (a kind of probabilistic graphical model, PGM) to generate a set of noisy low-dimensional marginals of the input data and then using them to approximate the distribution of the input dataset for synthetic data generation. However, there are two severe limitations in the existing solution, i.e., calculating a large number of attribute pairs’ marginals to construct the PGM and not solving well in calculating the marginal distribution of large cliques in the PGM, which degrade the quality of synthetic data. To address the above deficiencies, based on the sparseness of the constructed PGM and the divisibility of LDP, we first propose an incremental learning-based PGM construction method. In this method, we gradually prune the edges (attribute pairs) with weak correlation and allocate more data and privacy budgets to the useful edges, thereby improving the model’s accuracy. In this method, we introduce a high-precision data accumulation technique and a low-error edge pruning technique. Second, based on joint distribution decomposition and redundancy elimination, we propose a novel marginal calculation method for the large cliques in the context of LDP. Extensive experiments on real datasets demonstrate that our solution offers desirable data utility.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"18 1","pages":"183-194"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Multi-Dimensional Data Publishing With Local Differential Privacy\",\"authors\":\"Gaoyuan Liu, Peng Tang, Chengyu Hu, Chongshi Jin, Shanqing Guo\",\"doi\":\"10.48786/edbt.2023.15\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper studies the publication of multi-dimensional data with local differential privacy (LDP). This problem raises tremendous challenges in terms of both computational efficiency and data utility. The state-of-the-art solution addresses this problem by first constructing a junction tree (a kind of probabilistic graphical model, PGM) to generate a set of noisy low-dimensional marginals of the input data and then using them to approximate the distribution of the input dataset for synthetic data generation. However, there are two severe limitations in the existing solution, i.e., calculating a large number of attribute pairs’ marginals to construct the PGM and not solving well in calculating the marginal distribution of large cliques in the PGM, which degrade the quality of synthetic data. To address the above deficiencies, based on the sparseness of the constructed PGM and the divisibility of LDP, we first propose an incremental learning-based PGM construction method. In this method, we gradually prune the edges (attribute pairs) with weak correlation and allocate more data and privacy budgets to the useful edges, thereby improving the model’s accuracy. 
In this method, we introduce a high-precision data accumulation technique and a low-error edge pruning technique. Second, based on joint distribution decomposition and redundancy elimination, we propose a novel marginal calculation method for the large cliques in the context of LDP. Extensive experiments on real datasets demonstrate that our solution offers desirable data utility.\",\"PeriodicalId\":88813,\"journal\":{\"name\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"volume\":\"18 1\",\"pages\":\"183-194\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48786/edbt.2023.15\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in database technology : proceedings. International Conference on Extending Database Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48786/edbt.2023.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-Dimensional Data Publishing With Local Differential Privacy
This paper studies the publication of multi-dimensional data under local differential privacy (LDP). The problem poses substantial challenges in terms of both computational efficiency and data utility. The state-of-the-art solution first constructs a junction tree (a kind of probabilistic graphical model, PGM) to generate a set of noisy low-dimensional marginals of the input data, and then uses these marginals to approximate the distribution of the input dataset for synthetic data generation. However, this solution has two severe limitations: it must compute the marginals of a large number of attribute pairs to construct the PGM, and it handles the marginal distributions of large cliques in the PGM poorly. Both limitations degrade the quality of the synthetic data. To address these deficiencies, we first propose an incremental-learning-based PGM construction method that exploits the sparsity of the constructed PGM and the divisibility of LDP. This method gradually prunes edges (attribute pairs) with weak correlation and allocates more data and privacy budget to the useful edges, thereby improving the model's accuracy; it builds on a high-precision data accumulation technique and a low-error edge pruning technique. Second, based on joint distribution decomposition and redundancy elimination, we propose a novel marginal calculation method for large cliques in the LDP setting. Extensive experiments on real datasets demonstrate that our solution offers desirable data utility.
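The pipeline described above relies on estimating noisy low-dimensional marginals from user reports under LDP. As background only, and not the authors' specific mechanism, the following is a minimal Python sketch of one standard LDP frequency oracle, generalized randomized response (k-RR), applied to a single categorical attribute; the function names and toy parameters are illustrative assumptions.

    import numpy as np

    def krr_perturb(value, domain_size, epsilon, rng):
        # Generalized randomized response: keep the true value with
        # probability p = e^eps / (e^eps + k - 1); otherwise report a
        # uniformly random *other* value from the domain.
        p = np.exp(epsilon) / (np.exp(epsilon) + domain_size - 1)
        if rng.random() < p:
            return value
        other = rng.integers(domain_size - 1)
        return other if other < value else other + 1

    def krr_estimate(reports, domain_size, epsilon):
        # Unbiased frequency estimate for each domain value, obtained by
        # inverting the perturbation probabilities.
        n = len(reports)
        p = np.exp(epsilon) / (np.exp(epsilon) + domain_size - 1)
        q = 1.0 / (np.exp(epsilon) + domain_size - 1)
        counts = np.bincount(reports, minlength=domain_size)
        return (counts / n - q) / (p - q)

    # Toy run: 10,000 users, one 4-valued attribute, epsilon = 1.0.
    rng = np.random.default_rng(0)
    true_values = rng.choice(4, size=10_000, p=[0.5, 0.3, 0.15, 0.05])
    reports = np.array([krr_perturb(v, 4, 1.0, rng) for v in true_values])
    print(krr_estimate(reports, 4, 1.0))  # roughly [0.5, 0.3, 0.15, 0.05]

In the same spirit, a two-dimensional marginal over an attribute pair can be estimated by treating the pair as a single attribute over the product domain, at the cost of a larger domain size and therefore higher estimation error, which is one reason computing many pairwise marginals under LDP is expensive.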