{"title":"具有样本平衡功能的隐私和分布保护生成式对抗网络","authors":"Haoran Sun , Jinchuan Tang , Shuping Dang , Gaojie Chen","doi":"10.1016/j.eswa.2024.125181","DOIUrl":null,"url":null,"abstract":"<div><p>Differential privacy (DP) generative adversarial networks (GANs) can generate protected synthetic samples from downstream analysis. However, training on unbalanced datasets can bias the network towards majority classes, leading minority undertrained. Meanwhile, gradient perturbation in DP has no guarantee for perfect protection on the data point. Due to noisy gradients, the training can converge to a suboptimum, or offer no protection when encountering a noise equilibrium. To address the above issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, we use a data balancing algorithm with sampling techniques to reduce the bias and learn features from previously undertrained classes. Compared to a sampling strategy with fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. Then, the framework directly perturbs the balanced samples rather than gradients to achieve data-wise DP and improve sample diversity. Since data balancing uniformizes distribution, a feature-holding strategy was used in Stage II to keep important features from Stage I while restoring the original data distribution. Simulations show our framework outperforms other when compared with the SOTA algorithms on image quality, distribution maintaining, and convergence.</p></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"258 ","pages":"Article 125181"},"PeriodicalIF":7.5000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Privacy and distribution preserving generative adversarial networks with sample balancing\",\"authors\":\"Haoran Sun , Jinchuan Tang , Shuping Dang , Gaojie Chen\",\"doi\":\"10.1016/j.eswa.2024.125181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Differential privacy (DP) generative adversarial networks (GANs) can generate protected synthetic samples from downstream analysis. However, training on unbalanced datasets can bias the network towards majority classes, leading minority undertrained. Meanwhile, gradient perturbation in DP has no guarantee for perfect protection on the data point. Due to noisy gradients, the training can converge to a suboptimum, or offer no protection when encountering a noise equilibrium. To address the above issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, we use a data balancing algorithm with sampling techniques to reduce the bias and learn features from previously undertrained classes. Compared to a sampling strategy with fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. Then, the framework directly perturbs the balanced samples rather than gradients to achieve data-wise DP and improve sample diversity. Since data balancing uniformizes distribution, a feature-holding strategy was used in Stage II to keep important features from Stage I while restoring the original data distribution. 
Simulations show our framework outperforms other when compared with the SOTA algorithms on image quality, distribution maintaining, and convergence.</p></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"258 \",\"pages\":\"Article 125181\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2024-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417424020487\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417424020487","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citation count: 0
Abstract
Differential privacy (DP) generative adversarial networks (GANs) can generate protected synthetic samples for downstream analysis. However, training on unbalanced datasets can bias the network towards majority classes, leaving minority classes undertrained. Meanwhile, gradient perturbation in DP does not guarantee full protection of individual data points: because of the noisy gradients, training can converge to a suboptimum, or offer no protection when a noise equilibrium is reached. To address these issues, this work proposes a balanced Two-Stage DP-GAN (TS-DPGAN) framework. In Stage I, we use a data balancing algorithm with sampling techniques to reduce the bias and learn features from previously undertrained classes. Compared with a sampling strategy that uses a fixed reference, a reference interval is introduced to reduce duplication in oversampling and information loss in undersampling. The framework then perturbs the balanced samples directly, rather than the gradients, to achieve data-wise DP and improve sample diversity. Since data balancing uniformizes the distribution, a feature-holding strategy is used in Stage II to keep the important features learned in Stage I while restoring the original data distribution. Simulations show that our framework outperforms state-of-the-art (SOTA) algorithms in image quality, distribution preservation, and convergence.
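The abstract describes two mechanisms in Stage I: resampling each class toward a reference interval (rather than a single fixed count), and adding noise to the balanced samples themselves instead of to gradients. The following minimal sketch illustrates that idea only; the function names, the Gaussian-noise choice, the interval bounds, and the omission of any formal privacy accounting are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def balance_to_interval(X, y, low, high, rng):
    """Resample each class so its count falls inside [low, high].

    Classes below `low` are oversampled (with replacement); classes above
    `high` are undersampled (without replacement). Classes already inside
    the interval are left untouched, which limits duplication from
    oversampling and information loss from undersampling.
    """
    X_out, y_out = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        n = len(idx)
        if n < low:                       # minority class: oversample
            idx = rng.choice(idx, size=low, replace=True)
        elif n > high:                    # majority class: undersample
            idx = rng.choice(idx, size=high, replace=False)
        X_out.append(X[idx])
        y_out.append(y[idx])
    return np.concatenate(X_out), np.concatenate(y_out)

def perturb_samples(X, sigma, rng):
    """Add Gaussian noise directly to the (balanced) samples.

    Stand-in for the abstract's "perturb the balanced samples rather than
    gradients"; the paper's actual mechanism and noise calibration may differ.
    """
    return X + rng.normal(scale=sigma, size=X.shape)

# Toy usage: an unbalanced two-class dataset is balanced into [200, 400]
# samples per class, then perturbed before being fed to a GAN (omitted here).
rng = np.random.default_rng(0)
X = rng.standard_normal((1100, 8))
y = np.array([0] * 1000 + [1] * 100)
X_bal, y_bal = balance_to_interval(X, y, low=200, high=400, rng=rng)
X_priv = perturb_samples(X_bal, sigma=0.5, rng=rng)
```

Stage II's feature-holding step, which restores the original distribution while retaining features learned from the balanced data, is not sketched here since the abstract does not specify its mechanism.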
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.