Differentially private synthetic data generation for robust information fusion

IF 14.7 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-06-12 DOI:10.1016/j.inffus.2025.103373

Xiaohong Cai, Yi Sun, Zhaowen Lin, Ripeng Li, Tianwei Cai

{"title":"Differentially private synthetic data generation for robust information fusion","authors":"Xiaohong Cai, Yi Sun, Zhaowen Lin, Ripeng Li, Tianwei Cai","doi":"10.1016/j.inffus.2025.103373","DOIUrl":null,"url":null,"abstract":"<div><div>Synthetic data is crucial in information fusion in term of enhancing data representation and improving system robustness. Among all synthesis methods, deep generative models exhibit excellent performance. However, recent studies have shown that the generation process faces privacy challenges due to the memorization of training instances by generative models. To maximize the benefits of synthesis data while ensuring data security, we propose a novel framework for the generation and utilization of private synthetic data in information fusion processes. Furthermore, we present differential private adaptive fine-tuning (DP-AdaFit), a method for private parameter efficient fine-tuning that applies differential privacy only to the singular values of the incremental updates. In details, DP-AdaFit adaptively adjusts the rank of the low-rank weight increment matrices according to their importance score, and allows us to achieve an equivalent privacy policy by only injecting noise into gradient of the corresponding singular values. Such a novel approach essentially reduces their parameter budget but avoids too much noise introduced by the singular value decomposition. We decrease the cost on memory and computation nearly half of the SOTA, and achieve the FID of 19.2 on CIFAR10. Our results demonstrate that trading off weights contained in the differential privacy fine-tuning parameters can improve model performance, even achieving generation quality competitive with differential privacy full fine-tuning diffusion model. Our code is available at <span><span>DP-AdaFit</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"124 ","pages":"Article 103373"},"PeriodicalIF":14.7000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525004464","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Synthetic data is crucial in information fusion in term of enhancing data representation and improving system robustness. Among all synthesis methods, deep generative models exhibit excellent performance. However, recent studies have shown that the generation process faces privacy challenges due to the memorization of training instances by generative models. To maximize the benefits of synthesis data while ensuring data security, we propose a novel framework for the generation and utilization of private synthetic data in information fusion processes. Furthermore, we present differential private adaptive fine-tuning (DP-AdaFit), a method for private parameter efficient fine-tuning that applies differential privacy only to the singular values of the incremental updates. In details, DP-AdaFit adaptively adjusts the rank of the low-rank weight increment matrices according to their importance score, and allows us to achieve an equivalent privacy policy by only injecting noise into gradient of the corresponding singular values. Such a novel approach essentially reduces their parameter budget but avoids too much noise introduced by the singular value decomposition. We decrease the cost on memory and computation nearly half of the SOTA, and achieve the FID of 19.2 on CIFAR10. Our results demonstrate that trading off weights contained in the differential privacy fine-tuning parameters can improve model performance, even achieving generation quality competitive with differential privacy full fine-tuning diffusion model. Our code is available at DP-AdaFit.

查看原文本刊更多论文

基于鲁棒信息融合的差分私有合成数据生成

在信息融合中，合成数据是增强数据表示能力和提高系统鲁棒性的关键。在所有的综合方法中，深度生成模型表现出优异的性能。然而，最近的研究表明，由于生成模型对训练实例的记忆，生成过程面临隐私挑战。为了最大限度地发挥综合数据的效益，同时保证数据的安全性，我们提出了一种新的信息融合过程中私有综合数据的生成和利用框架。此外，我们提出微分私有自适应微调（DP-AdaFit），这是一种仅对增量更新的奇异值应用微分隐私的私有参数有效微调方法。DP-AdaFit根据低秩权重增量矩阵的重要性评分自适应调整其秩，并允许我们仅通过在相应奇异值的梯度中注入噪声来实现等效的隐私策略。这种新方法在减少参数预算的同时，避免了奇异值分解带来的过多噪声。我们将内存和计算成本降低了近一半，并在CIFAR10上实现了19.2的FID。我们的研究结果表明，权衡差分隐私微调参数中包含的权重可以提高模型性能，甚至可以实现与差分隐私完全微调扩散模型竞争的生成质量。我们的代码可以在DP-AdaFit上获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Information Fusion 工程技术-计算机：理论方法

CiteScore

33.20

自引率

4.30%

发文量

161

审稿时长

7.9 months

期刊介绍： Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.