Partition-based differentially private synthetic data generation
Meifan Zhang, Dihang Deng, Lihua Yin
Information Sciences, Volume 723, Article 122675 (published 2025-09-12). DOI: 10.1016/j.ins.2025.122675. https://www.sciencedirect.com/science/article/pii/S0020025525008084
Sharing differentially private synthetic data is beneficial because it retains the distribution and nuances of the original data better than summary statistics such as means and frequencies. Current state-of-the-art methods follow a select-measure-generate paradigm, but measuring marginals over large domains often introduces significant error, since noise is added to every cell of the marginal, and allocating the privacy budget across measurements is challenging. Our partition-based approach addresses these issues, reducing error and improving the quality of the synthetic data even under a limited privacy budget. Experimental results show that our method outperforms existing approaches, yielding synthetic data of higher quality and utility and making it a preferred option for private data sharing.
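The abstract does not spell out the partitioning scheme, so the following is only a hedged illustration of the general trade-off it alludes to, not the authors' algorithm. It measures a one-way marginal with the Laplace mechanism twice as alternatives: once over the full domain, and once after merging cells into coarser groups. The grouping rule (fixed-size consecutive groups), the uniform spreading of group counts back to cells, and all function names (`merge_cells`, `expand_uniform`, `compare`) are hypothetical assumptions. Because each measured cell receives Laplace noise of scale 1/ε, the expected total L1 noise grows linearly with the number of measured cells; measuring fewer, larger cells reduces noise but adds bias when counts are not uniform within a group.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_histogram(counts, epsilon, rng):
    """Laplace mechanism on a count vector.

    Adding or removing one record changes one cell by 1, so the L1
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def merge_cells(counts, group_size):
    """Hypothetical partition: sum consecutive cells into groups of group_size."""
    assert len(counts) % group_size == 0, "domain size must be divisible by group_size"
    return [sum(counts[i:i + group_size]) for i in range(0, len(counts), group_size)]


def expand_uniform(group_counts, group_size):
    """Assume counts are uniform within each group and spread them back out."""
    return [g / group_size for g in group_counts for _ in range(group_size)]


def l1_error(estimate, truth):
    """Total absolute error between an estimated and a true count vector."""
    return sum(abs(e - t) for e, t in zip(estimate, truth))


def compare(counts, epsilon, group_size, trials=200, seed=0):
    """Average L1 error of two alternative epsilon-DP releases of one marginal.

    The two strategies are alternatives compared side by side, not a
    combined release, so their privacy budgets are not composed here.
    """
    rng = random.Random(seed)
    fine_total = coarse_total = 0.0
    for _ in range(trials):
        # Strategy 1: measure every cell of the full domain.
        fine = noisy_histogram(counts, epsilon, rng)
        # Strategy 2: measure coarser groups, then spread them uniformly.
        coarse_groups = noisy_histogram(merge_cells(counts, group_size), epsilon, rng)
        coarse = expand_uniform(coarse_groups, group_size)
        fine_total += l1_error(fine, counts)
        coarse_total += l1_error(coarse, counts)
    return fine_total / trials, coarse_total / trials
```

For example, `compare([5] * 1000, epsilon=0.1, group_size=10)` should return an average fine-grained error of roughly 1000 / 0.1 = 10,000 and a coarse error of roughly 100 / 0.1 = 1,000, because here the counts are uniform within each group and the merge adds no bias. On counts such as `[0, 100, 0, 100]` with groups of two, uniform spreading yields 50 in every cell, which shows why the choice of partition matters.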
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorials and surveys.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.